MongoDB embedded documents: size limit and aggregation performance concerns

【Question title】: MongoDB embedded documents: size limit and aggregation performance concerns
【Posted】: 2020-01-22 04:03:17
【Question】:

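The MongoDB documentation recommends putting as much data as possible in a single document. It also advises against ObjectId-ref-based subdocuments unless their data must be referenced from more than one document.

In my case, I have a one-to-many relationship like this:

Log schema: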

const model = (mongoose) => {
    const LogSchema = new mongoose.Schema({
        result: { type: String, required: true },
        operation: { type: Date, required: true },
        x: { type: Number, required: true },
        y: { type: Number, required: true },
        z: { type: Number, required: true }
    });
    const model = mongoose.model("Log", LogSchema);
    return model;
};

Machine schema:

const model = (mongoose) => {
    const MachineSchema = new mongoose.Schema({
        model: { type: String, required: true },
        description: { type: String, required: true },
        logs: [ mongoose.model("Log").schema ]
    });
    const model = mongoose.model("Machine", MachineSchema);
    return model;
};
module.exports = model;

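Each Machine will have many Production_Log documents (more than one million). With embedded documents, I quickly hit the 16 MB per-document limit during testing and could no longer add any Production_Log documents to the Machine document.

My doubts

In this situation, should I store the subdocuments as ObjectId references instead of embedding them?
Are there other solutions I could evaluate?
I will be accessing the Production_Log documents to generate statistics for each Machine with the aggregation framework. Is there anything else I should take into account in the schema design?

Thank you very much for your advice!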

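【Comments】:

Tags: javascript node.js mongodb mongoose mongoose-schema

【Solution 1】:

Database normalization does not apply to MongoDB

MongoDB scales better if you store the complete information in a single document (data redundancy). Database normalization forces you to split data across different collections, and once the data grows, that becomes a bottleneck.

Use only the Log schema: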

const model = (mongoose) => {
    const LogSchema = new mongoose.Schema({
        model: { type: String, required: true },
        description: { type: String, required: true },
        result: { type: String, required: true },
        operation: { type: Date, required: true },
        x: { type: Number, required: true },
        y: { type: Number, required: true },
        z: { type: Number, required: true }
    });
    const model = mongoose.model("Log", LogSchema);
    return model;
};

Read and write operations scale well this way.

With Aggregation, you can process the data and compute the results you need.
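As an illustration of the kind of aggregation this enables, here is a minimal sketch of a pipeline that computes per-machine statistics over a time window. It assumes the single Log schema above and that the `model` field is enough to identify a machine; `buildMachineStatsPipeline` and the output field names (`count`, `avgX`, `avgY`, `avgZ`) are hypothetical and not part of the original answer.

```javascript
// Build an aggregation pipeline as plain data so it can be inspected or
// reused. Assumption: logs carry the machine's `model` field (Solution 1).
function buildMachineStatsPipeline(from, to) {
  return [
    // Keep only logs whose `operation` date falls in [from, to).
    { $match: { operation: { $gte: from, $lt: to } } },
    // One output document per machine model, with a count and averages.
    {
      $group: {
        _id: "$model",
        count: { $sum: 1 },
        avgX: { $avg: "$x" },
        avgY: { $avg: "$y" },
        avgZ: { $avg: "$z" }
      }
    },
    // Stable ordering by machine model.
    { $sort: { _id: 1 } }
  ];
}

const pipeline = buildMachineStatsPipeline(
  new Date("2020-01-01T00:00:00Z"),
  new Date("2020-02-01T00:00:00Z")
);
console.log(JSON.stringify(pipeline, null, 2));
```

With Mongoose, the pipeline would typically be passed to `Log.aggregate(pipeline)`. Since the `$match` stage filters on `operation`, an index on that field is likely to help, though that depends on the actual workload.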

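【Discussion】:

【Solution 2】:

Please see whether this approach fits your needs.

The Log collection will generate far more data, while a Machine document will never come close to the 16 MB limit. So do not embed Log documents in the Machine document; do it the other way around.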
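A rough back-of-the-envelope check shows why embedding the logs in a Machine document runs out of room. This sketch is not part of the original answer. It assumes each embedded log is stored with an `_id`, a short `result` string, a date, and `x`, `y`, `z` as 8-byte doubles; `estimateLogBytes` and `maxEmbeddedLogs` are hypothetical helper names, and the Machine document's own fields are ignored.

```javascript
// MongoDB's maximum BSON document size: 16 mebibytes.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Approximate BSON size of one embedded log {_id, result, operation, x, y, z}.
// Each element costs: 1 type byte + null-terminated field name + value.
function estimateLogBytes(resultLength) {
  const docOverhead = 4 + 1;                   // int32 length prefix + trailing 0x00
  const id = 1 + 4 + 12;                       // "_id\0" + 12-byte ObjectId
  const result = 1 + 7 + 4 + resultLength + 1; // "result\0" + int32 length + bytes + 0x00
  const operation = 1 + 10 + 8;                // "operation\0" + int64 UTC datetime
  const xyz = 3 * (1 + 2 + 8);                 // "x\0", "y\0", "z\0" as doubles (assumed)
  return docOverhead + id + result + operation + xyz;
}

// Each array element also costs a type byte plus its index stored as a
// null-terminated string key ("0", "1", ...).
function maxEmbeddedLogs(bytesPerLog, arrayKeyDigits) {
  const perElement = bytesPerLog + 1 + arrayKeyDigits + 1;
  return Math.floor(MAX_BSON_BYTES / perElement);
}

const perLog = estimateLogBytes(2);        // e.g. result = "OK"
const limit = maxEmbeddedLogs(perLog, 6);  // assume 6-digit array indexes
console.log(`~${perLog} bytes per log, roughly ${limit} logs per Machine document`);
```

Under these assumptions the array fills up well before one million entries, which matches what the question reports.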

Your modified schema would look like this:

Machine schema:

const model = (mongoose) => {
    const MachineSchema = new mongoose.Schema({
        model: { type: String, required: true },
        description: { type: String, required: true }        
    });
    const model = mongoose.model("Machine", MachineSchema);
    return model;
};
module.exports = model;

Log schema:

const model = (mongoose) => {
    const LogSchema = new mongoose.Schema({
        result: { type: String, required: true },
        operation: { type: Date, required: true },
        x: { type: Number, required: true },
        y: { type: Number, required: true },
        z: { type: Number, required: true },
        machine: [ mongoose.model("Machine").schema ]
    });
    const model = mongoose.model("Log", LogSchema);
    return model;
};

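If we still exceed the document size limit (16 MB), then in the Log schema we can create a new document per hour, day, or week, depending on the volume of logs we generate.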
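One way to picture the per-period documents is to derive a bucket key from the machine and the log timestamp, so that all logs in the same period land in the same document. The sketch below is only an illustration under assumptions: `bucketKey`, the `machineId` argument, and the key format are hypothetical, and weeks are taken to start on Monday in UTC.

```javascript
// Derive a key identifying the hourly, daily, or weekly bucket a log belongs to.
function bucketKey(machineId, date, granularity) {
  if (!["hour", "day", "week"].includes(granularity)) {
    throw new Error(`Unsupported granularity: ${granularity}`);
  }
  const d = new Date(date.getTime()); // copy so the caller's Date is not mutated
  if (granularity === "week") {
    // Step back to Monday of the same week (getUTCDay() is 0 for Sunday).
    const daysSinceMonday = (d.getUTCDay() + 6) % 7;
    d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  }
  const iso = d.toISOString(); // "YYYY-MM-DDTHH:mm:ss.sssZ"
  // Hour buckets keep "YYYY-MM-DDTHH"; day and week buckets keep "YYYY-MM-DD".
  const stamp = granularity === "hour" ? iso.slice(0, 13) : iso.slice(0, 10);
  return `${machineId}:${granularity}:${stamp}`;
}

const when = new Date("2020-01-22T04:03:17Z");
console.log(bucketKey("machine-1", when, "hour")); // machine-1:hour:2020-01-22T04
console.log(bucketKey("machine-1", when, "day"));  // machine-1:day:2020-01-22
console.log(bucketKey("machine-1", when, "week")); // machine-1:week:2020-01-20
```

A key like this could serve as the bucket document's identifier, with new logs appended through an upsert. Which granularity keeps each bucket under the limit depends on how many logs a machine actually produces.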

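【Discussion】:

Hi Clement, thanks for your reply. Wouldn't it be better to have an array of LogSchema references in MachineSchema? The relationship is one machine to many logs. Or maybe a MachineSchema reference in LogSchema...
A better approach is to have a single schema with no relationships. The design we are discussing is not the best fit for a NoSQL DB. These relationships come from the RDBMS world, where normalization and avoiding duplication are the main goals. A NoSQL DB is designed to handle large volumes of data, and it allows us to duplicate data and design the schema around the output we need. Here, we can try keeping a copy of the machine details in every log entry (created daily, weekly, or monthly, whichever keeps us under the size limit), which performs better when retrieving records.
NoSQL DBs are designed to handle massive amounts of data. In this case, if we pursue more relationships the way we would in an RDBMS, record retrieval will take a real performance hit, which will not give us a good user experience. The best option is the single schema given in the solution provided by Valijon.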