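MongoDB query execution takes a long time

【Question title】: MongoDB query execution takes a long time
【Posted】: 2015-11-23 06:21:06
【Question description】:

Below is my MongoDB 3.0 query. It takes a long time to execute (more than 4 seconds), even though the data set is only 4.3 million documents: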

db.getCollection('TestingCollection').aggregate([ 
    { $match: { 
        myDate: { $gte: new Date(949384052490) }, 
        $and: [ 
            { 
                myDate: { $lte: new Date(1448257684431) }, 
                $and: [ { myId: 10 } ] 
            }
        ], 
        type: { $ne: "Contractor" } 
    }}, 
    { $project: { 
        retailerName: 1,
        unitSold: 1, 
        year: { $year: [ "$myDate" ] },
        currency: 1, 
        totalSales: { $multiply: [ "$unitSold", "$itemPrice" ] } 
    }}, 
    { $group: { 
        _id: { 
            retailerName: "$retailerName", 
            year: "$year",      
            currency: "$currency" 
        }, 
        netSales: { $sum: "$revenue" }, 
        netUnitSold: { $sum: "$unitSold" }, 
        totalSales: { $sum:"$totalSales" } 
    }}
] )

Compound index fields:

(myDate : 1, retailerName:1, type:1, myId:1).

The same query with

type: { $eq: "Contractor" }

takes only a few milliseconds to execute.

Please tell me what I am doing wrong.

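【Question comments】:

Tags: mongodb mongodb-query aggregation-framework spring-mongo

【Solution 1】:

Your "range selection" is specified incorrectly, and your use of $and is not correct. In effect only the "last" argument is considered, so the query is just looking for all dates "greater than" with myId equal to 10, which is of course wrong.

This is the correct query syntax for your $match: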

{ "$match": { 
    "myDate": { 
        "$gte": new Date(949384052490),
        "$lte": new Date(1448257684431)
    },
    "myId": 10,
    "type": { "$ne": "Contractor" }
}}

No $and is needed, because all MongoDB query arguments are already combined as an AND condition.
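A side note on why both bounds belong under one `myDate` key: the shell writes queries as JavaScript object literals, and a repeated key in a literal silently keeps only its last value. The snippet below is a minimal sketch in plain JavaScript (run with Node.js, no server needed); the variable names `duplicated` and `combined` are illustrative only.

```javascript
// Repeating a key in an object literal keeps only the last value,
// so the $gte bound is silently lost here.
var duplicated = {
    myDate: { "$gte": new Date(949384052490) },
    myDate: { "$lte": new Date(1448257684431) }
};

// Putting both operators under a single key keeps both bounds.
var combined = {
    myDate: {
        "$gte": new Date(949384052490),
        "$lte": new Date(1448257684431)
    }
};

console.log(Object.keys(duplicated.myDate)); // [ '$lte' ]
console.log(Object.keys(combined.myDate));   // [ '$gte', '$lte' ]
```

This is the one case where an explicit $and can be useful: when the same field really has to appear in more than one separate condition.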

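You should also consider merging the $project and $group stages. As a rule, when one directly follows the other they can be combined, and doing so is at least more efficient.

But of course most of the time is being wasted on the initial $match, which selects incorrect results anyway.

The optimal pipeline, with $group and no $project: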

{ "$group": { 
    "_id": { 
        "retailerName": "$retailerName", 
        "year": { "$year": "$myDate" },      
        "currency": "$currency"
    }, 
    "netSales": { "$sum": "$revenue" }, 
    "netUnitSold": { "$sum": "$unitSold" }, 
    "totalSales": { "$sum": 
        { "$multiply": [ "$unitSold", "$itemPrice" ] }
    }
}}

So the whole pipeline is now just a $match followed by a $group.
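Put together, the two stages can be sketched as one pipeline for the mongo shell. The collection name comes from the question; the `typeof db` guard is an assumption added here only so that the snippet also loads outside the shell.

```javascript
// Full pipeline: one $match followed by one $group (no $project stage).
var pipeline = [
    { "$match": {
        "myDate": {
            "$gte": new Date(949384052490),
            "$lte": new Date(1448257684431)
        },
        "myId": 10,
        "type": { "$ne": "Contractor" }
    }},
    { "$group": {
        "_id": {
            "retailerName": "$retailerName",
            "year": { "$year": "$myDate" },
            "currency": "$currency"
        },
        "netSales": { "$sum": "$revenue" },
        "netUnitSold": { "$sum": "$unitSold" },
        "totalSales": { "$sum":
            { "$multiply": [ "$unitSold", "$itemPrice" ] }
        }
    }}
];

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    db.getCollection('TestingCollection').aggregate(pipeline).forEach(printjson);
}
```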

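Using spring-mongo

If you are using spring-mongo, the currently supported operators have limitations for a $group that combines a compound key with calculated values in the accumulators, but you can work around them. As for the $and statement, that really is a syntax problem and not the fault of spring-mongo.

First, set up a custom class for the "group" stage of the aggregation pipeline: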

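import com.mongodb.DBObject;
import org.springframework.data.mongodb.core.aggregation.AggregationOperation;
import org.springframework.data.mongodb.core.aggregation.AggregationOperationContext;

// Wraps a raw DBObject so it can be used as a stage in a spring-mongo aggregation pipeline.
public class CustomGroupOperation implements AggregationOperation {
    private final DBObject operation;

    public CustomGroupOperation(DBObject operation) {
        this.operation = operation;
    }

    @Override
    public DBObject toDBObject(AggregationOperationContext context) {
        // Let the context map field names before the stage is serialized.
        return context.getMappedObject(operation);
    }
}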

Then use that class when building the pipeline:

    Aggregation aggregation = newAggregation(
        match(
                Criteria.where("myDate")
                        .gte(new Date(949384052490L))
                        .lte(new Date(1448257684431L))
                        .and("myId").is(10)
                        .and("type").ne("Contractor")
        ),
        new CustomGroupOperation(
            new BasicDBObject(
                "$group", new BasicDBObject(
                    "_id", new BasicDBObject(
                        "retailerName", "$retailerName"
                    ).append(
                        "year", new BasicDBObject("$year", "$myDate")
                    ).append(
                        "currency", "$currency"
                    )
                ).append(
                    "netSales", new BasicDBObject("$sum","$revenue")
                ).append(
                    "netUnitSold", new BasicDBObject("$sum","$unitSold")
                ).append(
                    "totalSales", new BasicDBObject(
                        "$sum", new BasicDBObject(
                            "$multiply", Arrays.asList("$unitSold", "$itemPrice")
                        )
                    )
                )
            )
        )
    );

This produces a serialized pipeline like this:

[ 
    { "$match" : { 
        "myDate" : { 
            "$gte" : { "$date" : "2000-02-01T05:47:32.490Z"}, 
            "$lte" : { "$date" : "2015-11-23T05:48:04.431Z"}
        }, 
        "myId" : 10, 
        "type" : { "$ne" : "Contractor"}
    }}, 
    { "$group": { 
        "_id" : { 
            "retailerName" : "$retailerName", 
            "year" : { "$year" : "$myDate"}, 
            "currency" : "$currency"
        }, 
        "netSales" : { "$sum" : "$revenue"}, 
        "netUnitSold" : { "$sum" : "$unitSold"}, 
        "totalSales" : { "$sum" : { "$multiply" : [ "$unitSold" , "$itemPrice"]}}
    }}
]

Which is exactly the same as the example given above.
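As a quick sanity check on the serialized dates, the epoch-millisecond values from the query can be converted in plain JavaScript. This is only a sketch for Node.js; the variable names and the 365.25-day year used for the span are assumptions made for illustration.

```javascript
// Epoch-millisecond bounds used in the query.
var lower = new Date(949384052490);
var upper = new Date(1448257684431);

console.log(lower.toISOString()); // 2000-02-01T05:47:32.490Z
console.log(upper.toISOString()); // 2015-11-23T05:48:04.431Z

// Approximate width of the range in years (365.25-day years).
var years = (upper - lower) / (365.25 * 24 * 3600 * 1000);
console.log(years.toFixed(1)); // 15.8
```

The range spans almost sixteen years, so the date bounds on their own are unlikely to narrow the selection much.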

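【Comments】:

Thanks Blakes, but even with your query the execution time is over 4 seconds; there is no improvement in query performance. Please help. Chris
@chiku The point is that your query selection was completely incorrect. I have also added details on implementing this with spring-mongo, where you evidently had errors as well. As for timing, your main factors will be the size of the data and of the index. The date range is "fifteen years", which is quite a lot. You should make sure your index is defined so that it reduces the selected results as much as possible.
Thanks Blakes, will do. In my experience the API provided by spring-data-mongodb is better, since it cuts down a lot of pipeline code. Please share your experience. Chris
@chiku My experience is that the custom pipeline stage is necessary, because this cannot currently be built with the spring-mongo helpers. Separating $project and $group causes a significant drop in performance. But the point of the custom stage is that it mixes in with the provided helper methods, as shown.
Thanks a lot Blakes, but your query also takes the same time to execute against 4.2M documents. Chris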