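[Posted]: 2021-09-03 16:16:25
[Question]:
I have to perform a $group operation on a set of matched documents, so I use $match first and then $group. But where should I place $project? Should it go before $match or after $match?
$match reduces the number of documents passed to the next stage of the pipeline (in most cases). $project reduces the amount of data passed to the next stage, but it does not reduce the number of documents.
So which order should I prefer?
Sample data: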
{
"_id" : ObjectId("61325594fac485c58bb97fd3"),
"date" : NumberLong(1465776000000),
"account_id" : 794876.0,
"amount" : 8797.0,
"transaction_code" : "buy",
"symbol" : "nvda",
"price" : "46.53873172406391489630550495348870754241943359375",
"total" : "409401.2229765902593427995271"
}
{
"_id" : ObjectId("61325594fac485c58bb97fd2"),
"date" : NumberLong(1325030400000),
"account_id" : 794875.0,
"amount" : 1197.0,
"transaction_code" : "buy",
"symbol" : "nvda",
"price" : "12.7330024299341033611199236474931240081787109375",
"total" : "15241.40390863112172326054861"
}
{
"_id" : ObjectId("61325594fac485c58bb97fd6"),
"date" : NumberLong(1022112000000),
"account_id" : 794876.0,
"amount" : 4521.0,
"transaction_code" : "buy",
"symbol" : "nvda",
"price" : "10.763069758141103449133879621513187885284423828125",
"total" : "48659.83837655592869353426977"
}
{
"_id" : ObjectId("61325594fac485c58bb97fd5"),
"date" : NumberLong(1101081600000),
"account_id" : 794875.0,
"amount" : 253.0,
"transaction_code" : "buy",
"symbol" : "amzn",
"price" : "37.77441226157566944721111212857067584991455078125",
"total" : "9556.926302178644370144411369"
}
{
"_id" : ObjectId("61325594fac485c58bb97fd4"),
"date" : NumberLong(1472601600000),
"account_id" : 794875.0,
"amount" : 6146.0,
"transaction_code" : "sell",
"symbol" : "ebay",
"price" : "32.11600884852845894101847079582512378692626953125",
"total" : "197384.9903830559086514995215"
}
{
"_id" : ObjectId("61325594fac485c58bb97fd7"),
"date" : NumberLong(936144000000),
"account_id" : 794875.0,
"amount" : 955.0,
"transaction_code" : "buy",
"symbol" : "csco",
"price" : "27.992136535152877030441231909207999706268310546875",
"total" : "26732.49039107099756407137647"
}
Which of the following queries is better? Or is there any noticeable difference between the two?
db.getCollection('temp123').aggregate([
    {
        $project: {
            account_id: 1,
            transaction_code: 1
        }
    },
    {
        $match: {
            transaction_code: "buy"
        }
    },
    {
        $group: {
            _id: "$account_id",
            count: {
                $sum: 1
            }
        }
    }
])
or
db.getCollection('temp123').aggregate([
    {
        $match: {
            transaction_code: "buy"
        }
    },
    {
        $project: {
            account_id: 1,
            transaction_code: 1
        }
    },
    {
        $group: {
            _id: "$account_id",
            count: {
                $sum: 1
            }
        }
    }
])
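Note: This is not the actual data; it was only added for reference. In the actual data, the average document size is between 1 and 5 MB, and the number of documents is between 0.1 and 10 million.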
[Comments]:
-
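You need to post sample data and the expected output along with the code you have tried. We are not gods who can predict what you are thinking :D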
-
I hope the data I added is sufficient.
-
I would try both, record the elapsed time from start to finish for each, and compare which one is better.
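A rough sketch of how that timing comparison might look in the mongo shell. The `timePipeline` helper is a made-up name for illustration, not something from the thread, and the stage definitions are copied from the two queries in the question. The `db` guard is only there so the snippet also loads outside the shell.

```javascript
// Hypothetical helper (not from the thread): run a pipeline to completion
// and return the elapsed wall-clock time in milliseconds.
function timePipeline(collection, pipeline) {
  const start = Date.now();
  collection.aggregate(pipeline).toArray(); // drain the cursor so every stage runs
  return Date.now() - start;
}

// Stages taken from the question.
const matchStage = { $match: { transaction_code: "buy" } };
const projectStage = { $project: { account_id: 1, transaction_code: 1 } };
const groupStage = { $group: { _id: "$account_id", count: { $sum: 1 } } };

// The two orderings being compared.
const projectFirst = [projectStage, matchStage, groupStage];
const matchFirst = [matchStage, projectStage, groupStage];

// `db` exists only inside the mongo shell; skip the run anywhere else.
if (typeof db !== "undefined") {
  const coll = db.getCollection("temp123");
  print("project first: " + timePipeline(coll, projectFirst) + " ms");
  print("match first:   " + timePipeline(coll, matchFirst) + " ms");
}
```

A single run can be skewed by caching, so it is probably worth running each ordering several times and looking at the spread rather than at one number.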
-
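There is no need for the projection. If you don't need the other fields, the MongoDB optimizer will leave them out anyway (don't add a $project stage unless you have a reason to). Just use the $match stage and create an index on the field if you can, and I think you will be fine.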
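A minimal sketch of what this comment suggests, assuming the index goes on `transaction_code` (the field used in the question's $match; the comment itself does not name the field). The `explain` call is my addition, included as one way to check whether the index is actually used. The name `matchOnly` is just for illustration.

```javascript
// Pipeline without an explicit $project: only $match and $group.
const matchOnly = [
  { $match: { transaction_code: "buy" } },
  { $group: { _id: "$account_id", count: { $sum: 1 } } }
];

// `db` exists only inside the mongo shell; skip the run anywhere else.
if (typeof db !== "undefined") {
  const coll = db.getCollection("temp123");

  // Assumption: index the field that $match filters on.
  coll.createIndex({ transaction_code: 1 });

  // Print the plan with execution statistics to see how the $match is served.
  printjson(coll.explain("executionStats").aggregate(matchOnly));
}
```

Since `transaction_code` seems to take only a few distinct values in the sample, the index may not be very selective, so the explain output is worth reading before relying on it.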
Tags: java mongodb mongodb-query aggregation-framework