【Posted】:2015-11-11 09:48:34
【Question】:
This is my first time using MongoDB aggregation queries. My dataset looks like this:
{ // doc 1
"_id" : ObjectId("55f2481bc9b4cd1c0c198c9f"),
"channels" : [
"channel_3",
"channel_2",
"channel_1",
"channel_4"
],
"msd" : 25,
"uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632"
}
{ // doc 2
"_id" : ObjectId("55f2481bc9b4cd123452345f"),
"channels" : [
"channel_3",
"channel_4"
],
"msd" : 50,
"uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632"
}
{ // doc 3
"_id" : ObjectId("55f2481bc9b4cd1c0c198c9f"),
"channels" : [
"channel_2"
],
"msd" : 100,
"uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632"
}
{ // doc 4
"_id" : ObjectId("55f2481bc9b4cd1c0c198c9f"),
"channels" : [
"channel_2"
],
"msd" : 80,
"uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632"
}
I have created a compound index:
userlog.create_index([('uid', ASCENDING), ('channels', ASCENDING)])
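Now, given a user and a set of channels, I want to retrieve the average msd over the documents that have at least one channel among the query channels. For example, given the query: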
{"uid" : "000012bb-2e5a-8bd3-d36a-fa037973e632", "channels" : ["channel_1", "channel_2"], }
The channels of doc 1 contain both "channel_1" and "channel_2", and the channels of docs 3 and 4 contain "channel_2". So the expected return value is (25+100+80)/3 = 68.33.
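To make the expected value concrete, here is a minimal plain-Python sketch over the four sample documents. It does not touch MongoDB; `docs` and `expected_average` are illustrative names of my own, and the `uid` filter is left out because all four sample documents share the same `uid`.

```python
# Sample documents from above, reduced to the fields that matter here.
docs = [
    {"channels": ["channel_3", "channel_2", "channel_1", "channel_4"], "msd": 25},
    {"channels": ["channel_3", "channel_4"], "msd": 50},
    {"channels": ["channel_2"], "msd": 100},
    {"channels": ["channel_2"], "msd": 80},
]


def expected_average(docs, query_channels):
    """Average msd over documents sharing at least one channel with the query."""
    wanted = set(query_channels)
    # Each document contributes at most once, however many channels overlap.
    matched = [d["msd"] for d in docs if wanted.intersection(d["channels"])]
    return sum(matched) / len(matched) if matched else None


print(expected_average(docs, ["channel_1", "channel_2"]))  # (25+100+80)/3
```

Docs 1, 3 and 4 match and doc 2 does not, so each matching document is counted exactly once.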
======================= Attempt 1 =======================
Code:
pipe = [
    {"$unwind": '$channels'},
    {"$match": {'uid': "000012bb-2e5a-8bd3-d36a-fa037973e632", 'channels': {'$in': channels}}},
    {"$group": {'_id': '$channels', 'averageMSD': {'$avg': '$msd'}}}
]
for res in db.aggregate(pipeline=pipe):
    print(res)
Result:
{'_id': 'channel_1', 'averageMSD': 25.0}
{'_id': 'channel_2', 'averageMSD': 68.33333333333333}
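It seems that "$unwind" causes doc 1 to be counted twice (once under channel_1 and once under channel_2), which is not what I want. Also, "$unwind" is slow.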
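The double counting can be reproduced without a database. The following is only a plain-Python imitation of the `$unwind`, `$match`, `$group` sequence from attempt 1, assuming the four sample documents; `unwind_match_group` is a made-up helper name and not a MongoDB or PyMongo function.

```python
from collections import defaultdict

# Sample documents, reduced to the fields the pipeline uses.
docs = [
    {"name": "doc 1", "channels": ["channel_3", "channel_2", "channel_1", "channel_4"], "msd": 25},
    {"name": "doc 2", "channels": ["channel_3", "channel_4"], "msd": 50},
    {"name": "doc 3", "channels": ["channel_2"], "msd": 100},
    {"name": "doc 4", "channels": ["channel_2"], "msd": 80},
]


def unwind_match_group(docs, query_channels):
    """Imitate $unwind -> $match ($in) -> $group by channel in plain Python."""
    groups = defaultdict(list)
    for doc in docs:
        for channel in doc["channels"]:      # $unwind: one row per array element
            if channel in query_channels:    # $match on the unwound value
                groups[channel].append(doc)  # $group with _id = '$channels'
    return groups


groups = unwind_match_group(docs, ["channel_1", "channel_2"])
for channel, members in groups.items():
    avg = sum(d["msd"] for d in members) / len(members)
    print(channel, [d["name"] for d in members], avg)
```

Doc 1 ends up in both the channel_1 group and the channel_2 group, which matches the two result rows above: the grouping is per channel, not per document.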
======================= Attempt 2 =======================
Code:
pipe = [
    {"$match": {'uid': "000012bb-2e5a-8bd3-d36a-fa037973e632", 'channels': {'$in': channels}}},
    {"$group": {'_id': '$channels', 'averageMSD': {'$avg': '$msd'}}}
]
for res in db.aggregate(pipeline=pipe):
    print(res)
Result:
{'averageMSD': 90.0, '_id': ['channel_2']}
{'averageMSD': 25.0, '_id': ['channel_3', 'channel_2', 'channel_1', 'channel_4']}
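This is still not what I want. It seems I should not group the results by "channels", but I don't know how to fix it.
How can I query the database efficiently using aggregation?
【Comments】:
Tags: mongodb aggregation-framework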
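One direction that might be worth trying, offered only as an untested sketch: keep the `$match` from attempt 2 (an `$in` condition on an array field matches a document when any of its elements is in the list) and group under a constant `_id` so that all matched documents fall into a single group. `build_pipeline` is a hypothetical helper name, and the sketch assumes the collection object is the `userlog` used in the `create_index` call above.

```python
def build_pipeline(uid, channels):
    """Build a $match + $group pipeline averaging msd over all matched documents."""
    return [
        # No $unwind: each document is matched (and later counted) once.
        {"$match": {"uid": uid, "channels": {"$in": list(channels)}}},
        # A constant _id (None) puts every matched document in the same group.
        {"$group": {"_id": None, "averageMSD": {"$avg": "$msd"}}},
    ]


pipe = build_pipeline(
    "000012bb-2e5a-8bd3-d36a-fa037973e632",
    ["channel_1", "channel_2"],
)

# Assuming `userlog` is the PyMongo collection, the pipeline would be run as:
# for res in userlog.aggregate(pipe):
#     print(res)
```

Because `$match` is the first stage, it may be able to use the compound index on `uid` and `channels`, though that has not been checked here.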