【Question Title】:Why are MongoDB counts incorrect?
【Posted】:2013-05-08 02:35:03
【Question】:

My data looks like this:

 {
       "_id":ObjectId("516fbf68067323ce2ea5b4b8"),
       "title":"GVPKFlFIXdLUaLM",
       "release_year":1913,
       "country_of_origin":"sWdXLXUfun",
       "length_in_minutes":147,
       "plot_summary":"bmwYkyyiSymHJYoXEPauPNjdKoFANDgcDImVelDGPuPJmLhyWOuNXjurNyGp",
       "director":"rNDFhhxGIo",
       "language":"oYeWskT",
       "popularity":5.2,
       "genre":"jDwdaMhuT",
       "actors":[
          {
             "id":2740,
             "name":"actor2740",
             "dob":1989,
             "alt_name":"PBpXPqJwmftpfcR",
             "pob":"DFoxETDuhAdDGNE"
          },
          {
             "id":3143,
             "name":"actor3143",
             "dob":1953,
             "alt_name":"AHnVvTviSKuvNZO",
             "pob":"KBUdvbnvNkXmddk"
          }
       ]
    }

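At first I thought Mongo had a bug. I was using the aggregation framework to solve a hypothetical business problem. (Edit: I am not claiming to have solved a Mongo problem, and I am not asking people to help me design an algorithm; I only want to confirm a potential MongoDB bug.)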

db.movies.aggregate([{$match:{popularity:{$gte:7.3}}},
     {$project:{actors:1,popularity:1}},
     {$unwind:"$actors"},
     {$group:{_id:"$actors.id",avgPop:{$avg:"$popularity"},
              docsByTag : { $sum : 1 }, popSum:{$sum:"$popularity"}}},
    {$match:{avgPop:{$gte:7.5}}}]);

The result I am focusing on (edit: $sum:1, not 0):

{
            "_id" : 1383,
            "avgPop" : 8.772857142857141,
            "docsByTag" : 28,
            "popSum" : 245.63999999999996
        },

But when I verify the result manually:

db.movies.find({"actors.name":"actor1383"},{title:1,popularity:1,_id: 0})

{ "title" : "kZFfBwtAfVNobEq", "popularity" : 8.54 }
{ "title" : "kyOeSorYUWyJmjK", "popularity" : 8.11 }
{ "title" : "rvSdJCgEkkpYgFB", "popularity" : 8.36 }
{ "title" : "SwcgHTgZqqcYJja", "popularity" : 8.68 }
{ "title" : "XmcidmdwtDlNoKw", "popularity" : 7.33 }
{ "title" : "gwThvrWifoKCvyG", "popularity" : 7.94 }
{ "title" : "RdUsAFIxTnntTZR", "popularity" : 6.91 }
{ "title" : "RwhJlORFdvtDtpO", "popularity" : 5.13 }
{ "title" : "TuDfcWhNkQFeycl", "popularity" : 9.93 }
{ "title" : "xTVkwnyvftKQraC", "popularity" : 7.27 }
{ "title" : "HYMjUFlSXgnWVTx", "popularity" : 6.94 }
{ "title" : "ZPPyAUdGMeVQhbK", "popularity" : 8.48 }
{ "title" : "kEITAiMMrWTECGM", "popularity" : 9.42 }
{ "title" : "asNsLYKjvHlihXZ", "popularity" : 9.86 }
{ "title" : "ctEmciXPhbMtspt", "popularity" : 8.85 }
{ "title" : "DHjFtctccwDHtlf", "popularity" : 5.5 }
{ "title" : "ElUqbLqkoKrJPVl", "popularity" : 8.26 }
{ "title" : "XdTCieKsWtTbfZa", "popularity" : 5.72 }
{ "title" : "EeNqOPSuKiHuWRs", "popularity" : 5.91 }
{ "title" : "YgysqxcesvPryMY", "popularity" : 6.05 }
{ "title" : "eARvpGydsWilquc", "popularity" : 7.34 }
{ "title" : "NDpdkhSUfePDYjH", "popularity" : 7.28 }
{ "title" : "wUGKLBwijftQKgU", "popularity" : 8.97 }
{ "title" : "UHVGUmAcjBgAPBp", "popularity" : 7.44 }
{ "title" : "NKTKEKfbxFrudVi", "popularity" : 9.4 }
{ "title" : "AeByTKwsEQuQBYG", "popularity" : 8.97 }
{ "title" : "nZskARfGbhYRxdY", "popularity" : 9.16 }
{ "title" : "nBenZrikXFFrrnq", "popularity" : 7.58 }
{ "title" : "GdEFwoKgqjhHvjM", "popularity" : 6.3 }
{ "title" : "grpKTHgnYcDNyXH", "popularity" : 7.16 }
{ "title" : "hXhOqknvjIYJIaT", "popularity" : 5.24 }
{ "title" : "rggTJENnVeuqQVI", "popularity" : 9.95 }
{ "title" : "ABvGVFHkgOumMPO", "popularity" : 9.56 }
{ "title" : "baVkepHniIURUFH", "popularity" : 9.28 }
{ "title" : "PUYXlhPwbanMDmT", "popularity" : 9.6 }
{ "title" : "IJbqonvsVeorDMv", "popularity" : 7.82 }
{ "title" : "iAhyATKYpCVjtMw", "popularity" : 5.88 }
{ "title" : "uDECLFQGTOVnyvC", "popularity" : 6.25 }
{ "title" : "rTwfCYLfLwgPcbH", "popularity" : 8.38 }
{ "title" : "GRyKjecBHQhvYJk", "popularity" : 9.11 }
{ "title" : "GyEaSHoprUvGmZM", "popularity" : 9.92 } 

This gives a subset of 27 elements with popularity greater than or equal to 7.3:

{ "title" : "kZFfBwtAfVNobEq", "popularity" : 8.54 }
{ "title" : "kyOeSorYUWyJmjK", "popularity" : 8.11 }
{ "title" : "rvSdJCgEkkpYgFB", "popularity" : 8.36 }
{ "title" : "SwcgHTgZqqcYJja", "popularity" : 8.68 }
{ "title" : "XmcidmdwtDlNoKw", "popularity" : 7.33 }
{ "title" : "gwThvrWifoKCvyG", "popularity" : 7.94 }
{ "title" : "TuDfcWhNkQFeycl", "popularity" : 9.93 }
{ "title" : "ZPPyAUdGMeVQhbK", "popularity" : 8.48 }
{ "title" : "kEITAiMMrWTECGM", "popularity" : 9.42 }
{ "title" : "asNsLYKjvHlihXZ", "popularity" : 9.86 }
{ "title" : "ctEmciXPhbMtspt", "popularity" : 8.85 }
{ "title" : "ElUqbLqkoKrJPVl", "popularity" : 8.26 }
{ "title" : "eARvpGydsWilquc", "popularity" : 7.34 }
{ "title" : "wUGKLBwijftQKgU", "popularity" : 8.97 }
{ "title" : "UHVGUmAcjBgAPBp", "popularity" : 7.44 }
{ "title" : "NKTKEKfbxFrudVi", "popularity" : 9.4 }
{ "title" : "AeByTKwsEQuQBYG", "popularity" : 8.97 }
{ "title" : "nZskARfGbhYRxdY", "popularity" : 9.16 }
{ "title" : "nBenZrikXFFrrnq", "popularity" : 7.58 }
{ "title" : "rggTJENnVeuqQVI", "popularity" : 9.95 }
{ "title" : "ABvGVFHkgOumMPO", "popularity" : 9.56 }
{ "title" : "baVkepHniIURUFH", "popularity" : 9.28 }
{ "title" : "PUYXlhPwbanMDmT", "popularity" : 9.6 }
{ "title" : "IJbqonvsVeorDMv", "popularity" : 7.82 }
{ "title" : "rTwfCYLfLwgPcbH", "popularity" : 8.38 }
{ "title" : "GRyKjecBHQhvYJk", "popularity" : 9.11 }
{ "title" : "GyEaSHoprUvGmZM", "popularity" : 9.92 }

That is one fewer than the aggregation reports.

So I thought aggregation might be broken and rewrote the query as a mapReduce:

// make sure we're using the right db; this is the same as "use aggdb;" in shell
db = db.getSiblingDB("recommendations"); //Put your MongoLab database name here.



var mapFunc2 = function() {
                       for (var idx = 0; idx < this.actors.length; idx++) {
                           var key = this.actors[idx].id;
                           var value = {
                                         count: 1,
                                         pop: this.popularity
                                       };
                           emit(key, value);
                       }
                    };

var reduceFunc2 = function(keyActor, countObjVals) {


                     var reducedVal = { actor: keyActor, count: 0, pop: 0, pop_list : [] };

                     for (var idx = 0; idx < countObjVals.length; idx++) {
                         reducedVal.count += countObjVals[idx].count;
                         reducedVal.pop += countObjVals[idx].pop;
                         reducedVal.pop_list = reducedVal.pop_list.concat(countObjVals[idx].pop);
                     }

                     return reducedVal;
                  };

var finalizeFunc2 = function (key, reducedVal) {

                       reducedVal.avg = reducedVal.pop/reducedVal.count;

                       return reducedVal;

                    };


result = db.movies.mapReduce( mapFunc2,
                     reduceFunc2,
                     {
                       out: { merge: "mre" },
                       query: { popularity:
                                  { $gte: 7.3 }
                              },
                       finalize: finalizeFunc2
                     }
                   )
cursor = db.mre.find()  // read the collection named in "out" above

while(cursor.hasNext()){
    printjson(cursor.next());

}

The result is off by one again:

{
    "_id" : 1383,
    "value" : {
        "actor" : 1383,
        "count" : 28,
        "pop" : 245.63999999999996,
        "avg" : 8.772857142857141
    }
}

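So I started debugging, and when I saved each movie's popularity into an array I saw something strange.

{ "_id" : 1, "value" : { "actor" : 1, "count" : 13, "pop" : 114.97, "pop_list" : [ 7.47, 8.52, 9.95, 17.4, 7.4, 19.43, 8.46, 17.21, 9.24, 9.89 ], "avg" : 8.843846153846155 } }

The strange part is that count is 13 but pop_list has only 10 elements. This is because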

7.4     7.4
7.47    7.47
8.07    1
8.14    2
8.46    8.46
8.52    8.52
9.14    1
9.24    9.24
9.26    2
9.57    3
9.86    3
9.89    9.89
9.95    9.95

where 1, 2, and 3 correspond to

1   17.21=9.14+8.07
2   17.4=8.14+9.26
3   19.43=9.57+9.86

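{ "_id" : 2, "value" : { "actor" : 2, "count" : 14, "pop" : 120.91999999999999, "pop_list" : [ 35.239999999999995, 7.58, 35.56, 9.35, 25.83999999999996], "avg" : 8.637142857142857 } }

However, the output above is completely mysterious, because all my averages have only 2 decimal places of precision.

I am really confused at this point. I believe this post may help others who run into the same kind of counting problem.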
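
The summed entries in pop_list are consistent with the reduce function being called more than once for the same key. MongoDB may feed the output of one reduce call back into a later one, so a reduce function has to accept its own output as input. The sketch below is plain JavaScript (for example under Node.js, no database needed). reduceFunc2 is taken from the question; the way the values are split into two reduce calls and the reduceKeepingList variant are hypothetical, chosen only to illustrate the effect.

```javascript
// reduceFunc2 as in the question (with `var` added to avoid a global).
function reduceFunc2(keyActor, countObjVals) {
    var reducedVal = { actor: keyActor, count: 0, pop: 0, pop_list: [] };
    for (var idx = 0; idx < countObjVals.length; idx++) {
        reducedVal.count += countObjVals[idx].count;
        reducedVal.pop += countObjVals[idx].pop;
        // On a re-reduce, .pop is already a partial sum, so the sum gets listed.
        reducedVal.pop_list = reducedVal.pop_list.concat(countObjVals[idx].pop);
    }
    return reducedVal;
}

// Values as mapFunc2 would emit them for three movies of one actor.
var emitted = [9.14, 8.07, 7.4].map(function (p) {
    return { count: 1, pop: p };
});

// Hypothetical split: the first two values are reduced together, and that
// partial result is later reduced again together with the third value.
var partial = reduceFunc2(1, emitted.slice(0, 2));
var rereduced = reduceFunc2(1, [partial, emitted[2]]);

// Variant that carries the list through a re-reduce: raw emitted values
// have no pop_list, so fall back to a one-element list [pop].
function reduceKeepingList(keyActor, vals) {
    var out = { actor: keyActor, count: 0, pop: 0, pop_list: [] };
    vals.forEach(function (v) {
        out.count += v.count;
        out.pop += v.pop;
        out.pop_list = out.pop_list.concat(v.pop_list || [v.pop]);
    });
    return out;
}
var fixed = reduceKeepingList(1, [
    reduceKeepingList(1, emitted.slice(0, 2)),
    emitted[2]
]);

console.log(rereduced.count, rereduced.pop_list.length); // 3 2
console.log(fixed.count, fixed.pop_list.length);         // 3 3
```

Under this reading, count and pop stay correct across re-reduces because they are plain sums; only pop_list is affected, which matches count 13 against 10 list entries in the output above.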

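【Comments】:

  • What are you trying to count? You start talking about the solution without stating your problem.
  • Is there a cut-and-paste error in your aggregation? You show {$sum:0}, which would give 0. Are you using the latest version?
  • Asya, thanks for catching the {$sum:0}; it is def. not what I meant to type. I was showing the aggregation result to a friend and accidentally replaced the 1 with 0.

Tags: mongodb aggregation-framework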


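【Solution 1】:

It is very unlikely that the aggregation framework and mapReduce are both making a "mistake", so I would ask you to verify how you are comparing their results against your expectation.

In your aggregation you group on the "actors.id" field. But the query you used for manual verification is: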

db.movies.find({"actors.name":"actor1383"},{title:1,popularity:1,_id: 0})

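Is there evidence that your "actors.name" and "actors.id" fields match 100%?

It is normal for floating-point arithmetic to produce results with more than 2 digits of precision; don't worry about it. It is no different from asking for the average of 5 and 10 and getting 7.5, even though neither 5 nor 10 has any digits after the decimal point.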
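
As a small illustration of the precision point, here is a plain JavaScript sketch (runnable under Node.js, for example). The popSum and document count are the figures from the aggregation output in the question; rounding with toFixed is just one display option and an assumption about what the asker wants to see.

```javascript
// Most two-decimal values have no exact binary floating-point
// representation, so their sums and averages can show many digits.
var simple = (5 + 10) / 2;   // inputs have no decimals, the result has one
var classic = 0.1 + 0.2;     // not exactly 0.3 in IEEE 754 doubles

// Figures reported by the aggregation in the question.
var popSum = 245.63999999999996;
var docs = 28;

// Round only for display; toFixed returns a string.
console.log(simple);                     // 7.5
console.log(classic);                    // 0.30000000000000004
console.log(popSum.toFixed(2));          // 245.64
console.log((popSum / docs).toFixed(2)); // 8.77
```

The long tails are representation noise, not a sign that extra data was added to the sum.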

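The "discrepancy" may come from somewhere else. If you have a document like this:

{ popularity: 7.6, actors: [ { id: 1383, ... }, { id: 1383, ... } ] }

then a find or count sees only one top-level document, but once you unwind the actors array you get two documents from it, both with actors.id 1383. Can you verify that each actor appears only once in every top-level document? If not, that would cause the discrepancy you are seeing.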
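
The difference between the two ways of counting can be sketched in plain JavaScript without a database. The two movies below are made-up sample data, and the loops only imitate what find/count and $unwind followed by $group do; they are not MongoDB's implementation.

```javascript
// Two hypothetical movies; the second lists actor 1383 twice.
var movies = [
    { title: "A", popularity: 7.6, actors: [{ id: 1383 }, { id: 2740 }] },
    { title: "B", popularity: 8.0, actors: [{ id: 1383 }, { id: 1383 }] }
];

// What find/count on {"actors.id": 1383} sees: one hit per top-level document.
var docCount = movies.filter(function (m) {
    return m.actors.some(function (a) { return a.id === 1383; });
}).length;

// What $unwind followed by $group sees: one hit per array element.
var unwoundCount = 0;
movies.forEach(function (m) {
    m.actors.forEach(function (a) {
        if (a.id === 1383) { unwoundCount += 1; }
    });
});

// Movies in which the actor appears more than once account for the gap.
var duplicated = movies.filter(function (m) {
    return m.actors.filter(function (a) { return a.id === 1383; }).length > 1;
}).map(function (m) { return m.title; });

console.log(docCount, unwoundCount, duplicated); // 2 3 [ 'B' ]
```

On the real collection, a pipeline along the lines of db.movies.aggregate([{$unwind:"$actors"},{$group:{_id:{movie:"$_id",actor:"$actors.id"},n:{$sum:1}}},{$match:{n:{$gt:1}}}]) should list any (movie, actor) pairs that occur more than once.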

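【Comments】:

  • Asya, thanks for your answer. Let's focus on the aggregation first. agg({$match:{popularity:{$gte:7.3}}}) should exclude all movie records with pop below 7.3. Actor 1383 has 41 movies: > db.runCommand({count: 'movies', query:{'actors.id':1383,popularity:{$gte:0}}}) { "n" : 41, "ok" : 1 } See my original post for the detailed output. Now > db.runCommand({count: 'movies', query:{'actors.id':1383,popularity:{$gte:7.3}}}) { "n" : 27, "ok" : 1 } How is this possible? Both aggregation and mapReduce give a count of 28. This makes no sense to me.
  • Asya, to emphasize why I think this is a Mongo bug, I re-ran the query db.movies.aggregate([{$project:{actors:1,popularity:1}},{$unwind:"$actors"},{$group:{_id:"$actors.id",avgPop:{$avg:"$popularity"},docsByTag : { $sum : 1 }, popSum:{$sum:"$popularity"}}},{$match:{avgPop:{$gte:0}}}]); this time without the $match stage before the $project. But you can see the problem persists. The count should be 41, but it is actually 42. And popSum looks very wrong because of the precision. { "_id" : 1383, "avgPop" : 7.932857142857144, "docsByTag" : 42, "popSum" : 333.18000000000006 }
  • I changed my answer after re-reading your edited question. I thought your main concern was the count mismatch when comparing against your manual query, but your query uses a different field than your aggregation does.
  • I generated all the data myself with a Java loader. Anyway, here it is again (see my first response): > db.runCommand({count: 'movies', query:{'actors.id':1383,popularity:{$gte:0}}}) { "n" : 41, "ok" : 1 } Not 42!! I just don't get it. It really looks like a Mongo bug. I can dump my database. I tried to debug by printing all the "pop" values in the final result, but see my question above for why that did not work, although it should have according to cookbook.mongodb.org/patterns/pivot
  • Asya, can I file this as a bug with 10gen?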