【Question title】: Map Reduce with mongo on nested document
【Posted】: 2014-06-06 06:30:20
【Question】:

My documents have the following structure:

{
  "country_id" : 328,
  "country_name" : "Australien",
  "cities" : [{
      "city_id" : 19398,
      "city_name" : "Bondi Beach (Sydney)"
    }, {
      "city_id" : 31102,
      "city_name" : "Double Bay (Sydney)"
    }, {
      "city_id" : 31101,
      "city_name" : "Rushcutters Bay (Sydney)"
    }, {
      "city_id" : 817,
      "city_name" : "Sydney"
    }, {
      "city_id" : 31022,
      "city_name" : "Wolly Creek (Sydney)"
    }, {
      "city_id" : 18851,
      "city_name" : "Woollahra"
    }],
  "regions" : {
    "region_id" : 796,
    "region_name" : "Australien: New South Wales (Sydney)"
  }
}

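For faceted navigation, I want to count the attributes country_id, cities.city_id and regions.region_id. I think I could use map/reduce to do this.

Is that possible with the given structure?

Maybe someone can point me in the right direction with map/reduce.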

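【Comments】:

  • Why use map/reduce? With the aggregation framework this would be more straightforward (and faster).
  • I don't think this is possible with the aggregation framework. Do you have an example for the data structure above?

Tags: mongodb mapreduce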


【Answer 1】:

MongoDB map-reduce examples can be found here: http://docs.mongodb.org/manual/tutorial/map-reduce-examples/

Counting the number of documents for each unique (country_id, city_id, region_id) tuple is straightforward:

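> function m() {
      // Emit one count per city, keyed by (country_id, city_id, region_id).
      for (var i = 0; i < this.cities.length; i++) {
          emit({country_id: this.country_id,
                city_id: this.cities[i].city_id,
                region_id: this.regions.region_id},
               1);
      }
  }
> function r(id, docs) {
      // Sum the emitted 1s for each key.
      return Array.sum(docs);
  }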
> db.loc.mapReduce(m,r,{out:"map_reduce_out"})
{
    "result" : "map_reduce_out",
    "timeMillis" : 5,
    "counts" : {
        "input" : 1,
        "emit" : 6,
        "reduce" : 0,
        "output" : 6
    },
    "ok" : 1,
}
> db.map_reduce_out.find()
{ "_id" : { "country_id" : 328, "city_id" : 817, "region_id" : 796 }, "value" : 1 }
{ "_id" : { "country_id" : 328, "city_id" : 18851, "region_id" : 796 }, "value" : 1 }
{ "_id" : { "country_id" : 328, "city_id" : 19398, "region_id" : 796 }, "value" : 1 }
{ "_id" : { "country_id" : 328, "city_id" : 31022, "region_id" : 796 }, "value" : 1 }
{ "_id" : { "country_id" : 328, "city_id" : 31101, "region_id" : 796 }, "value" : 1 }
{ "_id" : { "country_id" : 328, "city_id" : 31102, "region_id" : 796 }, "value" : 1 }
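To make the emit and reduce steps easier to follow without a running server, here is a minimal plain-JavaScript sketch of the same flow. It is an illustration only: `runMapReduce`, `mapCities` and `sumCounts` are hypothetical names, `emit` is passed in as a parameter rather than being a global as in MongoDB, and the sample document is the one from the question with the name fields left out.

```javascript
// Plain-JavaScript emulation of the map/reduce flow; no MongoDB required.
// runMapReduce is a hypothetical helper, not a MongoDB API.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // serialized key -> { key, values }
  const emit = (key, value) => {
    const serialized = JSON.stringify(key);
    if (!groups.has(serialized)) groups.set(serialized, { key, values: [] });
    groups.get(serialized).values.push(value);
  };
  // `this` inside map is the current document, as in MongoDB.
  for (const doc of docs) map.call(doc, emit);
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    // Like MongoDB, skip reduce when a key has only one value.
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Same logic as m() above, except emit is passed in instead of being global.
function mapCities(emit) {
  for (const city of this.cities) {
    emit(
      {
        country_id: this.country_id,
        city_id: city.city_id,
        region_id: this.regions.region_id,
      },
      1
    );
  }
}

// Same logic as r() above, without the shell-only Array.sum helper.
function sumCounts(key, values) {
  return values.reduce((total, n) => total + n, 0);
}

// The question's sample document, with the name fields omitted for brevity.
const sampleDoc = {
  country_id: 328,
  cities: [19398, 31102, 31101, 817, 31022, 18851].map((id) => ({ city_id: id })),
  regions: { region_id: 796 },
};

const single = runMapReduce([sampleDoc], mapCities, sumCounts);
const doubled = runMapReduce([sampleDoc, sampleDoc], mapCities, sumCounts);
console.log(single.length, single[0].value);   // 6 1
console.log(doubled.length, doubled[0].value); // 6 2
```

With a single input document every key is emitted once, so reduce is never called, which matches the `"reduce" : 0` count in the shell output above. Feeding the same document in twice shows the reduce step summing the two emitted values per key.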

【Comments】:

    【Answer 2】:

    It seems regions should be an array:

     "regions" : [{
        "region_id" : 796,
        "region_name" : "Australien: New South Wales (Sydney)"
      }]
    

    "I want to count the attributes country_id, ..."

    It looks like you want this output:

    ...
    {_id:  328, cities: 6, regions: 1},
    {_id:  329, cities: 10, regions: 4},
    ...
    

    Try the following. Note that it only counts the elements of the cities array.

    db.Country.aggregate([
      { $unwind : "$cities" },
      { $group : { _id : "$country_id", cities : { $sum : 1 } } }
    ])
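As a rough illustration of what unwinding `cities` and then grouping with `$sum : 1` computes, the sketch below reproduces the two stages in plain JavaScript. The helpers `unwind` and `countBy` are hypothetical names, not MongoDB functions, and the input is the question's sample document with the name fields left out.

```javascript
// Emulates { $unwind: "$cities" }: one output document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((item) => ({ ...doc, [field]: item }))
  );
}

// Emulates { $group: { _id: "$<keyField>", <outField>: { $sum: 1 } } }.
function countBy(docs, keyField, outField) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[keyField], (counts.get(doc[keyField]) || 0) + 1);
  }
  return [...counts].map(([key, n]) => ({ _id: key, [outField]: n }));
}

// The question's sample document, with the name fields omitted for brevity.
const country = {
  country_id: 328,
  cities: [19398, 31102, 31101, 817, 31022, 18851].map((id) => ({ city_id: id })),
  regions: { region_id: 796 },
};

const perCountry = countBy(unwind([country], "cities"), "country_id", "cities");
console.log(perCountry); // [ { _id: 328, cities: 6 } ]
```

Counting regions in the same way would need a second unwind over `regions`, which assumes `regions` is stored as an array as suggested above.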
    

    The following gives output similar to the accepted answer.

    db.Country.aggregate([
      { $group : {
          _id : "$country_id",
          cities : { $push : "$cities.city_id" },
          regions : { $push : "$regions.region_id" }
      } }
    ])
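For a sense of the shape this `$group` with `$push` produces, here is a small plain-JavaScript sketch. `groupPush` is a hypothetical helper, and it assumes `regions` is a single embedded object as in the question's document. Because `$cities.city_id` resolves to an array of ids for each input document, each document contributes one nested array to `cities`.

```javascript
// Emulates grouping by country_id and pushing the city and region ids.
function groupPush(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.country_id)) {
      groups.set(doc.country_id, { _id: doc.country_id, cities: [], regions: [] });
    }
    const group = groups.get(doc.country_id);
    // "$cities.city_id" yields an array per document, pushed as one element.
    group.cities.push(doc.cities.map((city) => city.city_id));
    // "$regions.region_id" yields a single value for an embedded object.
    group.regions.push(doc.regions.region_id);
  }
  return [...groups.values()];
}

// The question's sample document, with the name fields omitted for brevity.
const australia = {
  country_id: 328,
  cities: [19398, 31102, 31101, 817, 31022, 18851].map((id) => ({ city_id: id })),
  regions: { region_id: 796 },
};

const grouped = groupPush([australia]);
console.log(JSON.stringify(grouped));
// [{"_id":328,"cities":[[19398,31102,31101,817,31022,18851]],"regions":[796]}]
```

The result lists ids rather than counts, so getting numbers for faceted navigation would still take a further counting step.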
    

    【Comments】:
