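【Posted】: 2014-02-10 15:50:47
【Question】:
I'm new to MongoDB! I'm trying to process some Twitter data. My goal is to group users per time interval (daily intervals, for simplicity) and count each user's unique hashtags for that day. My idea is to build a new collection containing only the user, the date, and the hashtags. This is the data format: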
> db.sampledDB.findOne()
{
"_id" : NumberLong("2334234"),
"replyid" : NumberLong(-1),
"userid" : NumberLong(21313),
"replyuserid" : NumberLong(-1),
"createdAt" : ISODate("2013-07-02T22:35:06Z"),
"tweettext" : "RT @BBCBreaking: Plane carrying Bolivia President Morales is diverted to Austria on suspicion US fugitive #Snowden is on board - Bolivian m…",
"screenName" : "x83",
"name" : "david x",
"retweetCount" : NumberLong(0),
"retweet_id" : NumberLong("12313223"),
"retweet_userid" : NumberLong(123123123),
"source" : "<a href=\"http://www.twitter.com\" rel=\"nofollow\">Twitter for Windows Phone</a>",
"hashtags" : [
{
"start" : 106,
"end" : 114,
"text" : "Snowden"
}
],
"mentions" : [
{
"start" : 3,
"end" : 15,
"id" : NumberLong(876678),
"screenName" : "BBCBreaking",
"name" : "BBC Breaking News"
}
],
"media" : [ ]
}
I am using mapReduce like this:
Map:
map = function(){
//format date to year/month/day
var format = this.createdAt.getFullYear() + '/' + (this.createdAt.getMonth()+1) + '/' + this.createdAt.getDate();
var key = {userid:this.userid, date:format};
emit(key,{hashtags:this.hashtags});
}
Reduce:
reduce = function(key,values){
var result = {a:[]};
for (var idx=0;idx<values.length;idx++){
result.a.push(values[idx].hashtag);
}
return result;
};
The result is:
{
"_id" : {
"userid" : NumberLong(7686787),
"date" : "2013/7/5"
},
"value" : {
"hashtag" : [
{
"start" : 24,
"end" : 44,
"text" : "SıkSöylenenYalanlar"
},
{
"start" : 45,
"end" : 60,
"text" : "ZimmermanTrial"
},
{
"start" : 61,
"end" : 84,
"text" : "ZaynMalikYouArePerfect"
},
{
"start" : 85,
"end" : 99,
"text" : "TrayvonMartin"
},
{
"start" : 100,
"end" : 110,
"text" : "Wimbledon"
},
{
"start" : 111,
"end" : 118,
"text" : "Футбол"
},
{
"start" : 119,
"end" : 127,
"text" : "Snowden"
},
{
"start" : 128,
"end" : 138,
"text" : "TFFistifa"
}
]
}
},
{
"_id" : {
"userid" : NumberLong(45666),
"date" : "2013/7/5"
},
"value" : {
"hashtag" : [
{
"start" : 24,
"end" : 44,
"text" : "SıkSöylenenYalanlar"
},
{
"start" : 45,
"end" : 60,
"text" : "ZimmermanTrial"
},
{
"start" : 61,
"end" : 84,
"text" : "ZaynMalikYouArePerfect"
},
{
"start" : 85,
"end" : 99,
"text" : "TrayvonMartin"
},
{
"start" : 100,
"end" : 110,
"text" : "Wimbledon"
},
{
"start" : 111,
"end" : 118,
"text" : "Футбол"
},
{
"start" : 119,
"end" : 127,
"text" : "Snowden"
},
{
"start" : 128,
"end" : 138,
"text" : "TFFistifa"
}
]
}
},
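But I only want to keep the text element of each hashtag. I tried changing the reducer to use values[idx].hashtag.text or values[idx].hashtag["text"], but neither helped.
Update: I suspect my problem is similar to MapReduce problem, but I don't know how to fix mine.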
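For reference, here is one possible way to keep only the text of each hashtag. It is a minimal sketch under assumptions, not a confirmed fix. Note that the map function above emits the field as hashtags (plural) and that the field is an array, so values[idx].hashtag.text cannot yield a string. The sketch extracts the text in map and has reduce return the same shape that map emits, since MongoDB may pass reduce output back into reduce. The runLocally harness is hypothetical and exists only so the two functions can be tried in Node.js without a database; inside MongoDB, emit is provided by the server.

```javascript
// Map: emit only the hashtag text, keyed by user and calendar day.
// `emit` is supplied by MongoDB when this runs inside mapReduce.
function map() {
  var d = this.createdAt;
  // Same year/month/day key as in the question (local time zone).
  var day = d.getFullYear() + '/' + (d.getMonth() + 1) + '/' + d.getDate();
  var texts = (this.hashtags || []).map(function (h) { return h.text; });
  emit({ userid: this.userid, date: day }, { hashtags: texts });
}

// Reduce: merge the emitted arrays and drop duplicate texts.
// The returned object has the same shape as the emitted value,
// so it stays valid if reduce is called again on its own output.
function reduce(key, values) {
  var merged = [];
  values.forEach(function (v) {
    v.hashtags.forEach(function (t) {
      if (merged.indexOf(t) === -1) {
        merged.push(t);
      }
    });
  });
  return { hashtags: merged };
}

// Hypothetical local harness (an assumption, not part of MongoDB):
// groups emitted values by key and applies reduce to each group.
function runLocally(docs) {
  var groups = {};
  globalThis.emit = function (key, value) {
    var k = JSON.stringify(key);
    (groups[k] = groups[k] || []).push(value);
  };
  docs.forEach(function (doc) { map.call(doc); });
  return Object.keys(groups).map(function (k) {
    var key = JSON.parse(k);
    return { _id: key, value: reduce(key, groups[k]) };
  });
}
```

In the mongo shell the two functions could be passed as db.sampledDB.mapReduce(map, reduce, { out: "hashtagsPerUserDay" }), where the output collection name is made up for illustration. The number of unique hashtags per user and day would then be the length of value.hashtags in each result document.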
【Comments】: