如何在mongodb中批量获取数据答案

【问题标题】：how to get data in batches in mongodb如何在mongodb中批量获取数据
【发布时间】：2015-08-05 08:04:32
【问题描述】：

我想从 MongoDB 中检索数据，一次 5 个

我正在使用限制来限制返回的记录数

router.post('/List', function (req, res) {
    var db = req.db;
    var collection = db.get('clnName');
    collection.find({}, { limit: 5 * req.body.requestCount }, function (e, docs) {
        res.json(docs);
    });
});

在这里，我从客户端递增 requestCount 变量，以便获得 5 的倍数的数据。我想要实现的是在第一个请求中获取前 5 个数据，在第二个请求中获取接下来的 5 个数据，但是发生的是，我得到前 5 个数据，然后然后是前 10 个数据。

我应该做些什么改变来实现我的需要？

使用batch size in mongo cursor methods 会解决我的问题吗？

【问题讨论】：

使用光标的skip()方法。
你的意思是 .skip(5*requestCount) 然后限制为 5 ？
是的，没错。

标签： mongodb mongodb-query monk

【解决方案1】：

这里一个明显的例子是使用.skip() 作为修饰符以及.limit() 以实现数据的“分页”：

    collection.find({}, { "limit": 5, "skip": 5 * req.body.requestCount  }, function

但如果你只是批量处理，最好过滤掉你已经看到的范围。 _id 字段为此提供了一个很好的标识符，无需其他排序。所以在第一个请求：

var lastSeen = null;
    collection.find(
        {}, 
        { "limit": 5, "sort": { "_id": 1}  },
        function(err,docs) {
           docs.forEach(function(doc) {
               // do something
               lastSeen = doc._id;        // keep the _id
           });
        }
    );

下次将“lastSeen”存储在会话变量（或其他仅处理批处理的循环结构）中之后：

    collection.find(
        { "_id": { "$gt": lastSeen }, 
        { "limit": 5, "sort": { "_id": 1}  },
        function(err,docs) {
           docs.forEach(function(doc) {
               // do something
               lastSeen = doc._id;        // keep the _id
           });
        }
    );

因此排除所有结果，小于看到的最后一个 _id 值。

使用其他排序这仍然是可能的，但您需要注意最后看到的_id 和最后排序的值。自上次值更改以来，还将 _id 视为列表。

    var lastSeenIds = [],
        lastSeenValue = null;    

    collection.find(
        {}, 
        { "limit": 5, "sort": { "other": 1, "_id": 1 }  },
        function(err,docs) {
           docs.forEach(function(doc) {
               // do something
               if ( lastSeenValue != doc.other ) {  // clear on change
                   lastSeenValue = doc.other;
                   lastSeenIds = [];
               }
               lastSeenIds.push(doc._id);     // keep a list
           });
        }
    );

然后在您的下一次迭代中使用变量：

    collection.find(
        { "_id": { "$nin": lastSeenIds }, "other": { "$gte": lastSeenValue } },
        { "limit": 5, "sort": { "other": 1, "_id": 1 }  },
        function(err,docs) {
           docs.forEach(function(doc) {
               // do something
               if ( lastSeenValue != doc.other ) {  // clear on change
                   lastSeenValue = doc.other;
                   lastSeenIds = [];
               }
               lastSeenIds.push(doc._id);     // keep a list
           });
        }
    );

这比“跳过”匹配基本查询条件的结果要高效得多。

【讨论】：

很好的答案。如果有人在过滤 id 时遇到问题，请确保您的 id 是 Mongo ObjectID，而不是字符串。例如："_id": { "$lte": ObjectId('xxxxxxxx') }。参考自：stackoverflow.com/a/11052397/10524266