MongoDB - 分页答案

【问题标题】：MongoDB - pagingMongoDB - 分页
【发布时间】：2011-02-19 09:31:31
【问题描述】：

在使用 MongoDB 时，是否有任何特殊的制作模式，例如分页视图？比如说一个列出 10 个最新帖子的博客，您可以在其中导航到旧帖子。

或者用一个索引来解决它，例如blogpost.publishdate 并跳过并限制结果？

【问题讨论】：

我将把这个挂起来，因为对于制作这个比例的正确方法似乎存在一些分歧。

标签： mongodb paging

【解决方案1】：

当性能有问题或有大量集合时，使用 skip+limit 不是进行分页的好方法；随着页码的增加，它会变得越来越慢。使用 skip 需要服务器遍历从 0 到偏移（跳过）值的所有文档（或索引值）。

在传入最后一页的范围值时，最好使用范围查询（+ 限制）。例如，如果您按“发布日期”排序，您只需将最后一个“发布日期”值作为查询条件传递给获取下一页数据。

【讨论】：

很高兴看到一些文档确认跳过 mongodb 遍历所有文档。
你去：skip docs 如果还有其他地方需要更新信息，请告诉我。
@ScottHernandez：我使用指向多个页面的链接进行分页（例如：页面：第一、2、3、4、5、最后）并在所有字段上进行排序。我的字段中只有一个是唯一的（并且已编入索引），范围查询是否适用于此用例？恐怕不是，我只是想确认这是否可能。谢谢。
这里是skip docs link
如果有多个文档具有相同的发布日期值，这似乎不起作用。

【解决方案2】：

如果您需要以多种方式对项目进行排序，则很难实现基于范围的分页。
请记住，如果排序参数的字段值不是唯一的，那么基于范围的分页将变得不可靠。

可能的解决方案：尝试简化设计，考虑是否只能按 id 或某个唯一值排序？

如果我们可以，那么可以使用基于范围的分页。

常用的方法是使用 sort() 、 skip() 和 limit() 来实现分页。

【讨论】：

一篇不错的 Python 代码示例文章可以在这里找到codementor.io/arpitbhayani/…
谢谢 - 很好的回答！当人们建议使用过滤器进行分页时，我很生气，例如{ _id: { $gt: ... } }... 如果使用自定义订购，它根本不起作用 - 例如.sort(...).
@NickGrealy 我遵循了一个教程来做到这一点，我现在处于分页“看起来”可行但我丢失文档的情况，因为我使用的是 mongo ID，但随着新数据的获取插入到数据库中，然后如果起始页面包含以 A 开头的记录但 ID 高于以 AA 开头的记录，则集合按字母顺序排序，因为它们是在之后插入的，然后分页不会返回 AA 记录。跳过和限制合适吗？我有大约 6000 万份文档可供搜索。
@berimbolo - 这是值得讨论的 - 你不会在 cmets 中得到你的答案。问题：你期望什么行为？您正在使用实时系统，记录一直在创建和删除。如果您为每次新页面加载重新请求数据的实时快照，那么您应该期望您的基础数据发生变化。应该是什么行为？如果您使用“时间点”数据快照，您将拥有“固定页面”，但您也将拥有“过期”数据。您描述的问题有多大，人们多久遇到一次？
这绝对值得一谈，我的问题是我按车牌的字母顺序检索了一个一次性文件，并且每 15 分钟将更新应用于更改（删除或添加）的车牌，问题是如果添加了一个新盘子，例如它以 A 开头，由于页面大小是页面上的最后一个，然后如果请求下一个，则不会返回任何记录我相信（假设和人为的例子，但说明了我的问题）因为ID 高于集合中的任何其他 ID。我正在考虑使用完整的车牌来驱动查询的大部分内容。

【解决方案3】：

当我的集合变得太大而无法在单个查询中返回时，这是我使用的解决方案。它利用 _id 字段的固有顺序，并允许您按指定的批量大小循环遍历集合。

这里是一个 npm 模块，mongoose-paging，完整代码如下：

function promiseWhile(condition, action) {
  return new Promise(function(resolve, reject) {
    process.nextTick(function loop() {
      if(!condition()) {
        resolve();
      } else {
        action().then(loop).catch(reject);
      }
    });
  });
}

function findPaged(query, fields, options, iterator, cb) {
  var Model  = this,
    step     = options.step,
    cursor   = null,
    length   = null;

  promiseWhile(function() {
    return ( length===null || length > 0 );
  }, function() {
    return new Promise(function(resolve, reject) {

        if(cursor) query['_id'] = { $gt: cursor };

        Model.find(query, fields, options).sort({_id: 1}).limit(step).exec(function(err, items) {
          if(err) {
            reject(err);
          } else {
            length  = items.length;
            if(length > 0) {
              cursor  = items[length - 1]._id;
              iterator(items, function(err) {
                if(err) {
                  reject(err);
                } else {
                  resolve();
                }
              });
            } else {
              resolve();
            }
          }
        });
      });
  }).then(cb).catch(cb);

}

module.exports = function(schema) {
  schema.statics.findPaged = findPaged;
};

像这样将它附加到您的模型上：

MySchema.plugin(findPaged);

然后像这样查询：

MyModel.findPaged(
  // mongoose query object, leave blank for all
  {source: 'email'},
  // fields to return, leave blank for all
  ['subject', 'message'],
  // number of results per page
  {step: 100},
  // iterator to call on each set of results
  function(results, cb) {
    console.log(results);
    // this is called repeatedly while until there are no more results.
    // results is an array of maximum length 100 containing the
    // results of your query

    // if all goes well
    cb();

    // if your async stuff has an error
    cb(err);
  },
  // function to call when finished looping
  function(err) {
    throw err;
    // this is called once there are no more results (err is null),
    // or if there is an error (then err is set)
  }
);

【讨论】：

不知道为什么这个答案没有更多的赞成票。这是一种比跳过/限制更有效的分页方式
我也来过这个包，但它的性能与跳过/限制相比如何以及@Scott Hernandez 提供的答案？
这个答案如何工作，在任何其他领域进行排序？

【解决方案4】：

基于范围的分页是可行的，但您需要聪明地了解如何最小/最大查询。

如果您负担得起，您应该尝试将查询结果缓存在临时文件或集合中。感谢 MongoDB 中的 TTL 集合，您可以将结果插入到两个集合中。

搜索+用户+参数查询（TTL随便）
查询结果（TTL 任意 + 清洗间隔 + 1）

同时使用两者可确保当 TTL 接近当前时间时不会得到部分结果。您可以在存储结果时使用一个简单的计数器来执行非常简单的范围查询。

【讨论】：

【解决方案5】：

这是一个使用官方 C# 驱动程序按CreatedDate（其中pageIndex 是从零开始）检索User 文档列表的示例。

public void List<User> GetUsers() 
{
  var connectionString = "<a connection string>";
  var client = new MongoClient(connectionString);
  var server = client.GetServer();
  var database = server.GetDatabase("<a database name>");

  var sortBy = SortBy<User>.Descending(u => u.CreatedDate);
  var collection = database.GetCollection<User>("Users");
  var cursor = collection.FindAll();
  cursor.SetSortOrder(sortBy);

  cursor.Skip = pageIndex * pageSize;
  cursor.Limit = pageSize;
  return cursor.ToList();
}

所有的排序和分页操作都是在服务器端完成的。虽然这是 C# 中的一个示例，但我想同样可以应用于其他语言端口。

见http://docs.mongodb.org/ecosystem/tutorial/use-csharp-driver/#modifying-a-cursor-before-enumerating-it。

【讨论】：

【解决方案6】：

    // file:ad-hoc.js
    // an example of using the less binary as pager in the bash shell
    //
    // call on the shell by:
    // mongo localhost:27017/mydb ad-hoc.js | less
    //
    // note ad-hoc.js must be in your current directory
    // replace the 27017 wit the port of your mongodb instance
    // replace the mydb with the name of the db you want to query
    //
    // create the connection obj
    conn = new Mongo();

    // set the db of the connection
    // replace the mydb with the name of the db you want to query
    db = conn.getDB("mydb");

    // replace the products with the name of the collection
    // populate my the products collection
    // this is just for demo purposes - you will probably have your data already
    for (var i=0;i<1000;i++ ) {
    db.products.insert(
        [
            { _id: i, item: "lamp", qty: 50, type: "desk" },
        ],
        { ordered: true }
    )
    }


    // replace the products with the name of the collection
    cursor = db.products.find();

    // print the collection contents
    while ( cursor.hasNext() ) {
        printjson( cursor.next() );
    }
    // eof file: ad-hoc.js

【讨论】：