Spring Data MongoDB - Find millions of data with Pageable - Memory overload?

【Question title】: Spring Data MongoDB - Find millions of data with Pageable - Memory overload?
【Posted】: 2021-04-28 10:27:18
【Question】:

I am using Spring Data MongoDB, and I have this simple repository:

@Repository
public interface TracksRepository extends MongoRepository<Track, String> {

}

I fetch my tracks with a Pageable, like this: tracksRepository.findAll(PageRequest.of(0,100))

What happens if I have, say, 100 million tracks?

Will they all be loaded into memory in order to be paginated (which could crash my server)?

I ask because I see that Spring Data MongoDB uses this code internally:

@Override
public <S extends T> Page<S> findAll(final Example<S> example, Pageable pageable) {

    Assert.notNull(example, "Sample must not be null!");
    Assert.notNull(pageable, "Pageable must not be null!");

    Query q = new Query(new Criteria().alike(example)).with(pageable);
    List<S> list = mongoOperations.find(q, example.getProbeType(), entityInformation.getCollectionName());

    return PageableExecutionUtils.getPage(list, pageable,
            () -> mongoOperations.count(q, example.getProbeType(), entityInformation.getCollectionName()));
}

Does this mean that list is first filled with all the results and only then paginated?

If so, how can I query a large data set efficiently (with pagination) without overloading the server? Thanks.

【Question comments】:

Tags: java mongodb spring-boot spring-data spring-data-mongodb

【Solution 1】:

You are misreading the code.

This line defines the main query to execute:

Query q = new Query(new Criteria().alike(example)).with(pageable);

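It already does the main work of pagination: it restricts the result to the requested page, so list holds at most one page of documents.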
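To make the "restricting the result" step concrete, here is a minimal plain-Java sketch of the skip/limit window that a page request implies. PageWindow and fetch are hypothetical names made up for illustration; this is not Spring Data's code, and the LongStream only stands in for the database so the example runs without MongoDB.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical model of the skip/limit window a page request implies.
// This is not Spring Data's implementation, only the arithmetic behind it.
final class PageWindow {
    final long skip;  // documents to pass over before the page starts
    final int limit;  // maximum number of documents to return

    private PageWindow(long skip, int limit) {
        this.skip = skip;
        this.limit = limit;
    }

    // pageNumber is zero-based, as in PageRequest.of(pageNumber, pageSize).
    static PageWindow of(int pageNumber, int pageSize) {
        return new PageWindow((long) pageNumber * pageSize, pageSize);
    }

    // Stand-in for the database: ids 0..totalDocuments-1 are produced lazily,
    // and only the documents inside the window end up in the returned list.
    static List<Long> fetch(long totalDocuments, PageWindow window) {
        return LongStream.range(0, totalDocuments)
                .skip(window.skip)
                .limit(window.limit)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        PageWindow window = PageWindow.of(2, 100);
        List<Long> page = fetch(1_000_000L, window);
        System.out.println("skip=" + window.skip + " limit=" + window.limit
                + " loaded=" + page.size() + " first=" + page.get(0));
    }
}
```

However large totalDocuments is, the list never holds more than limit entries, which is the point the answer makes about the paged query.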

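The following expression only runs a count query to determine the total number of elements, and it does so only when that total cannot be derived from the result already fetched. Deriving it is possible when the result contains fewer elements than were requested.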

PageableExecutionUtils.getPage(list, pageable,
            () -> mongoOperations.count(q, example.getProbeType(), entityInformation.getCollectionName()));
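The decision of when the count query can be skipped can be sketched in plain Java as follows. TotalResolver and resolveTotal are hypothetical names, and this is a simplified model of the behaviour described above rather than the library's source; the LongSupplier plays the role of the deferred count query.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Simplified, hypothetical model of "count only when necessary".
final class TotalResolver {

    // offset:      number of elements skipped before this page
    // pageSize:    number of elements requested
    // contentSize: number of elements actually returned
    // countQuery:  deferred count, invoked only when the total is not derivable
    static long resolveTotal(long offset, int pageSize, int contentSize, LongSupplier countQuery) {
        boolean partialPage = pageSize > contentSize;
        if (offset == 0) {
            // A first page that is not full already contains every element.
            return partialPage ? contentSize : countQuery.getAsLong();
        }
        if (partialPage && contentSize != 0) {
            // A non-empty, partly filled page must be the last one.
            return offset + contentSize;
        }
        // Full page, or an empty page beyond the first: the total is unknown.
        return countQuery.getAsLong();
    }

    public static void main(String[] args) {
        AtomicInteger countCalls = new AtomicInteger();
        LongSupplier countQuery = () -> {
            countCalls.incrementAndGet();
            return 237L; // pretend the collection holds 237 documents
        };

        long lastPage = resolveTotal(200, 100, 37, countQuery);
        System.out.println("last page total=" + lastPage + " countCalls=" + countCalls.get());

        long firstPage = resolveTotal(0, 100, 100, countQuery);
        System.out.println("first page total=" + firstPage + " countCalls=" + countCalls.get());
    }
}
```

In this model the partly filled last page yields its total without touching the supplier, while a full page has to fall back to the count query.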

So there is no reason to expect any inherent problem when paging through millions of documents.

【Comments】:

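The problem is that when you use .with(pageable), the count method returns only the number of records in the current page rather than the expected total (unpaged). Strange!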