Dynamically generating Mocha tests from a large MongoDB collection

【Question title】: Dynamically generating Mocha tests from a large MongoDB collection
【Posted】: 2019-05-31 19:31:02
【Question description】:

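In the test suite for my Node.js project, I want to check every item in a MongoDB collection against a JSON schema. Using the Mocha test framework, I can generate tests dynamically like this: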

describe('Lexemes', () => {
  // load schema validator
  var schema = JSON.parse(fs.readFileSync('public/schemas/lexeme.json'))
  var validate = ajv.compile(schema)

  it('receives data', async () => {
    // load all items in collection
    let items = await db.get('lexemes').find()
    items.forEach((item) => {
      // dynamically generated test for each result
      describe(item._id, () => {
        it('conforms to schema', () => {
          validate(item).should.be.true()
        })
      })
    })
  })
})

This works well for smaller collections. However, with a very large collection (4.5 million documents), I get a timeout:

Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves.

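If I simply increase the timeout to 60 seconds, I eventually get a JavaScript heap out of memory error. Evidently it is trying to load the whole collection into memory, which will not work.

I thought I could use Monk's result streaming like this: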

it('receives data', () => {
  return db.get('lexemes').find().each((item, { close, pause, resume }) => {
    describe(item._id, () => {
      it('conforms to schema', () => {
        validate(item).should.be.true()
      })
    })
  })
})

But this makes no difference (note that I also tried returning a Promise instead of using async/await, which did not help either).
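One plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is that every `describe`/`it` call registers a callback that closes over `item`, so each document stays reachable until its test has run, no matter how the query results are delivered. The sketch below illustrates that retention pattern in plain JavaScript. `fakeIt`, `documents`, and the `lemma` check are hypothetical stand-ins for Mocha's registry, the Monk cursor, and the Ajv validator.

```javascript
// Stand-in for Mocha's test registry (hypothetical, for illustration only):
// every registered callback closes over its document and keeps it reachable.
const registered = []
const fakeIt = (title, fn) => registered.push({ title, fn })

// Hypothetical validator: a document is valid if `lemma` is a string.
const validate = (doc) => typeof doc.lemma === 'string'

// Hypothetical document source standing in for the database cursor.
function * documents (count) {
  for (let i = 0; i < count; i++) {
    yield { _id: `id-${i}`, lemma: i % 4 === 0 ? null : `word-${i}` }
  }
}

// Pattern from the question: one registered test per document.
for (const doc of documents(1000)) {
  fakeIt(`${doc._id} conforms to schema`, () => validate(doc))
}

// Alternative: validate each document immediately and keep only a counter.
let invalidCount = 0
for (const doc of documents(1000)) {
  if (!validate(doc)) invalidCount++
}

console.log(registered.length) // one retained closure per document
console.log(invalidCount) // a single number, independent of collection size
```

With the first pattern the retained state grows with the number of documents, while the second keeps a fixed amount of state.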

Edit 1

I tried manually paginating the data into smaller chunks using the limit/skip options of the Mongo query:

const limit = 1000 // page size
var skip = 0
do {
  it(`receives data ${skip} to ${skip + limit - 1}`, async () => {
    let items = await db.get('lexemes').find({}, { limit: limit, skip: skip })
    items.forEach((item) => {
      describe(item._id, () => {
        it('conforms to schema', () => {
          validate(item).should.be.true()
        })
      })
    })
  })
  skip += limit
} while (skip < 5000000)

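Edit 2

Paginating this way avoids the timeout error, and Mocha appears to make progress through the "receives data x to y" tests, but when it starts running the "conforms to schema" tests, it throws the same out-of-memory error as above.

Any other ideas on what I could try?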

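【Question discussion】:

Tags: node.js mongodb asynchronous mocha.js

【Solution 1】:

This does not solve the original problem, but I had to settle for testing a sample of the data in my large collection using MongoDB's $sample aggregation stage: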

const limit = 100000 // sample size
it(`receives data (${limit} samples)`, async () => {
  // draw a random sample instead of loading the whole collection
  let items = await db.get('lexemes').aggregate([{ '$sample': { 'size': limit } }])
  items.forEach((item) => {
    describe(item._id, () => {
      it('conforms to schema', () => {
        validate(item).should.be.true()
      })
    })
  })
})
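If a single pass/fail result for the whole collection is acceptable, another option is to validate documents one at a time inside a single test instead of registering a test per document. The following is only a sketch under stated assumptions, not the poster's solution: `validateAll` is a hypothetical helper, `fakeCursor` stands in for a database cursor that can be consumed with `for await`, and the `lemma` check stands in for the compiled Ajv validator.

```javascript
// Hypothetical helper: walks any async iterable of documents, validates each
// one, and keeps only two counters plus a capped list of failing ids.
async function validateAll (docs, validate, maxReported = 20) {
  let checked = 0
  let invalid = 0
  const firstInvalidIds = []
  for await (const doc of docs) {
    checked++
    if (!validate(doc)) {
      invalid++
      if (firstInvalidIds.length < maxReported) {
        firstInvalidIds.push(String(doc._id))
      }
    }
  }
  return { checked, invalid, firstInvalidIds }
}

// Stand-in for a database cursor (assumption: the real source is async-iterable).
async function * fakeCursor (count) {
  for (let i = 0; i < count; i++) {
    yield { _id: i, lemma: i % 10 === 0 ? 42 : `word-${i}` }
  }
}

// Stand-in for the compiled Ajv validator from the question.
const validate = (doc) => typeof doc.lemma === 'string'

const resultPromise = validateAll(fakeCursor(100), validate, 3)
resultPromise.then((result) => console.log(result))

// Inside Mocha, the same helper could back a single test, for example:
// it('all lexemes conform to schema', async function () {
//   this.timeout(0) // disable the timeout for the long scan
//   const result = await validateAll(cursor, validate)
//   result.invalid.should.equal(0)
// })
```

The trade-off is that failures are reported as one aggregated result with a few example ids rather than as one Mocha test per document.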

【Discussion】: