Groovy gmongo batch processing: answers

【Question title】: Groovy gmongo batch processing
【Posted】: 2019-01-21 12:53:23
【Question description】:

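I'm currently trying to run a batch job in Groovy with the Gmongo driver. The collection is about 8 GB, and my problem is that my script tries to load everything into memory. Ideally I'd like to process it in batches, similar to what Spring Boot Batch does, but in a Groovy script.

I've tried batchSize(), but that function still retrieves the entire collection into memory, only to then apply my batch logic to it.

Here's my example: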

mongoDb.collection.find().collect { it ->
  //logic
}

【Question comments】:

Tags: java mongodb groovy batch-processing gmongo

【Solution 1】:

According to the official documentation:

https://docs.mongodb.com/manual/tutorial/iterate-a-cursor/#read-operations-cursors

def myCursor = db.collection.find()

while (myCursor.hasNext()) {
   println(myCursor.next())
}
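
If per-document processing is too fine-grained, the same cursor loop can hand documents to the business logic in fixed-size chunks. The following is only a minimal sketch: `nextChunk` is a hypothetical helper (not part of GMongo), and a plain range iterator stands in for the cursor, on the assumption that the object returned by `find()` can be used as a `java.util.Iterator`, as the legacy driver's `DBCursor` can.

```groovy
// Hypothetical helper: drains up to `size` items from any Iterator into a list.
def nextChunk = { Iterator source, int size ->
    def chunk = []
    while (chunk.size() < size && source.hasNext()) {
        chunk << source.next()
    }
    chunk
}

// Stand-in for db.collection.find(); replace with the real cursor.
def cursor = (1..10).iterator()

def chunkSizes = []
while (cursor.hasNext()) {
    def chunk = nextChunk(cursor, 4)
    // run your batch logic on `chunk` here
    chunkSizes << chunk.size()
}

println chunkSizes // prints [4, 4, 2]
```

Only one chunk is held in the list at a time, so the script's own memory use should be bounded by the chunk size rather than by the collection size.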

【Comments】:

【Solution 2】:

After a lot of deliberation, I found that this solution works best, for the following reasons.

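Unlike the Cursor, it doesn't retrieve documents one at a time for processing (which can be terribly slow).
Unlike the Gmongo batch function, it also doesn't try to load the entire collection into memory only to cut it up into batches for processing, which tends to be heavy on machine resources.

The code below is efficient and light on resources, depending on your batch size.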

// `db.collectionName` stands for your own GMongo database handle and collection
def skipSize = 0
def limitSize = Integer.valueOf(1000) // batch size (if you hard-code the batch size, you don't need the int conversion)
def dbSize = db.collectionName.count()

// round up so the final partial batch is not skipped
def dbRunCount = Math.ceil(dbSize / (double) limitSize) as int

dbRunCount.times {
    db.collectionName.find()
            .skip(skipSize)
            .limit(limitSize)
            .each { event ->
                // run your business logic processing
            }

    // calculate the next skipSize
    skipSize += limitSize
}
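
The number of runs matters here: if the document count is not an exact multiple of the batch size, rounding to the nearest integer can leave the tail of the collection unprocessed, so the page count should be rounded up. The sketch below illustrates the arithmetic only. It uses an in-memory list as a stand-in for the collection, with `drop`/`take` mirroring `.skip()`/`.limit()`; `pageCount` is a hypothetical helper name and the figures 1400 and 1000 are made up for illustration.

```groovy
// Hypothetical helper: number of skip/limit pages needed to cover `total` documents.
def pageCount = { long total, int pageSize ->
    (int) Math.ceil(total / (double) pageSize)
}

def docs = (1..1400).toList()   // in-memory stand-in for the collection
def limitSize = 1000
def seen = 0

pageCount(docs.size(), limitSize).times { page ->
    def skipSize = page * limitSize
    // mirrors find().skip(skipSize).limit(limitSize)
    def batch = docs.drop(skipSize).take(limitSize)
    seen += batch.size()
}

println seen                      // prints 1400
println Math.round(1400 / 1000d)  // prints 1: rounding would stop after one page
```

With 1400 documents and a batch size of 1000, rounding up gives two pages and covers everything, whereas rounding to nearest would process only the first 1000.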

【Comments】: