Answers: Best way to extract all _id values from a MongoDB collection

【Question title】: Best way to extract all _id from mongodb collection
【Posted】: 2013-03-22 19:48:06
【Question】:

What is the best way to extract all _id values from a mongodb collection? I am using pymongo to work with mongodb. The following code:

for item in db.some_collection.find({}, {'_id': 1}):
    pass  # do something with item['_id']

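takes a while to iterate over the whole collection. I only need the _id values, and they should all fit in memory. Why doesn't this code finish immediately?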

【Comments】:

Tags: python mongodb indexing pymongo

【Solution 1】:

Use distinct:

some_collection.distinct('_id')

In [5]: c = pymongo.connection.Connection('127.0.0.1')

In [6]: c['test']['test'].insert({'a': 2})
Out[6]: ObjectId('5159c8e9d286da0efccb7b70')

In [7]: c['test']['test'].insert({'a': 3})
Out[7]: ObjectId('5159c8ecd286da0efccb7b71')

In [8]: c['test']['test'].insert({'a': 5})
Out[8]: ObjectId('5159c8edd286da0efccb7b72')

In [9]: c['test']['test'].distinct('_id')
Out[9]:
[ObjectId('5159c8e9d286da0efccb7b70'),
 ObjectId('5159c8ecd286da0efccb7b71'),
 ObjectId('5159c8edd286da0efccb7b72')]
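
For newer PyMongo releases, where `Connection` has been replaced by `MongoClient`, the same idea might be sketched roughly as follows. The helper names `ids_via_distinct` and `ids_via_find` are hypothetical, and both functions accept any object that exposes `distinct()` and `find()` the way a PyMongo collection does. One caveat worth keeping in mind: `distinct` returns its whole result in a single response, which is subject to MongoDB's 16 MB BSON document limit, so the cursor-based variant is the fallback for very large collections.

```python
def ids_via_distinct(collection):
    """Ask the server for every distinct _id value in a single command."""
    return collection.distinct('_id')


def ids_via_find(collection, batch_size=10000):
    """Stream only the _id field through a cursor, fetching large batches.

    batch_size=10000 is an arbitrary illustrative value, not a tuned setting.
    """
    cursor = collection.find({}, {'_id': 1}).batch_size(batch_size)
    return [doc['_id'] for doc in cursor]
```

Against a real server this could be called as `ids_via_distinct(MongoClient('127.0.0.1')['test']['test'])` after `from pymongo import MongoClient`.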

【Discussion】:

Thanks. It works really fast:

import pymongo
import time

db = pymongo.Connection()['moviedb']

start = time.time()
movie_ids = db.kp_movie.distinct('_id')
print len(movie_ids)
print 'Distinct: %.3f' % (time.time() - start)

start = time.time()
movie_ids = []
for movie in db.kp_movie.find({}, {'_id': 1}):
    movie_ids.append(movie['_id'])
print len(movie_ids)
print 'Find: %.3f' % (time.time() - start)

Result:

$ python test.py
451943
Distinct: 1.256
451943
Find: 29.083
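
A comparison like this one could be repeated on Python 3 with a small timing helper along these lines. This is only a sketch: `timed` is a hypothetical name, and the callable passed in is assumed to return a list of ids (for example `lambda: collection.distinct('_id')`).

```python
import time


def timed(label, func):
    """Run func(), print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    # Report the label, the elapsed wall-clock time, and how many ids came back.
    print('%s: %.3f s (%d ids)' % (label, elapsed, len(result)))
    return result, elapsed
```

Calling it once with the `distinct` approach and once with the `find` loop gives two directly comparable timings on the same collection.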