[Posted]: 2018-01-04 10:50:12
[Question]:
When inserting a document into a MongoDB collection, how can I handle the error raised when the document size exceeds 16 MB? I found some solutions such as GridFS; using GridFS would solve the problem, but I need a solution that does not use it. Is there a way to make the document smaller, or to split it into subdocuments? If so, how can that be implemented?
from pymongo import MongoClient
conn = MongoClient("mongodb://sample_mongo:27017")
db_conn = conn["test"]
db_collection = db_conn["sample"]
# this record serializes to about 23 MB
record = {
"name": "drugs",
"collection_id": 23,
"timestamp": 1515065002,
"tokens": [], # contains list of strings
"tokens_missing": [], # contains list of strings
"token_mapping": {} # Dictionary contains transformed tokens
}
db_collection.insert(record, check_keys=False)
I get the error DocumentTooLarge: BSON document too large. In MongoDB, the maximum BSON document size is 16 megabytes.
File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 2501, in insert
check_keys, manipulate, write_concern)
File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 575, in _insert
check_keys, manipulate, write_concern, op_id, bypass_doc_val)
File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 556, in _insert_one
check_keys=check_keys)
File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/pool.py", line 482, in command
self._raise_connection_failure(error)
File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/pool.py", line 610, in _raise_connection_failure
raise error
DocumentTooLarge: BSON document too large (22451007 bytes) - the connected server supports BSON document sizes up to 16793598 bytes.
[Comments]:
- Welcome to Stack Overflow. Please be more specific when asking: what have you tried so far with a code example? (I downvoted because there is no code) / What did you expect? / What error did you get? Please see "How to Ask" for guidance.
- @Hille I have updated the question with the code I tried and the exact error. Thanks.
- Find out which document field is making it so large (tokens? tokens_missing?) and store that field in a separate collection, as documents holding a reference back to the original document.
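The reference-out approach suggested in the last comment can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `split_into_chunks`, the 15 MB budget, the per-element overhead estimate, and the `token_chunks` collection name are all assumptions, and the real BSON size of a chunk should be verified before relying on the estimate.

```python
def split_into_chunks(items, max_bytes=15 * 1024 * 1024):
    """Greedily group a list of strings into sublists whose rough
    serialized size stays below max_bytes (kept under MongoDB's 16 MB
    BSON limit to leave headroom for the rest of the document)."""
    chunks, current, current_size = [], [], 0
    for item in items:
        # UTF-8 payload plus a rough per-element BSON overhead estimate
        size = len(item.encode("utf-8")) + 16
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(item)
        current_size += size
    if current:
        chunks.append(current)
    return chunks

# Insertion sketch (needs a running MongoDB, so shown as comments;
# collection names are hypothetical):
# parent = {"name": "drugs", "collection_id": 23, "timestamp": 1515065002}
# parent_id = db_conn["sample"].insert_one(parent).inserted_id
# for i, chunk in enumerate(split_into_chunks(record["tokens"])):
#     db_conn["token_chunks"].insert_one(
#         {"parent_id": parent_id, "field": "tokens", "seq": i, "tokens": chunk})
```

Reading the document back then means fetching the parent and its chunks ordered by `seq` and concatenating the lists; each chunk document stays well under the 16 MB limit.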
Tags: mongodb python-2.7