【Posted】: 2020-09-21 14:30:40
【Problem description】:
I have a text file (>= 60 GB) whose records look like this:
{"index": {"_type": "_doc", "_id": "bLcy4m8BAObvGO9GALME"}}
{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"flags\":2135,\"id\":816704468,\"access_hash\":\"788468819702098896\",\"first_name\":\"a\",\"last_name\":\"b\",\"phone\":\"123\",\"status\":{\"_\":\"userStatusOffline\",\"was_online\":132}}","phone":"12","@version":"1","typ":"telegram_contacts","access_hash":"123","id":816704468,"@timestamp":"2020-01-26T13:53:29.467Z","path":"/home/user/mirror_01/users_5d6ca02e7e736a7fc700df8c.log","type":"redis","flags":2135,"host":"ubuntu","imported_from":"telegram_contacts"}
{"index": {"_type": "_doc", "_id": "Z7cy4m8BAObvGO9GALME"}}
{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"flags\":2143,\"id\":323586643,\"access_hash\":\"8315858910992970114\",\"first_name\":\"bv\",\"last_name\":\"nj\",\"username\":\"kj\",\"phone\":\"123\",\"status\":{\"_\":\"userStatusRecently\"}}","phone":"123","@version":"1","typ":"telegram_contacts","access_hash":"8315858910992970114","id":323586643,"@timestamp":"2020-01-26T13:53:29.469Z","path":"/home/user/mirror_01/users_5d6ca02e7e736a7fc700df8c.log","username":"mbnab","type":"redis","flags":2143,"host":"ubuntu","imported_from":"telegram_contacts"}
I have a few questions about it:
- Is this a valid JSON file?
- Can Python handle a file of this size, or should I somehow convert it to an Access or Excel file?
These are some SO posts I found useful:
- Is there a memory efficient and fast way to load big json files in python?
- Reading rather large json files in Python
But I still need help.
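On the first question, a quick check may help. The sketch below assumes the file is line-delimited (one JSON value per line), which is what the sample suggests. The two sample lines are shortened, hypothetical stand-ins for the real records, and the helper names are made up for illustration. It shows that each line parses on its own, while the lines taken together do not form a single JSON document.

```python
import json

# Shortened, hypothetical stand-ins for two consecutive lines of the file.
sample = (
    '{"index": {"_type": "_doc", "_id": "bLcy4m8BAObvGO9GALME"}}\n'
    '{"phone": "12", "typ": "telegram_contacts", "id": 816704468}\n'
)


def parses_as_single_document(text):
    """Return True if the whole text is one valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def parses_line_by_line(text):
    """Return True if every non-empty line is a valid JSON value on its own."""
    try:
        for line in text.splitlines():
            if line.strip():
                json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


whole_ok = parses_as_single_document(sample)  # second object counts as extra data
lines_ok = parses_line_by_line(sample)
print(whole_ok, lines_ok)
```

If the real file behaves the same way, it is probably best treated as a sequence of JSON lines rather than as one JSON file.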
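【Discussion】:
-
I don't think that's JSON.
-
@Yatin Neither do I. Do you think I should make a template for this?
-
Is the value of "message" just JSON saved as a string, with the quotes escaped?
-
@MichaelC Yes, sir. That is the exact record.
-
Python can handle a file of this size... but what do you want to do with it? What have you tried?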
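To make the comments above concrete: a file of this size can be read one line at a time, so memory use stays roughly constant no matter how long the file is. This is a minimal sketch, assuming the lines strictly alternate between an action line (`{"index": ...}`) and a record line, and that `message` holds a JSON-encoded string, as confirmed above. The name `iter_records` and the in-memory demo input are made up for illustration, and there is no error handling for malformed lines.

```python
import io
import json


def iter_records(lines):
    """Yield (doc_id, record) pairs from alternating action/record lines.

    `lines` is any iterable of strings, e.g. an open file object, so only
    one pair of lines is held in memory at a time.
    """
    it = (line for line in lines if line.strip())  # skip blank lines
    # zip(it, it) pulls two consecutive lines per step from the same iterator;
    # a trailing unpaired line would be silently dropped.
    for action_line, record_line in zip(it, it):
        action = json.loads(action_line)
        record = json.loads(record_line)
        # "message" is itself JSON stored as a string, so decode it again.
        if isinstance(record.get("message"), str):
            record["message"] = json.loads(record["message"])
        yield action["index"]["_id"], record


# Small in-memory demo (hypothetical data shaped like the sample records).
inner = json.dumps({"_": "user", "id": 816704468})
demo = io.StringIO(
    json.dumps({"index": {"_type": "_doc", "_id": "bLcy4m8BAObvGO9GALME"}}) + "\n"
    + json.dumps({"message": inner, "phone": "12"}) + "\n"
)
results = list(iter_records(demo))
print(results[0][0], results[0][1]["message"]["id"])

# With the real file, pass the open file object instead, e.g.:
#   with open(path, encoding="utf-8") as f:   # `path` is a placeholder
#       for doc_id, record in iter_records(f):
#           ...
```

What to do with each record (filter it, write it to a database, aggregate it) depends on the goal, which the question does not state.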
Tags: python text-processing python-textprocessing