【Posted】: 2021-01-11 03:01:41
【Question】:
I have been given a 15 GB .txt file in the following format:
{
"_score": 1.0,
"_index": "newsvit",
"_source": {
"content": " \u0641\u0647\u06cc\u0645\u0647 \u062d\u0633\u0646\u200c\u0645\u06cc\u0631\u06cc: ",
"title": "\u06a9\u0627\u0631\u0647\u0627\u06cc \u0642\u0627\u0644\u06cc\u0628\u0627\u0641 ",
"lead": "\u062c\u0627\u0645\u0639\u0647 > \u0634\u0647\u0631\u06cc -
\u0645\u06cc\u0632\u06af\u0631\u062f\u06cc \u062f\u0631\u0628\u0627\u0631\u0647 .",
"agency": "13",
"date_created": 1494518193,
"url": "http://www.khabaronline.ir/(X(1)S(bud4wg3ebzbxv51mj45iwjtp))/detail/663749/society/urban",
"image": "uploads/2017/05/11/1589793661.jpg",
"category": "15"
},
"_type": "news",
"_id": "2981643"
}
{
"_score": 1.0,
"_index": "newsvit",
"_source": {
"content": "\u0645/\u0630",
"title": "\u0645\u0639\u0646\u0648\u06cc\u062a \u062f\u0631 \u0639\u0635\u0631 ",
"lead": "\u0645\u062f\u06cc\u0631 \u0645\u0624\u0633\u0633\u0647 \u0639\u0644\u0645\u06cc \u0648 \u067e\u0698\u0648\u0647\u0634\u06cc \u0627\u0628\u0646\u200c\u0633\u06cc\u0646\u0627 \u062f\u0631 .",
"agency": "1",
"date_created": 1494521817,
"url": "http://www.farsnews.com/13960221001386",
"image": "uploads/2017/05/11/1713799235.jpg",
"category": "20"
},
"_type": "news",
"_id": "2981951"
}
....
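I want to import it into Elasticsearch. I tried the Bulk API, but it only accepts JSON in a specific layout, and I couldn't convert the whole 15 GB file into that Bulk format. I also tried Logstash, but then fields such as `content` could not be searched or queried.
What is the most efficient way to import this file into Elasticsearch?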
【Discussion】:
Tags: json elasticsearch