How to load a JSON file into Solr using `pysolr`? Answers

【Question title】: How to load a JSON file to Solr using `pysolr`?
【Posted】: 2017-04-20 09:58:06
【Question description】:

The following Python code adds a document, but without the JSON content:

solr_instance = pysolr.Solr('http://192.168.45.153:8983/solr/test', timeout=60)
json_filename = '/path/to/file/test.json'
argws = {
    'commit': 'true',
    'extractOnly': False,
    'Content-Type': 'application/json',
}
with open(json_filename, 'rb') as f:
    solr_instance.extract(f, **argws)
    solr_instance.commit()

Using curl on the command line works fine:

$ curl 'http://192.168.45.153:8983/solr/test/update?commit=true' \
     --data-binary @/path/to/file/test.json \
     -H 'Content-Type: application/json'

The file has the following content:

$ cat /cygdrive/w/mist/test.json
-->    [{"x": "a","y": "b"}]

I am using pysolr 3.6.0 and Solr 6.5.0.

【Question comments】:

Tags: solr pysolr

【Solution 1】:

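The extract() method sends its request to the ExtractingRequestHandler, which is intended for extracting content from rich documents (PDFs and the like), not for indexing JSON.

You can use the regular .add method to submit the decoded JSON to Solr: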

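import json

# json.load expects an open file object, not a file path
with open(json_filename, encoding='utf-8') as f:
    solr_instance.add(json.load(f))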

.. should work.
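For reference, here is a slightly fuller sketch of the same idea. The helper names `load_json_docs` and `index_json_file` are made up for illustration, and it assumes `solr` is a `pysolr.Solr` instance like the `solr_instance` from the question. It wraps a single top-level JSON object in a list, since `.add` takes a list of documents, and passes `commit=True` so the documents become searchable right away.

```python
import json


def load_json_docs(path):
    """Read a JSON file and return its content as a list of documents."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # The file may hold one object or a list of objects; normalise to a list.
    return data if isinstance(data, list) else [data]


def index_json_file(solr, path):
    """Send every document in the JSON file to Solr; return how many were sent.

    `solr` is assumed to be a pysolr.Solr instance (hypothetical helper).
    """
    docs = load_json_docs(path)
    solr.add(docs, commit=True)
    return len(docs)
```

With the setup from the question, the call might look like `index_json_file(solr_instance, json_filename)`.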

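【Comments】:

This works. But I get the error `Document contains at least one immense term in field="x" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.` (the example I showed is only a small document).
Please open a new question for a new problem, but this is probably caused by a string field whose value is longer than 32766 bytes (a string field is indexed as a single term).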
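To see which values would trigger that error before sending anything to Solr, a quick check along these lines may help. This is only a sketch: `find_immense_fields` is a hypothetical helper, it inspects only top-level string values, and it takes the 32766-byte limit from the error message above.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the Solr error message


def find_immense_fields(docs, limit=MAX_TERM_BYTES):
    """Return (doc_index, field, byte_length) for each oversized string value."""
    oversized = []
    for index, doc in enumerate(docs):
        for field, value in doc.items():
            if not isinstance(value, str):
                continue
            # The limit applies to the UTF-8 encoded size, not the character count.
            size = len(value.encode("utf-8"))
            if size > limit:
                oversized.append((index, field, size))
    return oversized
```

Whether the fix is to change the field type in the schema or to shorten the value depends on how the field is meant to be searched, which is a separate question.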