Unable to set _id on elasticsearch-hadoop

【Question title】: Unable to set _id on elasticsearch-hadoop
【Posted】: 2016-12-24 20:14:40
【Question description】:

I am trying to write from an RDD to Elasticsearch (PySpark, Python 3.5). The JSON body is written correctly, but Elasticsearch generates its own _id instead of using the one I supply.

My code:

class Article:
    def __init__(self, title, text, text2):
        self.id_ = title
        self.text = text
        self.text2 = text2

if __name__ == '__main__':

    # _sc is assumed to be an already-created SparkContext
    pt = _sc.parallelize([Article("rt", "ted", "ted2"), Article("rt2", "ted2", "ted22")])
    # Map each Article to a (key, document) pair
    save = pt.map(lambda item:
        (item.id_,
            {
                'text': item.text,
                'text2': item.text2
            }
        ))

    es_write_conf = {
        "es.nodes": "localhost",
        "es.port": "9200",
        "es.resource": 'db/table1'
    }
    save.saveAsNewAPIHadoopFile(
        path='-',
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=es_write_conf)

Program trace: link to the image

【Question discussion】:

Tags: python-3.x hadoop elasticsearch pyspark

【Solution 1】:

This is controlled by a mapping setting on the index, which you can find in the official user guide.
Example code:

curl -XPOST localhost:9200/test -d '{
    "settings" : {
        "number_of_shards" : 1,
        "number_of_replicas":0
    },
    "mappings" : {
        "test1" : {
            "_id":{"path":"mainkey"},
            "_source" : { "enabled" : false },
            "properties" : {
                "mainkey" : { "type" : "string", "index" : "not_analyzed" }
            }
        }
    }
}'
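
As far as I know, the `_id` `path` mapping shown above only applies to old Elasticsearch releases (it was deprecated in 1.5 and removed in 2.0), so it may not be available on your cluster. An alternative on the connector side is the elasticsearch-hadoop setting `es.mapping.id`, which names the document field whose value should be used as `_id`. Since `EsOutputFormat` ignores the key of each pair, the id has to be present inside the document itself. The following is a minimal sketch under those assumptions, reusing the values from the question; the field name `id` and the helper `to_es_pair` are hypothetical names chosen for illustration, and the Spark call is left commented out because it needs a running SparkContext and the elasticsearch-hadoop jar.

```python
# Sketch only: "id" and to_es_pair are illustrative names, not part of any library.

def to_es_pair(doc_id, text, text2):
    """Build the (key, document) pair handed to EsOutputFormat.

    The key is ignored by the connector, so the id is duplicated
    inside the document where es.mapping.id can find it.
    """
    doc = {"id": doc_id, "text": text, "text2": text2}
    return (doc_id, doc)


es_write_conf = {
    "es.nodes": "localhost",
    "es.port": "9200",
    "es.resource": "db/table1",
    # Use the value of the document field "id" as the Elasticsearch _id
    "es.mapping.id": "id",
}

pairs = [
    to_es_pair("rt", "ted", "ted2"),
    to_es_pair("rt2", "ted2", "ted22"),
]

# With a live SparkContext (called sc here, an assumption) the write
# would look like the call in the question:
#
# sc.parallelize(pairs).saveAsNewAPIHadoopFile(
#     path='-',
#     outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
#     keyClass="org.apache.hadoop.io.NullWritable",
#     valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
#     conf=es_write_conf)
```

With this configuration the id field is also stored as part of the document body, which is usually harmless but worth knowing about.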

【Discussion】: