[Question title]: Validation Failed: 1: no requests added in bulk indexing ElasticSearch
[Posted]: 2021-02-14 03:11:45
[Question]:

I have a JSON file that I need to index on an ElasticSearch server.

The JSON file looks like this:

{
    "sku": "1",
    "vbid": "1",
    "created": "Sun, 05 Oct 2014 03:35:58 +0000",
    "updated": "Sun, 06 Mar 2016 12:44:48 +0000",
    "type": "Single",
    "downloadable-duration": "perpetual",
    "online-duration": "365 days",
    "book-format": "ePub",
    "build-status": "In Inventory",
    "description": "On 7 August 1914, a week before the Battle of Tannenburg and two weeks before the Battle of the Marne, the French army attacked the Germans at Mulhouse in Alsace. Their objective was to recapture territory which had been lost after the Franco-Prussian War of 1870-71, which made it a matter of pride for the French. However, after initial success in capturing Mulhouse, the Germans were able to reinforce more quickly, and drove them back within three days. After forty-three years of peace, this was the first test of strength between France and Germany. In 1929 Karl Deuringer wrote the official history of the battle for the Bavarian Army, an immensely detailed work of 890 pages; First World War expert and former army officer Terence Zuber has translated this study and edited it down to more accessible length, to produce the first account in English of the first major battle of the First World War.",
    "publication-date": "07/2014",
    "author": "Deuringer, Karl",
    "title": "The First Battle of the First World War: Alsace-Lorraine",
    "sort-title": "First Battle of the First World War: Alsace-Lorraine",
    "edition": "0",
    "sampleable": "false",
    "page-count": "0",
    "print-drm-text": "This title will only allow printing of 2 consecutive pages at a time.",
    "copy-drm-text": "This title will only allow copying of 2 consecutive pages at a time.",
    "kind": "book",
    "fro": "false",
    "distributable": "true",
    "subjects": {
      "subject": [
        {
          "-schema": "bisac",
          "-code": "HIS027090",
          "#text": "World War I"
        },
        {
          "-schema": "coursesmart",
          "-code": "cs.soc_sci.hist.milit_hist",
          "#text": "Social Sciences -> History -> Military History"
        }
      ]
    },   
   "pricelist": {
      "publisher-list-price": "0.0",
      "digital-list-price": "7.28"
    },
    "publisher": {
      "publisher-name": "The History Press",
      "imprint-name": "The History Press Ireland"
    },
    "aliases": {
      "eisbn-canonical": "1",
      "isbn-canonical": "1",
      "print-isbn-canonical": "9780752460864",
      "isbn13": "1",
      "isbn10": "0750951796",
      "additional-isbns": {
        "isbn": [
          {
            "-type": "print-isbn-10",
            "#text": "0752460862"
          },
          {
            "-type": "print-isbn-13",
            "#text": "97807524608"
          }
        ]
      }
    },
    "owner": {
      "company": {
        "id": "1893",
        "name": "The History Press"
      }
    },
    "distributor": {
      "company": {
        "id": "3658",
        "name": "asc"
      }
    }
  }

But when I try to index this JSON file with the command

curl -XPOST 'http://localhost:9200/_bulk' -d @1.json

I get this error:

{"error":{"root_cause":[{"type":"action_request_validation_exception","reason":"Validation Failed: 1: no requests added;"}],"type":"action_request_validation_exception","reason":"Validation Failed: 1: no requests added;"},"status":400}

I can't see where I've made a mistake.

[Discussion]:

    Tags: json elasticsearch elasticsearch-bulk-api


    [Solution 1]:

    Elasticsearch's bulk API uses a special syntax, which is actually made up of JSON documents written on single lines. Take a look at the documentation.

    The syntax is pretty simple. For index, create, and update you need two single-line JSON documents: the first line states the action, and the second provides the document to index/create/update. To delete a document, only the action line is needed. For example (from the documentation):

    { "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
    { "field1" : "value1" }
    { "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
    { "field1" : "value3" }
    { "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }   
    { "doc" : {"field2" : "value2"} }
    { "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
    

    Don't forget to end the file with a newline. Then call the bulk API with the command:

    curl -s -XPOST localhost:9200/_bulk --data-binary "@requests"
    

    From the documentation:

    If you're providing text file input to curl, you must use the --data-binary flag instead of plain -d
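Applied to the question, this means converting the pretty-printed 1.json into a two-line bulk file. A minimal sketch of that conversion (the index name "books", type "book", and the trimmed-down stand-in document are assumptions for illustration, not taken from the question):

```shell
# Stand-in for the question's 1.json, trimmed to two fields for brevity
cat > 1.json <<'EOF'
{
  "sku": "1",
  "title": "The First Battle of the First World War: Alsace-Lorraine"
}
EOF

# Build the bulk file: one action line, then the document collapsed onto
# a single line, then a trailing newline at the very end of the file.
{
  printf '%s\n' '{ "index" : { "_index" : "books", "_type" : "book", "_id" : "1" } }'
  tr -d '\n' < 1.json
  printf '\n'
} > bulk.json

# Then index it; note --data-binary, which preserves the newlines:
# curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.json
```

The key points are the same as in the answer above: each action/document pair sits on its own lines, and the file ends with a newline.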

    [Comments]:

    • "Don't forget to end the file with a newline." Thank you! Saved me an hour here.
    • Don't forget to end the file with a newline... swearing at my laptop at 3 AM, you saved my life, haha.
    • Also, a silly thing... not that I would ever do that... the documentation says --data-binary "@requests": the @ must come before your file name, and if you forget it, it fails as well.
    • Also note that the payload format for the "update" action must be wrapped in "doc", i.e. {"doc": {"my_field" : "my_value"}}. That is not the case for the index action.
    • The --data-binary flag saved me, thanks! It looks like -d strips out all the newlines.
    [Solution 2]:

    Adding a trailing newline (press Enter in Postman, or append "\n" if you're sending the JSON as the request body from a client API) did the job for me.

    [Comments]:

      [Solution 3]:

      I had a similar problem where I wanted to delete specific documents of a specific type, and thanks to the answer above I finally got my simple bash script working!

      I have a file (document_id.txt) containing one document_id per line; with the bash script below, I can delete documents of a specific type with those document_ids.

      This is what the file looks like:

      c476ce18803d7ed3708f6340fdfa34525b20ee90
      5131a30a6316f221fe420d2d3c0017a76643bccd
      08ebca52025ad1c81581a018febbe57b1e3ca3cd
      496ff829c736aa311e2e749cec0df49b5a37f796
      87c4101cb10d3404028f83af1ce470a58744b75c
      37f0daf7be27cf081e491dd445558719e4dedba1
      

      The bash script looks like this:

      #!/bin/bash
      
      es_cluster="http://localhost:9200"
      index="some-index"
      doc_type="some-document-type"
      
      for doc_id in `cat document_id.txt`
      do
          request_string="{\"delete\" : { \"_type\" : \"${doc_type}\", \"_id\" : \"${doc_id}\" } }"
          echo -e "${request_string}\r\n\r\n" | curl -s -XPOST "${es_cluster}/${index}/${doc_type}/_bulk" --data-binary @-
          echo
      done
      

      After a lot of frustration, the trick was to use echo's -e option and append the trailing newlines to its output before piping it into curl.

      Then in curl, I set the --data-binary option to stop it from stripping the newlines, pointed it at the _bulk endpoint, and followed that with the @- option so it reads the payload from standard input!
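The same idea can also be sketched as a single batched request: instead of one curl call per id, write every delete action into one file and post it once. The index and type names below are the same placeholders used in the script above, and the sample ids come from the answer's document_id.txt listing:

```shell
# Sample document_id.txt using two of the ids shown above
cat > document_id.txt <<'EOF'
c476ce18803d7ed3708f6340fdfa34525b20ee90
5131a30a6316f221fe420d2d3c0017a76643bccd
EOF

index="some-index"
doc_type="some-document-type"

# Delete actions need only the action line, no payload line after it.
while read -r doc_id; do
    printf '{ "delete" : { "_type" : "%s", "_id" : "%s" } }\n' "$doc_type" "$doc_id"
done < document_id.txt > delete_requests.json

# One request for the whole batch (again with --data-binary):
# curl -s -XPOST "http://localhost:9200/${index}/_bulk" --data-binary @delete_requests.json
```

Batching this way avoids one HTTP round trip per document, which is what the _bulk endpoint is designed for.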

      [Comments]:

        [Solution 4]:

        For me it was a strange bug: I was creating the bulkRequest object and clearing it before inserting into ElasticSearch.

        The line that caused the problem:

        bulkRequest.requests().clear();
        

        [Comments]:
