【Question title】: Getting "Failed to build synonyms" message when trying to build synonyms filter
【Posted】: 2020-01-29 20:32:57
【Question】:

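I'm using Elasticsearch 6.8 and Python 3.7.

I'm trying to create my own synonyms that map emoticons to text. For example, ":-)" should map to "happy-smiley".

I'm trying to build the synonyms and create the index with the following code: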

def create_analyzer(es_api, index_name, doc_type):
    body = {
        "settings": {
                "index": {
                    "analysis": {
                        "filter": {
                            "synonym_filter": {
                                "type": "synonym",
                                "synonyms": [
                                    ":-), happy-smiley",
                                    ":-(, sad-smiley"
                                ]
                            }
                        },
                        "analyzer": {
                            "synonym_analyzer": {
                                "tokenizer": "standard",
                                "filter": ["lowercase", "synonym_filter"]
                            }
                        }
                    }
                }
            },
        "mappings": {
            doc_type: {
                "properties": {
                    "tweet": {"type": "text", "fielddata": "true"},
                    "existence": {"type": "text"},
                    "confidence": {"type": "float"}
                }
            }}
    }
    res = es_api.indices.create(index=index_name, body=body)

But I get this error:

elasticsearch.exceptions.RequestError: RequestError(400, 'illegal_argument_exception', 'failed to build synonyms')

What is wrong, and how can I fix it?

【Comments】:

    Tags: elasticsearch


    【Solution 1】:

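    I can tell you what went wrong and (update) how to fix it.

    If you run this request in Dev Tools or with cURL, you will see the cause of the error. I think the Python client truncates the error details, which is why you don't see the cause.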

    PUT st_t3
    {
      "settings": {
        "index": {
          "analysis": {
            "filter": {
              "synonym_filter": {
                "type": "synonym",
                "synonyms": [
                  ":-), happy-smiley",
                  ":-(, sad-smiley"
                ]
              }
            },
            "analyzer": {
              "synonym_analyzer": {
                "tokenizer": "standard",
                "filter": [
                  "lowercase",
                  "synonym_filter"
                ]
              }
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "tweet": {
            "type": "text",
            "fielddata": "true"
          },
          "existence": {
            "type": "text"
          },
          "confidence": {
            "type": "float"
          }
        }
      }
    }
    

    Response:

    {
      "error": {
        "root_cause": [
          {
            "type": "remote_transport_exception",
            "reason": "[127.0.0.1:9301][indices:admin/create]"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "failed to build synonyms",
        "caused_by": {
          "type": "parse_exception",
          "reason": "parse_exception: Invalid synonym rule at line 1",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "term: :-) was completely eliminated by analyzer"
          }
        }
      },
      "status": 400
    }
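The same nested `caused_by` chain can be read from Python as well. The sketch below is one possible way to do it, assuming elasticsearch-py 6.x/7.x, where the raised `RequestError` exposes the parsed response body as `exc.info`. The helper names `innermost_reason` and `create_index_verbose` are made up for this illustration.

```python
def innermost_reason(error_body):
    """Follow the nested "caused_by" chain and return the deepest "reason"."""
    node = error_body.get("error", {})
    if not isinstance(node, dict):
        # Some responses carry the error as a plain string.
        return str(node)
    reason = node.get("reason")
    while isinstance(node.get("caused_by"), dict):
        node = node["caused_by"]
        reason = node.get("reason", reason)
    return reason


def create_index_verbose(es_api, index_name, body):
    """Create the index; on a 400, print the deepest cause before re-raising."""
    # Imported here so innermost_reason() works without the client installed.
    from elasticsearch.exceptions import RequestError

    try:
        return es_api.indices.create(index=index_name, body=body)
    except RequestError as exc:
        # Assumption: exc.info is the parsed JSON error body (elasticsearch-py 6.x/7.x).
        print(innermost_reason(exc.info))
        raise
```

With the response shown above, `innermost_reason` returns the `term: :-) was completely eliminated by analyzer` message rather than the generic `failed to build synonyms` summary.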
    

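    So the root cause is "reason": "term: :-) was completely eliminated by analyzer". The synonym rules are parsed with the analyzer's own tokenizer, and the standard tokenizer drops punctuation, so ":-)" produces no tokens and the synonym rule cannot be built.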
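To catch such rules before sending the request, a rough client-side check is possible. The sketch below is only an approximation: it uses the regular expression `\w+` as a stand-in for the standard tokenizer, which is an assumption and not how Elasticsearch actually tokenizes. The function name `eliminated_terms` is hypothetical.

```python
import re


def eliminated_terms(synonym_rules):
    """Return rule terms with no word characters, which would likely yield no tokens."""
    bad = []
    for rule in synonym_rules:
        # Split both "a, b" and "a, b => c" style rules into individual terms.
        for term in re.split(r"=>|,", rule):
            term = term.strip()
            # Approximation: a term without any \w character is treated as eliminated.
            if term and not re.findall(r"\w+", term):
                bad.append(term)
    return bad


print(eliminated_terms([":-), happy-smiley", ":-(, sad-smiley"]))
```

For the two rules from the question this flags `:-)` and `:-(`, the same terms the server rejects. The authoritative check remains the `_analyze` API on the target cluster.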

    Update

    This can be done with a char_filter instead.

    Example:

    PUT st_t3
    {
      "settings": {
        "index": {
          "analysis": {
            "char_filter": {
              "happy_filter": {
                "type": "mapping",
                "mappings": [
                  ":-) => happy-smiley",
                  ":-( => sad-smiley"
                ]
              }
            },
            "analyzer": {
              "smile_analyzer": {
                "type": "custom",
                "char_filter": [
                  "happy_filter"
                ],
                "tokenizer": "standard",
                "filter": [
                  "lowercase"
                ]
              }
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "tweet": {
            "type": "text",
            "fielddata": "true"
          },
          "existence": {
            "type": "text"
          },
          "confidence": {
            "type": "float"
          }
        }
      }
    }
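For the Python setup in the question, the same settings could be built as follows. This is a sketch under a few assumptions: the mapping keeps the `doc_type` level from the question's code, since Elasticsearch 6.8 expects typed mappings by default while the request above uses the typeless form; and the `tweet` field is given `"analyzer": "smile_analyzer"`, which the request above does not do, so that the analyzer is actually applied to that field. The function names are illustrative.

```python
def build_index_body(doc_type):
    """Build index settings that rewrite emoticons with a mapping char_filter."""
    return {
        "settings": {
            "index": {
                "analysis": {
                    "char_filter": {
                        "happy_filter": {
                            "type": "mapping",
                            "mappings": [
                                ":-) => happy-smiley",
                                ":-( => sad-smiley",
                            ],
                        }
                    },
                    "analyzer": {
                        "smile_analyzer": {
                            "type": "custom",
                            "char_filter": ["happy_filter"],
                            "tokenizer": "standard",
                            "filter": ["lowercase"],
                        }
                    },
                }
            }
        },
        "mappings": {
            doc_type: {
                "properties": {
                    # Assumption: attach the custom analyzer to the tweet field.
                    "tweet": {
                        "type": "text",
                        "fielddata": True,
                        "analyzer": "smile_analyzer",
                    },
                    "existence": {"type": "text"},
                    "confidence": {"type": "float"},
                }
            }
        },
    }


def create_analyzer(es_api, index_name, doc_type):
    """Create the index with the same client call used in the question."""
    return es_api.indices.create(index=index_name, body=build_index_body(doc_type))
```

Keeping the body in a separate function makes it easy to inspect or dump as JSON before it is sent to the cluster.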
    

    Test

    POST st_t3/_analyze
    {
      "text": ":-) test",
      "analyzer": "smile_analyzer"
    }
    

    Response

    {
      "tokens" : [
        {
          "token" : "happy",
          "start_offset" : 0,
          "end_offset" : 2,
          "type" : "<ALPHANUM>",
          "position" : 0
        },
        {
          "token" : "smiley",
          "start_offset" : 2,
          "end_offset" : 3,
          "type" : "<ALPHANUM>",
          "position" : 1
        },
        {
          "token" : "test",
          "start_offset" : 4,
          "end_offset" : 8,
          "type" : "<ALPHANUM>",
          "position" : 2
        }
      ]
    }
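The same check can be run from Python. The sketch below assumes the elasticsearch-py `indices.analyze` call with a `body` argument (as in the 6.x/7.x clients). The helper names `token_texts` and `analyze_text` are hypothetical.

```python
def token_texts(analyze_response):
    """Extract just the token strings from an _analyze response."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


def analyze_text(es_api, index_name, text, analyzer="smile_analyzer"):
    """Run the index's analyzer on some text and return the resulting tokens."""
    response = es_api.indices.analyze(
        index=index_name,
        body={"analyzer": analyzer, "text": text},
    )
    return token_texts(response)
```

Called as `analyze_text(es_api, "st_t3", ":-) test")`, this should give the tokens from the response above: `happy`, `smiley`, `test`. The standard tokenizer splits the replacement "happy-smiley" at the hyphen, and the offsets in the response refer to positions in the original text, where ":-)" occupies 0 to 3.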
    

    【Comments】:
