【Question title】: Elasticsearch not returning text when a partial word is entered
【Posted】: 2016-05-05 05:21:29
【Question description】:

My analyzers are set up as follows:

"analyzer": {
    "edgeNgram_autocomplete": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "autocomplete"]
    },                
    "full_name": {
        "filter":["standard","lowercase","asciifolding"],
        "type":"custom",
        "tokenizer":"standard"
    }
}

My filter:

"filter": {
    "autocomplete": {
        "type": "edgeNGram",
        "side":"front",
        "min_gram": 1,
        "max_gram": 50
    }
}

The analyzers on the text field:

"textbox": {
    "_parent": {
        "type": "document"
    },            
    "properties": {
        "text": {
            "fields": {
                "text": {
                    "type":"string",
                    "analyzer":"full_name"
                },
                "autocomplete": {
                    "type": "string",
                    "index_analyzer": "edgeNgram_autocomplete",
                    "search_analyzer": "full_name",
                    "analyzer": "full_name"
                }
            },
            "type":"multi_field"
        }
    }
}

Put together, this makes up my docstore index mapping:

PUT http://localhost:9200/docstore
{
    "settings": {
        "analysis": {
            "analyzer": {
                "edgeNgram_autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete"]
                },                
                "full_name": {
                   "filter":["standard","lowercase","asciifolding"],
                   "type":"custom",
                   "tokenizer":"standard"
                }
            },
            "filter": {
                "autocomplete": {
                    "type": "edgeNGram",
                    "side":"front",
                    "min_gram": 1,
                    "max_gram": 50
                }
            }
        }
    },
    "mappings": {
        "space": {
            "properties": {
                "name": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        },
        "document": {
            "_parent": {
                "type": "space"
            },
            "properties": {
                "name": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        },
        "textbox": {
            "_parent": {
                "type": "document"
            },            
            "properties": {
                "bbox": {
                    "type": "long"
                },
                "text": {
                    "fields": {
                        "text": {
                            "type":"string",
                            "analyzer":"full_name"
                        },
                        "autocomplete": {
                            "type": "string",
                            "index_analyzer": "edgeNgram_autocomplete",
                            "search_analyzer": "full_name",
                            "analyzer":"full_name"
                        }
                    },
                    "type":"multi_field"
                }
            }
        },
        "entity": {
            "_parent": {
                "type": "document"
            },
            "properties": {
                "bbox": {
                    "type": "long"
                },
                "name": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}

Add a space to hold all the documents:

POST http://localhost:9200/docstore/space
{
    "name": "Space 1"
}

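When the user enters the word: proj

this should return all of the following texts:

  • Sample Project
  • Example Project
  • Project Name
  • My Project Name
  • First Project Name
  • My Project Name

But it returns nothing.

My query: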

POST http://localhost:9200/docstore/textbox/_search
{
    "query": {
        "match": {
            "text": "proj"
        }
    },
    "filter": {
        "has_parent": {
            "type": "document",
            "query": {
                "term": {
                    "name": "1-a1-1001.pdf"
                }
            }
        }
    }
}

If I search for project, I get:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 3.0133555,
        "hits": [
            {
                "_index": "docstore",
                "_type": "textbox",
                "_id": "AVRuV2d_f4y6IKuxK35g",
                "_score": 3.0133555,
                "_routing": "AVRuVvtLf4y6IKuxK33f",
                "_parent": "AVRuV2cMf4y6IKuxK33g",
                "_source": {
                    "bbox": [
                        8750,
                        5362,
                        9291,
                        5445
                    ],
                    "text": [
                        "Sample Project"
                    ]
                }
            },
            {
                "_index": "docstore",
                "_type": "textbox",
                "_id": "AVRuV2d_f4y6IKuxK35Y",
                "_score": 2.4106843,
                "_routing": "AVRuVvtLf4y6IKuxK33f",
                "_parent": "AVRuV2cMf4y6IKuxK33g",
                "_source": {
                    "bbox": [
                        8645,
                        5170,
                        9070,
                        5220
                    ],
                    "text": [
                        "Project Name and Address"
                    ]
                }
            }
        ]
    }
}

Maybe my edgeNGram filter is not suited for this? I mean:

"side":"front"

Should I take a different approach?

Does anyone know what I am doing wrong?

【Comments】:

    Tags: elasticsearch autocomplete


    【Solution 1】:

    The problem is the name of the index analyzer setting on the autocomplete field.

    Change:

    "index_analyzer": "edgeNgram_autocomplete"
    

    To:

    "analyzer": "edgeNgram_autocomplete"
    

    And search as shown in @Andrei Stefan's answer:

    POST http://localhost:9200/docstore/textbox/_search
    {
        "query": {
            "match": {
                "text.autocomplete": "proj"
            }
        }
    }
    

    It will work as expected!
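
To see why the analyzer change matters, here is a minimal Python sketch that imitates the two analyzers outside Elasticsearch. It is only an approximation: it assumes a regex word split is close enough to the standard tokenizer, it ignores asciifolding, and all function names are made up for illustration. In the question's mapping, `"analyzer": "full_name"` appears to be what gets applied at index time, so only whole words are indexed and `proj` has nothing to match.

```python
import re


def standard_tokens(text):
    """Rough stand-in for the standard tokenizer plus the lowercase filter."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def edge_ngrams(token, min_gram=1, max_gram=50):
    """Front edge n-grams, using min_gram=1 and max_gram=50 from the question."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms_full_name(text):
    """Terms indexed when full_name is the index analyzer: whole words only."""
    return set(standard_tokens(text))


def index_terms_autocomplete(text):
    """Terms indexed when edgeNgram_autocomplete is the index analyzer."""
    terms = set()
    for token in standard_tokens(text):
        terms.update(edge_ngrams(token))
    return terms


def matches(query, indexed_terms):
    """A match query hits when any analyzed query term is an indexed term."""
    return any(term in indexed_terms for term in standard_tokens(query))


doc = "Sample Project"
print(matches("proj", index_terms_full_name(doc)))     # False
print(matches("proj", index_terms_autocomplete(doc)))  # True
print(matches("project", index_terms_full_name(doc)))  # True
```

On a running cluster, the `_analyze` API can be used to inspect the tokens an analyzer really emits instead of relying on an approximation like this one.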

    I have tested your configuration on Elasticsearch 2.3.

    By the way, the multi_field type is deprecated.

    Hope this helps :)

    【Discussion】:

    • Yes, replace it with string. For example, in your case the text field would look like this: "text": { "type": "string", "analyzer": "full_name", "fields": { "autocomplete": { "type": "string", "analyzer": "edgeNgram_autocomplete", "search_analyzer": "full_name" } } }
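
For readability, the field definition from the comment above can be laid out as a Python dict and dumped as JSON. This is only a sketch of the `textbox` part of the mapping: the variable names are illustrative, and the `bbox` and `_parent` entries are copied from the question rather than from the comment.

```python
import json

# Corrected `text` field: plain "string" with a sub-field under "fields"
# instead of "multi_field", and "analyzer" instead of "index_analyzer".
text_field = {
    "type": "string",
    "analyzer": "full_name",
    "fields": {
        "autocomplete": {
            "type": "string",
            "analyzer": "edgeNgram_autocomplete",  # applied at index time
            "search_analyzer": "full_name",        # applied at query time
        }
    },
}

# The surrounding textbox type, taken unchanged from the question.
textbox_mapping = {
    "textbox": {
        "_parent": {"type": "document"},
        "properties": {
            "bbox": {"type": "long"},
            "text": text_field,
        },
    }
}

print(json.dumps(textbox_mapping, indent=4))
```

Because the analyzer of an existing field cannot be changed in place, the index would most likely need to be recreated with this mapping and the documents indexed again before the edge n-grams show up.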
    【Solution 2】:

    Your query should actually try to match text.autocomplete rather than text:

      "query": {
        "match": {
          "text.autocomplete": "proj"
        }
      }
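
A small sketch of how this match could be combined with the parent filter from the question. The helper name `build_search` is hypothetical, and the `has_parent` filter is kept exactly as it appears in the question; only the field name in the match query changes.

```python
import json


def build_search(prefix, parent_name=None):
    """Build a search body that matches on the text.autocomplete sub-field."""
    body = {"query": {"match": {"text.autocomplete": prefix}}}
    if parent_name is not None:
        # Same top-level has_parent filter as in the question, unchanged.
        body["filter"] = {
            "has_parent": {
                "type": "document",
                "query": {"term": {"name": parent_name}},
            }
        }
    return body


# Body to POST to http://localhost:9200/docstore/textbox/_search
print(json.dumps(build_search("proj", "1-a1-1001.pdf"), indent=4))
```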
    

    【Discussion】:

    • .... OK, running the following returns results: curl -XGET "http://localhost:9200/docstore/textbox/_search" -d' { "query": { "match": { "text": "project" } }, "filter": { "has_parent": { "type": "document", "query": { "term": { "name": "1-a1-1001.pdf" } } } }, "fielddata_fields": ["text.autocomplete","text"] }'
    • It did nothing.