【Question title】: Elasticsearch fuzzy search phrase with dash
【Posted】: 2016-05-17 20:04:12
【Question description】:

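I'm trying to find a way to index a document with a description like "In-N-Out Burger" and have a search for "in-n out", "in and out", or simply "in-n-out" return the "In-N-Out Burger" document. After going through the documentation, I'm still confused about how dashes should be handled at index time and at search time. Any suggestions?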

My current settings and mapping:

curl -XPUT http://localhost:9200/objects -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "lower": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [ "lowercase" ] 
                }
            }
        }
    }
}'

curl -XPUT http://localhost:9200/objects/object/_mapping -d '{
    "object" : {
        "properties" : {
            "objectDescription" : {
                "type" : "string",
                "fields" : {
                    "lower": {
                        "type": "string",
                        "analyzer": "lower"
                    }
                }
            },
            "suggest" : {
                "type" : "completion",
                "analyzer" : "simple",
                "search_analyzer" : "simple",
                "payloads" : true
            }
        }
    }
}'

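【Comments】:

  • Any luck with my answer?
  • I'm so sorry! I'm out of the country right now and can't try it out. I'll let you know as soon as I get home :)

Tags: elasticsearch fuzzy-search search-suggestion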


【Answer 1】:

When I created the index with your settings and indexed a document, I didn't run into any problems:

curl -XPUT http://localhost:9200/objects/object/001 -d '{
  "description": "In-N-Out Burger",
  "name" : "first_document"
}'

Then I tried to find it:

curl -XGET 'localhost:9200/objects/object/_search?q=in+and+out&pretty'
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.05038611,
    "hits" : [ {
      "_index" : "objects",
      "_type" : "object",
      "_id" : "001",
      "_score" : 0.05038611,
      "_source" : {
        "description" : "In-N-Out Burger",
        "name" : "first_document"
      }
    } ]
  }
}

curl -XGET 'localhost:9200/objects/object/_search?pretty&q=in-n-out'
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.23252454,
    "hits" : [ {
      "_index" : "objects",
      "_type" : "object",
      "_id" : "001",
      "_score" : 0.23252454,
      "_source" : {
        "description" : "In-N-Out Burger",
        "name" : "first_document"
      }
    } ]
  }
}

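As you can see, the document is found by both queries. The analyzer treats "-" as a separator and splits the phrase into tokens, both when you index the document and when you search for it. You can see how this works with the _analyze API: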

curl -XGET 'localhost:9200/objects/_analyze?pretty=true' -d 'In-N-Out Burger'
{
  "tokens" : [ {
    "token" : "in",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "<ALPHANUM>",
    "position" : 0
  }, {
    "token" : "n",
    "start_offset" : 3,
    "end_offset" : 4,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "out",
    "start_offset" : 5,
    "end_offset" : 8,
    "type" : "<ALPHANUM>",
    "position" : 2
  }, {
    "token" : "burger",
    "start_offset" : 9,
    "end_offset" : 15,
    "type" : "<ALPHANUM>",
    "position" : 3
  } ]
}
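
To make the effect of that token split concrete, here is a minimal shell sketch that needs no running cluster. The function names (analyze_standard, analyze_lower, shared_tokens) are hypothetical helpers invented for this illustration, and the tokenization is only a rough approximation of what Elasticsearch does (lowercase, then split on every non-alphanumeric character). It is not the real tokenizer, and it assumes the query terms are combined with OR, so one shared token is enough for a hit.

```shell
#!/bin/sh
# Rough stand-in for the analyzer output shown above: lowercase the text,
# then split on every run of non-alphanumeric characters, so "-" and " "
# both act as separators. An approximation, not Elasticsearch's tokenizer.
analyze_standard() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' | sed '/^$/d'
}

# Rough stand-in for the question's custom "lower" analyzer
# (keyword tokenizer + lowercase filter): the whole string stays one token.
analyze_lower() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Count the distinct query tokens that also occur among the document tokens.
# Assumption: terms are OR-ed, so any count above zero means a match.
shared_tokens() {
  doc_tokens=$(analyze_standard "$1")
  analyze_standard "$2" | sort -u | while IFS= read -r tok; do
    printf '%s\n' "$doc_tokens" | grep -qxF -- "$tok" && printf '%s\n' "$tok"
  done | wc -l | tr -d ' '
}

analyze_standard 'In-N-Out Burger'            # prints: in, n, out, burger (one per line)
analyze_lower 'In-N-Out Burger'               # prints: in-n-out burger
shared_tokens 'In-N-Out Burger' 'in and out'  # prints: 2
shared_tokens 'In-N-Out Burger' 'in-n-out'    # prints: 3
```

Under these assumptions, "in and out" shares the tokens "in" and "out" with the document, while "in-n-out" shares all three of its tokens, which is consistent with the higher score of the second query above. The single token produced by analyze_lower also suggests why the objectDescription.lower sub-field would only match the complete lowercased description.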
