【Question title】: Elastic single word search issue
【Posted】: 2018-01-15 15:27:11
【Question】:

I have inserted the following sample data into the index:

"name":"Apple silicon cases for iPhone 6S Silver", "price":"100"
"name":"Apple silicon cases for iPhone 6S Gold", "price":"200"
"name":"Apple silicon cases for iPhone 6S Space Gray", "price":"300"
"name":"iPhone 8", "price":"70000"
"name":"iPhone 8 Plus", "price":"80000"
"name":"iPhone X", "price":"100000"
"name":"iPhone 8 Case, Black color", "price":"500"
"name":"iPhone battery charger", "price":"1000"

Index mapping:

{
  "shopfront": {
    "mappings": {
      "products": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "price": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

When I run the following query:

POST shopfront/products/_search
{
  "query": {
    "match": {
      "name": "iphone"
    }
  }
}

the result I get is:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "shopfront",
        "_type": "products",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "name": "Apple silicon cases for iPhone 6S Space Gray",
          "price": "300"
        }
      },
      {
        "_index": "shopfront",
        "_type": "products",
        "_id": "7",
        "_score": 0.19566216,
        "_source": {
          "name": "iPhone 8 Case, Black color",
          "price": "500"
        }
      },
      {
        "_index": "shopfront",
        "_type": "products",
        "_id": "5",
        "_score": 0.18232156,
        "_source": {
          "name": "iPhone 8 Plus",
          "price": "80000"
        }
      },
      {
        "_index": "shopfront",
        "_type": "products",
        "_id": "8",
        "_score": 0.18232156,
        "_source": {
          "name": "iPhone battery charger",
          "price": "1000"
        }
      },
      {
        "_index": "shopfront",
        "_type": "products",
        "_id": "1",
        "_score": 0.17068404,
        "_source": {
          "name": "Apple silicon cases for iPhone 6S Silver",
          "price": "100"
        }
      },
      {
        "_index": "shopfront",
        "_type": "products",
        "_id": "4",
        "_score": 0.16403349,
        "_source": {
          "name": "iPhone 8",
          "price": "70000"
        }
      },
      {
        "_index": "shopfront",
        "_type": "products",
        "_id": "6",
        "_score": 0.16403349,
        "_source": {
          "name": "iPhone X",
          "price": "100000"
        }
      },
      {
        "_index": "shopfront",
        "_type": "products",
        "_id": "2",
        "_score": 0.097333126,
        "_source": {
          "name": "Apple silicon cases for iPhone 6S Gold",
          "price": "200"
        }
      }
    ]
  }
}

What I want is for all the "iPhone" products to be at the top of the search results, with everything else at the bottom.

For example:

iPhone 8
iPhone 8 Plus
iPhone X
iPhone 8 Case, Black color
iPhone battery charger
Apple silicon cases for iPhone 6S Silver
Apple silicon cases for iPhone 6S Gold
Apple silicon cases for iPhone 6S Space Gray

"match_phrase_prefix" does not help either. Any ideas on how to handle this situation?

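【Comments】:

  • You could iterate over the results, check the first word of each name against your query, and sort accordingly. It is only a workaround.
  • Yes, I can do that as a last resort. For now, though, I am looking for a solution on the Elastic side. I really feel there should be a simple way to do this in Elastic itself, since this is a very basic requirement from a search point of view.
  • POST shopfront/products/_search?explain=true will help you understand the scoring.
  • Since you have so few documents, make sure the index has only one shard, or run your query with the parameter ?search_type=dfs_query_then_fetch. That should give you the results in the correct order. elastic.co/blog/…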
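The client-side workaround from the first comment can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name rerank_by_first_word is hypothetical, and the hits list is rebuilt by hand from the response shown in the question rather than fetched from a cluster.

```python
def rerank_by_first_word(hits, query):
    """Move hits whose name starts with the query word to the front."""
    q = query.lower()

    def starts_with_query(hit):
        words = hit["_source"]["name"].split()
        return bool(words) and words[0].lower() == q

    # sorted() is stable, so the relative order Elasticsearch returned
    # is preserved inside each of the two groups.
    return sorted(hits, key=lambda hit: not starts_with_query(hit))


# Names in the order returned by the match query in the question.
hits = [{"_source": {"name": name}} for name in [
    "Apple silicon cases for iPhone 6S Space Gray",
    "iPhone 8 Case, Black color",
    "iPhone 8 Plus",
    "iPhone battery charger",
    "Apple silicon cases for iPhone 6S Silver",
    "iPhone 8",
    "iPhone X",
    "Apple silicon cases for iPhone 6S Gold",
]]

reranked = [hit["_source"]["name"] for hit in rerank_by_first_word(hits, "iphone")]
for name in reranked:
    print(name)
```

This only groups the names that start with the query word at the top. It does not by itself put "iPhone 8" ahead of "iPhone 8 Case, Black color", so the exact order requested in the question would need an additional sort key.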

Tags: elasticsearch


【Solution 1】:

Use a bool query with a regexp should clause on the name.keyword field to boost the score of entries that start with iPhone.

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-bool-query.html

Note that all Elasticsearch regexp queries are anchored:

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-regexp-query.html#_standard_operators

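Because of this, a pattern such as iphone.* only ever matches from the beginning of the value, and no ^ is needed to anchor it at the front. Keep in mind that name.keyword is not analyzed, so the pattern is case-sensitive. The whole query would look like this: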

{
    "query": {
        "bool" : {
            "should" : {
                "regexp" : {
                    "name.keyword" : "i[pP]hone [0-9X]( Plus)?"
                }
            },
            "must" : {
                 "match" : {
                     "name" : "iphone"
                }
            }
        }
    }
}
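To see which of the sample names the should clause would boost, the anchoring can be imitated locally. This is a rough sketch, not the Lucene regexp engine: it assumes Python's re module treats this particular pattern the same way (it uses only character classes, a group, and ?), and it uses re.fullmatch to stand in for the implicit anchoring at both ends.

```python
import re

# Same pattern as in the regexp clause above; fullmatch() imitates
# the fact that the pattern must match the whole keyword value.
PATTERN = re.compile(r"i[pP]hone [0-9X]( Plus)?")

names = [
    "Apple silicon cases for iPhone 6S Silver",
    "Apple silicon cases for iPhone 6S Gold",
    "Apple silicon cases for iPhone 6S Space Gray",
    "iPhone 8",
    "iPhone 8 Plus",
    "iPhone X",
    "iPhone 8 Case, Black color",
    "iPhone battery charger",
]

boosted = [name for name in names if PATTERN.fullmatch(name)]
print(boosted)
```

All eight documents still satisfy the must clause; only the names in boosted receive the extra score from the should clause.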

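【Discussion】:

  • Using name.keyword, I get iPhone battery charger in second place, right after iPhone 8 Plus. That is not the order I want. I want the response in the following order: iPhone 8; iPhone 8 Plus; iPhone X; iPhone 8 Case, Black color; iPhone battery charger; Apple silicon cases for iPhone 6S Silver; Apple silicon cases for iPhone 6S Gold; Apple silicon cases for iPhone 6S Space Gray
  • Hmm, I thought we just wanted the names starting with iPhone. That is fine, though; just adjust the regex to be more specific. I think "iPhone, followed by a digit or X, then either the end of the string or Plus" will work, so I have edited the above to reflect that.
  • The regex solution above only works for iphone. However, I was looking for a generic solution that works in all cases (different kinds of products from different brands).
  • What sets the "phone" products apart? You should be able to adjust the regex to cover other cases with a set of regexp queries, or, if there are too many different cases, a "product type": "phone" field would be better.
  • OK. I think I understand what you are saying. Although this is not a complete solution to my problem, it definitely points me in the right direction, so I am accepting this answer. Thanks for your help.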
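The "set of regexp queries" idea from the discussion could be sketched as a small request-body builder. Everything here is an assumption for illustration: the function name build_boosted_query is hypothetical, and the second pattern is an invented example for another brand that does not appear in the question's data.

```python
import json


def build_boosted_query(field, text, exact_patterns):
    """Build a bool query: match on `field`, boosted by regexp clauses
    on the `<field>.keyword` sub-field (one should clause per pattern)."""
    return {
        "query": {
            "bool": {
                "must": {"match": {field: text}},
                "should": [
                    {"regexp": {f"{field}.keyword": pattern}}
                    for pattern in exact_patterns
                ],
            }
        }
    }


body = build_boosted_query(
    "name",
    "iphone",
    [
        "i[pP]hone [0-9X]( Plus)?",   # pattern from the answer above
        "Galaxy S[0-9]+( Plus)?",     # hypothetical pattern for another brand
    ],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST shopfront/products/_search. As the discussion notes, once the list of patterns grows large, a dedicated product type field is likely to be easier to maintain than a growing set of regular expressions.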