【Question title】: elasticsearch - multiple index search returning unexpected results
【Posted】: 2015-10-13 10:30:09
【Question description】:

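I need some help working out what is going on with my ES setup. Basically, I use custom analyzers (one per language we support) and, at index time, create multiple indices for each client we have. The problem appears at search time: when I search across all of a client's indices, one particular index (English) always ranks higher than the other languages, even when the search term occurs fewer times in the documents of that English index.

Here is what my ES setup contains: we have multiple clients, and each client can upload documents in multiple languages. To meet this requirement I set up indices named after the clientId and the language, i.e. A-en, A-de, A-fr, B-en, B-it, etc. (where A and B are client IDs and -xx is the ISO language code). Each index is created with a custom analyzer for the language that client needs, defined in the settings section, and each analyzed field is mapped to use that analyzer, as shown below. This is the English index setup, used for every client that has "English" documents to index: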

{
    "settings" : {
        "index" : {
            "number_of_shards" : 5,
            "number_of_replicas" : 1
        },
        "analysis" : {
            "filter" : {
                "english_keywords" : {
                    "type" : "keyword_marker",
                    "keywords" :  ["_none_"]
                },
                "english_stop" : {
                    "type" : "stop",
                    "stopwords" :  ["_none_"]
                },
                "synonym_filter" : {
                    "type" : "synonym",
                    "expand" : 1,
                    "synonyms" :  ["_none_"]
                },
                "english_stemmer" : {
                    "type" : "stemmer",
                    "language" : "english"
                }
            },
            "analyzer" : {
                "lens-english" : {
                    "type" : "custom",
                    "tokenizer" : "standard",
                    "filter" : ["english_keywords", "lowercase", "english_stop", "english_stemmer", "synonym_filter"]
                }
            }
        }
    },
    "mappings" : {
    "video" : {
        "properties" : {
            "Attributes" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "ClientId" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "Comments" : {
                "type" : "string",
                "analyzer" : "lens-english"
            },
            "Continent" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "CountryOfOrigin" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "CreatedDate" : {
                "type" : "date",
                "format" : "dateOptionalTime"
            },
            "Description" : {
                "type" : "string",
                "analyzer" : "lens-english"
            },
            "DescriptionEnglish" : {
                "type" : "string",
                "analyzer" : "english"
            },
            "DislikesCount" : {
                "type" : "double"
            },
            "EnglishTranscription" : {
                "type" : "string",
                "analyzer" : "english"
            },
            "Favourite" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "FromProject" : {
                "type" : "boolean"
            },
            "IsSearchable" : {
                "type" : "boolean"
            },
            "LanguageISOCode" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "LanguageOfOrigin" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "LikesCount" : {
                "type" : "double"
            },
            "NativeTranscription" : {
                "type" : "string",
                "analyzer" : "lens-english"
            },
            "ObjectId" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "Published" : {
                "type" : "date",
                "format" : "dateOptionalTime"
            },
            "Recommendations" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "Status" : {
                "type" : "long"
            },
            "Tags" : {
                "type" : "string",
                "analyzer" : "lens-english"
            },
            "Title" : {
                "type" : "string",
                "analyzer" : "lens-english"
            },
            "TitleEnglish" : {
                "type" : "string",
                "analyzer" : "english"
            },
            "TranscriptionStatus" : {
                "type" : "double"
            },
            "UploadSource" : {
                "type" : "double"
            },
            "VideoImage" : {
                "type" : "string",
                "index" : "no"
            },
            "ViewCount" : {
                "type" : "double"
            },
            "WatchLater" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "ExternalMetadata" : {
                "type" : "nested",
                "properties" : {
                    "Filters" : {
                        "type" : "string",
                        "index" : "not_analyzed"
                    },
                    "ProjectId" : {
                        "type" : "string",
                        "index" : "not_analyzed"
                    },
                    "Roles" : {
                        "type" : "string",
                        "index" : "not_analyzed"
                    }
                }
            }
        }
    }
}

}

And here is the Turkish index, for clients that have Turkish documents to index...

{
    "settings" : {
        "index" : {
            "number_of_shards" : 5,
            "number_of_replicas" : 1
        },
        "analysis" : {
            "filter" : {
                "turkish_stop" : {
                    "type" : "stop",
                    "stopwords" : "_turkish_"
                },
                "synonym_filter" : {
                    "type" : "synonym",
                    "synonyms" :  ["_none_"]
                },
                "turkish_lowercase" : {
                    "type" : "lowercase",
                    "language" : "turkish"
                },
                "turkish_keywords" : {
                    "type" : "keyword_marker",
                    "keywords" :  ["_none_"]
                },
                "turkish_stemmer" : {
                    "type" : "stemmer",
                    "language" : "turkish"
                }
            },
            "analyzer" : {
                "lens-turkish" : {
                    "tokenizer" : "standard",
                    "filter" : ["apostrophe", "turkish_lowercase", "turkish_stop", "turkish_keywords", "turkish_stemmer", "synonym_filter"]
                },
                "folding" : {
                    "filter" : ["lowercase", "asciifolding"],
                    "tokenizer" : "standard"
                }
            }
        }
    },
    "mappings" : {
    "video" : {
        "properties" : {
            "Attributes" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "ClientId" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "Comments" : {
                "type" : "string",
                "analyzer" : "lens-turkish"
            },
            "Continent" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "CountryOfOrigin" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "CreatedDate" : {
                "type" : "date",
                "format" : "dateOptionalTime"
            },
            "Description" : {
                "type" : "string",
                "analyzer" : "lens-turkish"
            },
            "DescriptionEnglish" : {
                "type" : "string",
                "analyzer" : "english"
            },
            "DislikesCount" : {
                "type" : "double"
            },
            "EnglishTranscription" : {
                "type" : "string",
                "analyzer" : "english"
            },
            "Favourite" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "FromProject" : {
                "type" : "boolean"
            },
            "IsSearchable" : {
                "type" : "boolean"
            },
            "LanguageISOCode" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "LanguageOfOrigin" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "LikesCount" : {
                "type" : "double"
            },
            "NativeTranscription" : {
                "type" : "string",
                "analyzer" : "lens-turkish"
            },
            "ObjectId" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "Published" : {
                "type" : "date",
                "format" : "dateOptionalTime"
            },
            "Recommendations" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "Status" : {
                "type" : "long"
            },
            "Tags" : {
                "type" : "string",
                "analyzer" : "lens-turkish"
            },
            "Title" : {
                "type" : "string",
                "analyzer" : "lens-turkish"
            },
            "TitleEnglish" : {
                "type" : "string",
                "analyzer" : "english"
            },
            "TranscriptionStatus" : {
                "type" : "double"
            },
            "UploadSource" : {
                "type" : "double"
            },
            "VideoImage" : {
                "type" : "string",
                "index" : "no"
            },
            "ViewCount" : {
                "type" : "double"
            },
            "WatchLater" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "ExternalMetadata" : {
                "type" : "nested",
                "properties" : {
                    "Filters" : {
                        "type" : "string",
                        "index" : "not_analyzed"
                    },
                    "ProjectId" : {
                        "type" : "string",
                        "index" : "not_analyzed"
                    },
                    "Roles" : {
                        "type" : "string",
                        "index" : "not_analyzed"
                    }
                }
            }
        }
    }
}

}

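All the language indices follow this pattern (we support 24 different languages), and each client uses one of these setups when creating an index and indexing documents into it.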

So this all looks fine, and ES is happy with it. Now on to the search query, which is where things get confusing.

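My search query is based on the requirement that phrases must take precedence over individual terms. Also, when a client performs a search, it must run across all of that client's documents and languages (hence the client ID in the index names). This is done with a wildcard in the index name in the URL, i.e. /A-*/video/_search searches all of client A's documents regardless of language.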
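
The naming scheme and the wildcard search path described above can be sketched in a few lines of Python. The helper names `index_name` and `search_path` are hypothetical and only illustrate the convention from the question; they are not part of any Elasticsearch client.

```python
def index_name(client_id: str, language: str) -> str:
    """Build the per-client, per-language index name, e.g. 'A-en'."""
    return f"{client_id}-{language}"


def search_path(client_id: str, doc_type: str = "video") -> str:
    """Build the search path that covers every language index of one client.

    The trailing '-*' is an index-name wildcard, so 'A-*' matches
    'A-en', 'A-de', 'A-tr' and so on.
    """
    return f"/{client_id}-*/{doc_type}/_search"


if __name__ == "__main__":
    print(index_name("A", "en"))
    print(search_path("A"))
```

Because a single request fans out to several indices, each hit is scored with the statistics of the index (and shard) it lives in, which matters for the ranking question below.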

Here is the search query I POST to the server...

POST /5617c3c867567a0b0c570a95-*/video/_search
{
 "from": "0",
 "size": "1000",
 "query": {
   "template": {
     "query": {
       "filtered": {
         "query": {
           "bool": {
             "must": [
               {
                 "multi_match": {
                   "query": "{{query_string}}",
                   "type": "most_fields",
                   "fields": [
                     "Title^3",
                     "Description^2",
                     "TitleEnglish",
                     "DescriptionEnglish",
                     "EnglishTranscription",
                     "NativeTranscription",
                     "Tags",
                     "Comments"
                   ],
                   "tie_breaker": 0.1, 
                   "minimum_should_match": "70%"
                 }
               }
             ]
           }
         },
         "filter": {
           "bool": {
             "must": [
               {
                 "term": {
                   "IsSearchable": true
                 }
               },
               {
                 "term": {
                   "Private": false
                 }
               }
             ]
           }
         }
       }
     },
     "params": {
       "query_string": "Turkish"
     }
   }
 }
}

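Note that I'm searching for the term "Turkish" across all languages. Now look at the results, and notice that hits from the *-en index rank higher than those from the *-tr (Turkish) index, whose documents contain the term "Turkish" more times across their fields.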

{
   "took": 5,
   "timed_out": false,
   "_shards": {
      "total": 15,
      "successful": 15,
      "failed": 0
   },
   "hits": {
      "total": 7,
      "max_score": 0.21282451,
      "hits": [
         {
            "_index": "5617c3c867567a0b0c570a95-en",
            "_type": "video",
            "_id": "561bd2b274cbe0123c099ace",
            "_score": 0.21282451,
            "_source": {
               "CountryOfOrigin": "United Kingdom",
               "Continent": "Europe",
               "LanguageOfOrigin": "English",
               "LanguageIsoCode": "en",
               "Title": "Nikes",
               "TitleEnglish": "Eng video Eng lang",
               "Description": "izlemek",
               "DescriptionEnglish": "",
               "VideoImage": "ff3a093a-700e-4c53-94df-cc5eb425c043_Image.jpg",
               "ViewCount": 9,
               "LikesCount": 0,
               "DislikesCount": 0,
               "CreatedDate": "2015-10-12T15:33:05.634Z",
               "WatchLater": [],
               "Favourite": [],
               "Status": 2,
               "TranscriptionStatus": 6,
               "UploadSource": 3,
               "IsSearchable": true,
               "FromProject": false,
               "NativeTranscription": "",
               "Tags": [
                  "Turkish",
                  "Nike"
               ],
               "Comments": [],
               "Attributes": [],
               "Recommendations": [],
               "ClientId": "5617c3c867567a0b0c570a95",
               "Private": false,
               "ObjectId": "561bd2b274cbe0123c099ace"
            }
         },
         {
            "_index": "5617c3c867567a0b0c570a95-en",
            "_type": "video",
            "_id": "5617cb8b74cbe2110890820b",
            "_score": 0.19917427,
            "_source": {
               "CountryOfOrigin": "Armenia",
               "Continent": "Europe",
               "LanguageOfOrigin": "English",
               "LanguageIsoCode": "en",
               "Title": "English Video",
               "TitleEnglish": "English Video",
               "DescriptionEnglish": "",
               "VideoImage": "df80412b-d6b9-4104-932b-c8e44b005fb2_Image.jpg",
               "ViewCount": 16,
               "LikesCount": 1,
               "DislikesCount": 0,
               "CreatedDate": "2015-10-09T14:13:30.893Z",
               "WatchLater": [],
               "Favourite": [],
               "Status": 2,
               "TranscriptionStatus": 5,
               "UploadSource": 3,
               "IsSearchable": true,
               "FromProject": false,
               "NativeTranscription": "",
               "Tags": [
                  "Turkish",
                  "Purple Aki"
               ],
               "Comments": [],
               "Attributes": [],
               "Recommendations": [],
               "ClientId": "5617c3c867567a0b0c570a95",
               "Private": false,
               "ObjectId": "5617cb8b74cbe2110890820b"
            }
         },
         {
            "_index": "5617c3c867567a0b0c570a95-en",
            "_type": "video",
            "_id": "561bb49e74cbe002f09301fa",
            "_score": 0.17025961,
            "_source": {
               "CountryOfOrigin": "United Kingdom",
               "Continent": "Europe",
               "LanguageOfOrigin": "English",
               "LanguageIsoCode": "en",
               "Title": "Mark's Transcription Test",
               "TitleEnglish": "Mark's Transcription Test",
               "DescriptionEnglish": "",
               "VideoImage": "09c6d366-6807-4d9d-9588-fd4730907b9b_Image.jpg",
               "ViewCount": 6,
               "LikesCount": 0,
               "DislikesCount": 0,
               "CreatedDate": "2015-10-12T13:24:45.833Z",
               "WatchLater": [],
               "Favourite": [],
               "Status": 2,
               "TranscriptionStatus": 6,
               "UploadSource": 3,
               "IsSearchable": true,
               "FromProject": false,
               "NativeTranscription": "",
               "Tags": [
                  "turkish",
                  "mark",
                  "Watch"
               ],
               "Comments": [],
               "Attributes": [],
               "Recommendations": [],
               "ClientId": "5617c3c867567a0b0c570a95",
               "Private": false,
               "ObjectId": "561bb49e74cbe002f09301fa"
            }
         },
         {
            "_index": "5617c3c867567a0b0c570a95-tr",
            "_type": "video",
            "_id": "5617c97c74cbe21108908205",
            "_score": 0.12725623,
            "_source": {
               "CountryOfOrigin": "Turkey",
               "Continent": "Asia",
               "LanguageOfOrigin": "Turkish",
               "LanguageIsoCode": "tr",
               "Title": "Turkish Video - Under 10mins - Request Trans",
               "TitleEnglish": "Turkish Video - Under 10mins - Request Trans",
               "Description": "Turkish  - Request Trans",
               "DescriptionEnglish": "Turkish  - Request Trans",
               "VideoImage": "ba4341e5-7af8-418e-91e3-818e290a0989_Image.jpg",
               "ViewCount": 21,
               "LikesCount": 0,
               "DislikesCount": 0,
               "CreatedDate": "2015-10-09T14:04:44.033Z",
               "WatchLater": [],
               "Favourite": [],
               "Status": 2,
               "TranscriptionStatus": 5,
               "UploadSource": 3,
               "IsSearchable": true,
               "FromProject": false,
               "NativeTranscription": "",
               "Tags": [],
               "Comments": [
                  "Turkish",
                  "Liverpool"
               ],
               "Attributes": [
                  "5617c80974cbe211089081fd_3_2",
                  "5617c80974cbe211089081fe_4_1"
               ],
               "Recommendations": [],
               "ClientId": "5617c3c867567a0b0c570a95",
               "Private": false,
               "ObjectId": "5617c97c74cbe21108908205"
            }
         },
         {
            "_index": "5617c3c867567a0b0c570a95-tr",
            "_type": "video",
            "_id": "5617ca3574cbe21108908208",
            "_score": 0.07719648,
            "_source": {
               "CountryOfOrigin": "Argentina",
               "Continent": "South America",
               "LanguageOfOrigin": "Turkish",
               "LanguageIsoCode": "tr",
               "Title": "Turkish Video - No Trans",
               "TitleEnglish": "Turkish Video - No Trans",
               "DescriptionEnglish": "",
               "VideoImage": "735f0c09-3c1c-415e-870f-70f18be632ea_Image.jpg",
               "ViewCount": 14,
               "LikesCount": 0,
               "DislikesCount": 0,
               "CreatedDate": "2015-10-09T14:07:49.705Z",
               "WatchLater": [],
               "Favourite": [],
               "Status": 2,
               "TranscriptionStatus": 0,
               "UploadSource": 3,
               "IsSearchable": true,
               "FromProject": false,
               "NativeTranscription": "",
               "Tags": [
                  "Turkish"
               ],
               "Comments": [],
               "Attributes": [],
               "Recommendations": [],
               "ClientId": "5617c3c867567a0b0c570a95",
               "Private": false,
               "ObjectId": "5617ca3574cbe21108908208"
            }
         },
         {
            "_index": "5617c3c867567a0b0c570a95-de",
            "_type": "video",
            "_id": "5617c8ca74cbe211089081ff",
            "_score": 0.015614418,
            "_source": {
               "CountryOfOrigin": "Germany",
               "Continent": "Europe",
               "LanguageOfOrigin": "German",
               "LanguageIsoCode": "de",
               "Title": "German Video - Under 10mins - With SRT",
               "TitleEnglish": "German Video - Under 10mins - With SRT",
               "Description": "German Video\nTag: Oct 9",
               "DescriptionEnglish": "German Video\nTag: Oct 9",
               "VideoImage": "04bf4827-3459-41f6-9fc0-7003dfe7ea5d_Image.jpg",
               "ViewCount": 5,
               "LikesCount": 0,
               "DislikesCount": 0,
               "Published": "2015-10-09T14:03:01.066Z",
               "CreatedDate": "2015-10-09T14:01:46.517Z",
               "WatchLater": [],
               "Favourite": [],
               "Status": 2,
               "TranscriptionStatus": 5,
               "UploadSource": 3,
               "IsSearchable": true,
               "FromProject": false,
               "NativeTranscription": "Ich denke, dass Nachhaltigkeit sich darum dreht,Verpackungen zu reduzieren oder Energie, die bei der Produktion entsteht,zu verringern oder auch lokal zu produzieren,um die CO2-Bilanz zu reduzieren.Ich glaube, dass sich viele Verbraucherbeim Einkaufen über Nachhaltigkeit Gedanken machen,was letztendlich auch beeinflusst was sie kaufen,vor allem aber würde ich von mir als Verbraucherin behaupten,dass ich mich an die Firmen halte, die die gleichen Wertebezüglich Nachhaltigkeit haben wie ich.Ich gehe gezielt in Geschäfte, die weniger Verpackung benutzenoder solche, die man einfacher recyclen kannund wenn wir können, gehen wir immer zu Fuß zu regionalenoder lokalen Geschäften, wenn sie in der Nähe sind.Und viele Unternehmen versuchen die gleichen Produktefür einen niedrigeren Preis zu verkaufen,aber wenn eine Firma mich überzeugen kann, dass ihre Produkte nachhaltiger sindoder sicherer für mich und meine Umwelt,wäre ich am Ende auch bereit, mehr zu bezahlen.Wenn ein Unternehmen behauptet, nachhaltig zu sein,will ich immer herausfinden auf welche Art und Weisesie sicherer sind.Es gibt so viele Öko-Zertifikateund ich weiß nicht was die bedeutenoder ob sie wirklich für Nachhaltigkeit stehen.Vielleicht könnte es einen Beschluss geben,der es den Verbrauchern einfacher macht,nachhaltige Produkte zu verstehen, das wäre für alle eine große Hilfe.",
               "EnglishTranscription": "I think that sustainability turns about, Packaging to reduce or energy generated in the production, to reduce or even locally to produce, to reduce the CO2 footprint. I think that to many consumers worry buy about sustainability, What ultimately affects what you buy but above all, I would argue by me as a consumer, that I the companies consider myself, the same values as I have with regard to sustainability. I'm specifically going to shops that use less packaging or such which is easier to recycle can and if we can, we go to regional always walking or local shops if they are nearby. And many companies are trying the same products for sale, for a lower price But if a company can convince me that their products are more sustainable or safe for me and my environment. would I also be willing to pay more at the end. If a company claims to be sustainable. will I always find out in what way they are safer. There are so many eco-certificates and I don't know what you mean or whether they really are for sustainability. Perhaps there could be a decision, Consumers easier makes it,. understanding sustainable products that would be a great help for everyone.",
               "Tags": [
                  "Oct 9",
                  "Turkish"
               ],
               "Comments": [],
               "Attributes": [
                  "5617c80974cbe211089081fd_3_2",
                  "5617c80974cbe211089081fe_4_4"
               ],
               "Recommendations": [],
               "ClientId": "5617c3c867567a0b0c570a95",
               "Private": false,
               "ObjectId": "5617c8ca74cbe211089081ff"
            }
         },
         {
            "_index": "5617c3c867567a0b0c570a95-tr",
            "_type": "video",
            "_id": "561b860d74cbe0103cf23369",
            "_score": 0.011710813,
            "_source": {
               "CountryOfOrigin": "Turkey",
               "Continent": "Asia",
               "LanguageOfOrigin": "Turkish",
               "LanguageIsoCode": "tr",
               "Title": "izlemek Nike",
               "TitleEnglish": "Demo 4",
               "Description": "izlemek Nike",
               "DescriptionEnglish": "Demo 4",
               "VideoImage": "97e66fe2-6f62-4a43-b234-0abda414dedf_Image.jpg",
               "ViewCount": 17,
               "LikesCount": 0,
               "DislikesCount": 0,
               "Published": "2015-10-12T10:07:52.281Z",
               "CreatedDate": "2015-10-12T10:06:05.015Z",
               "WatchLater": [],
               "Favourite": [],
               "Status": 2,
               "TranscriptionStatus": 5,
               "UploadSource": 3,
               "IsSearchable": true,
               "FromProject": false,
               "NativeTranscription": "Şimdi makyaj masamın başına geçtimVe makyajımı yapmaya başlayacağımÖncelikle güzel bir baz süreceğimSmashbox'ın Photo Finish bazını kullanacağımÖnce göz makyajımı yapacağımBugün böyle altın ve siyah tonlarındaya da altın kahve tonlarında bir makyaj yapmayı planlıyorumÇünkü, giyeceğim bir ceket varCeket de altın zincirler ve altın detaylar taşıyorEe tabii, söz konusu altın olduğu zamanAltın ve bronz ve doğal tonlar olduğu zamanNaked paletimden elimi çekemiyorumEe tabii far kullanacaksam, bir far bazı kullanmadan olmazUrban Decay far kullanacağım içintesadüfen Urban Decay'den primer potion göz bazını kullanacağımŞu kadar miktar benim için yeterliBeni biraz böyle nefes nefese vehani koşturur vaziyette görebilirsinizÇünkü birazcık acelem varVe hazır böyle güzel bir saç makyaj gibi bir şey planlıyorlenNeden videosunu çekmeyeyim, diye düşündüm",
               "EnglishTranscription": "Now I take over my dressing table And I'm going to start doing my makeup First of all, I'm going to drive a beautiful base Smashbox's Photo Finish base to use First, I'm going to do my eye makeup Today in shades of gold and black or I'm planning to do a makeup in shades of gold and coffee I'm going to wear a coat, because there Jacket in gold chains and gold carries the details So of course, when it comes to gold When gold and bronze and natural hues I can't get my hand off my naked palette So of course I use a headlight headlights not without some Urban Decay eyeshadow I use for Incidentally, I'm going to use from the Urban Decay primer potion eye base This quantity is enough for me That's me a little breathless and you know, the one you can see running condition Because it's a little bit of a hurry And such a beautiful something like hair make-up ready planned yorlen Why is the video I thought, that I may not",
               "Tags": [
                  "test tag",
                  "turkish",
                  "mark",
                  "izlemek",
                  "Purple Aki"
               ],
               "Comments": [],
               "Attributes": [
                  "5617c80974cbe211089081fd_3_1"
               ],
               "Recommendations": [],
               "ClientId": "5617c3c867567a0b0c570a95",
               "Private": false,
               "ObjectId": "561b860d74cbe0103cf23369"
            }
         }
      ]
   }
}

Could anyone who knows what to look for cast an eye over this and see whether I've missed something?

【Comments】:

    Tags: elasticsearch sense


    【Solution 1】:

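    Term frequency is only one part of the relevance calculation; inverse document frequency and document (field) length also matter. In your example the English documents rank higher because 1) they are shorter, and 2) the English index contains fewer mentions of the term "Turkish", so each document that does contain it scores higher.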
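
To make the three factors concrete, here is a minimal sketch of the classic Lucene TF-IDF formulas for a single term in a single field. It leaves out query normalization, the coordination factor, boosts, and the lossy one-byte encoding of the length norm, and all the numbers fed into it are hypothetical, not taken from the indices in the question.

```python
import math


def tf(freq: float) -> float:
    """Term frequency factor: grows with the square root of the count."""
    return math.sqrt(freq)


def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: rarer terms get a larger weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))


def field_norm(num_terms: int) -> float:
    """Length normalization: shorter fields get a larger weight."""
    return 1.0 / math.sqrt(num_terms)


def term_score(freq: float, num_docs: int, doc_freq: int, num_terms: int) -> float:
    """Simplified score; idf appears once in the query weight and once in the field weight."""
    return tf(freq) * idf(num_docs, doc_freq) ** 2 * field_norm(num_terms)


# Hypothetical case 1: the term is rare in its index and sits in a 2-term field.
short_rare = term_score(freq=1, num_docs=10, doc_freq=1, num_terms=2)

# Hypothetical case 2: the term is common in its index and sits in an 8-term field.
long_common = term_score(freq=1, num_docs=10, doc_freq=8, num_terms=8)

if __name__ == "__main__":
    print(f"short field, rare term:  {short_rare:.3f}")
    print(f"long field, common term: {long_common:.3f}")
```

With the default search type these statistics are gathered per shard, so very small indices spread over five shards can produce uneven IDF values. Running the same search with `search_type=dfs_query_then_fetch` may be worth trying to see whether that is a factor here.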

    【Discussion】:

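    • Is there any way to prove that this is the case?
    • Yes - try adding very similar documents to both the English and the Turkish index, differing only in how often they use keywords such as "English" or "Turkish". That way the "Turkish" documents should rank higher. You can also use the Elasticsearch Explain API to investigate the ranking: elastic.co/guide/en/elasticsearch/reference/current/…
    • I did try your suggestion, but it didn't seem to make any difference. However, I found that if I remove the '"tie_breaker":0.1' parameter, it seems to work! The calculation returned by "explain" changes a lot once it is removed. Any idea why? I kind of need tie_breaker for phrase searches, but if it is the culprit it can go.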
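
One possible reading of the tie_breaker observation, offered as an assumption rather than a confirmed diagnosis: if an explicit tie_breaker makes the per-field scores combine dis_max style (best field plus tie_breaker times the rest) instead of being summed, then a document matching in many fields loses most of its advantage. The sketch below only shows that arithmetic. The per-field scores are hypothetical, and whether `most_fields` behaves this way in the Elasticsearch version used here should be checked against the explain output.

```python
from typing import Sequence


def combine_sum(field_scores: Sequence[float]) -> float:
    """Every matching field contributes its full score."""
    return sum(field_scores)


def combine_dis_max(field_scores: Sequence[float], tie_breaker: float) -> float:
    """Best field counts fully; the other fields are scaled by tie_breaker."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


# Hypothetical per-field scores for two documents.
matches_many_fields = [0.05, 0.05, 0.05, 0.05]  # e.g. title, description, ...
matches_one_field = [0.12]                      # e.g. a short tags field

if __name__ == "__main__":
    for label, scores in (("many fields", matches_many_fields),
                          ("one field", matches_one_field)):
        print(label,
              "sum:", round(combine_sum(scores), 4),
              "dis_max(0.1):", round(combine_dis_max(scores, 0.1), 4))
```

Under plain summation the many-field document wins, while with a tie_breaker of 0.1 the single strong field wins. Adding `"explain": true` to the search body returns the score breakdown for every hit, which should show which combination is actually applied.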