【Posted】: 2019-09-12 11:16:55
【Question】:
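I am using the synonym_graph token filter in Elasticsearch, and it seems to work fine.
To get an intuitive understanding of how the new synonym_graph works and how it splits words, I am testing the analyzer directly: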
GET my_index/_analyze
{
"text": "I really love eating lots and lots of fried cheese",
"analyzer": "my_analyzer"
}
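I would like to know what the analyzer output means.
In this example, the term "fried cheese" has several synonyms defined, some of them multi-word and some single-word: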
fried cheese => fried cheese, mozzarellasticks, Queso Frito, cheesecurd, friedmozzarella
The analyzer output is:
{
"tokens" : [
{
"token" : "i",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "realli",
"start_offset" : 2,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "love",
"start_offset" : 9,
"end_offset" : 13,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "eat",
"start_offset" : 14,
"end_offset" : 20,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "lot",
"start_offset" : 21,
"end_offset" : 25,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "lot",
"start_offset" : 30,
"end_offset" : 34,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "friedchees",
"start_offset" : 38,
"end_offset" : 50,
"type" : "SYNONYM",
"position" : 7,
"positionLength" : 4
},
{
"token" : "fri",
"start_offset" : 38,
"end_offset" : 50,
"type" : "SYNONYM",
"position" : 7
},
{
"token" : "mozzarellastick",
"start_offset" : 38,
"end_offset" : 50,
"type" : "SYNONYM",
"position" : 7,
"positionLength" : 4
},
{
"token" : "queso",
"start_offset" : 38,
"end_offset" : 50,
"type" : "SYNONYM",
"position" : 7,
"positionLength" : 2
},
{
"token" : "cheesecurd",
"start_offset" : 38,
"end_offset" : 50,
"type" : "SYNONYM",
"position" : 7,
"positionLength" : 4
},
{
"token" : "friedmozzarella",
"start_offset" : 38,
"end_offset" : 50,
"type" : "SYNONYM",
"position" : 7,
"positionLength" : 4
},
{
"token" : "fri",
"start_offset" : 38,
"end_offset" : 43,
"type" : "<ALPHANUM>",
"position" : 7,
"positionLength" : 3
},
{
"token" : "chees",
"start_offset" : 38,
"end_offset" : 50,
"type" : "SYNONYM",
"position" : 8,
"positionLength" : 3
},
{
"token" : "frito",
"start_offset" : 38,
"end_offset" : 50,
"type" : "SYNONYM",
"position" : 9,
"positionLength" : 2
},
{
"token" : "chees",
"start_offset" : 44,
"end_offset" : 50,
"type" : "<ALPHANUM>",
"position" : 10
}
]
}
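As an aid for reading the offsets in this output: as far as I can tell, start_offset and end_offset are character positions in the original text, so slicing the input string with them shows which part of the source each token came from. The sketch below is plain Python rather than an Elasticsearch API; the helper name `surface` is made up for illustration, and the offsets are copied from the output above.

```python
# Original text sent to _analyze.
TEXT = "I really love eating lots and lots of fried cheese"

def surface(text, start_offset, end_offset):
    """Return the slice of the original text that a token's offsets cover."""
    return text[start_offset:end_offset]

# (token, start_offset, end_offset) copied from the analyzer output.
SAMPLE_TOKENS = [
    ("eat", 14, 20),     # stemmed token, offsets still cover "eating"
    ("fri", 38, 43),     # original word "fried"
    ("chees", 44, 50),   # original word "cheese"
    ("queso", 38, 50),   # synonym token, covers the whole matched phrase
]

if __name__ == "__main__":
    for token, start, end in SAMPLE_TOKENS:
        print(f"{token!r:10} <- {surface(TEXT, start, end)!r}")
```

Under this reading, the synonym tokens all carry offsets 38 to 50 because they stand in for the whole matched phrase "fried cheese", not for a single word of it.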
I am trying to understand the parameters of the synonym tokens in this result. Take the synonym "Queso Frito" as an example:
{
"token" : "frito",
"start_offset" : 38,
"end_offset" : 50,
"type" : "SYNONYM",
"position" : 9,
"positionLength" : 2
}
{
"token" : "queso",
"start_offset" : 38,
"end_offset" : 50,
"type" : "SYNONYM",
"position" : 7,
"positionLength" : 2
}
What do all the additional parameters mean: "start_offset", "end_offset", "position", and "positionLength"?
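One way to read "position" and "positionLength", sketched under the assumption that each token is an arc in a graph running from `position` to `position + positionLength`, with positionLength treated as 1 when the response omits it. The Python below is not an Elasticsearch API; the function names are hypothetical, and the token tuples are copied from the "fried cheese" part of the output above.

```python
from collections import defaultdict

# (token, position, positionLength) for the tokens at positions 7..10.
# positionLength is set to 1 where the analyzer response omits it.
TOKENS = [
    ("friedchees", 7, 4),
    ("fri", 7, 1),               # type SYNONYM
    ("mozzarellastick", 7, 4),
    ("queso", 7, 2),
    ("cheesecurd", 7, 4),
    ("friedmozzarella", 7, 4),
    ("fri", 7, 3),               # type <ALPHANUM>, the original word
    ("chees", 8, 3),             # type SYNONYM
    ("frito", 9, 2),
    ("chees", 10, 1),            # type <ALPHANUM>, the original word
]

def build_arcs(tokens):
    """Map each start position to the (token, end position) arcs leaving it."""
    arcs = defaultdict(list)
    for term, position, length in tokens:
        arcs[position].append((term, position + length))
    return arcs

def paths(arcs, start, end):
    """List every token sequence that leads from position start to position end."""
    if start == end:
        return [[]]
    if start > end:
        return []
    found = []
    for term, nxt in arcs.get(start, []):
        for rest in paths(arcs, nxt, end):
            found.append([term] + rest)
    return found

if __name__ == "__main__":
    for path in paths(build_arcs(TOKENS), 7, 11):
        print(" ".join(path))
```

Read this way, "queso" (7 to 9) followed by "frito" (9 to 11) forms one side path through the graph, alongside single-token paths such as "cheesecurd" (7 to 11) and the original words "fri" (7 to 10) then "chees" (10 to 11). Every path starts at position 7 and ends at position 11, which is how multi-word and single-word synonyms can sit side by side.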
【Comments】:
Tags: elasticsearch analyzer synonym