【Question title】: How to find similarity between two questions even though the words are different
【Posted】: 2019-09-24 09:13:04
【Question description】:

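Is there a way to tell whether two strings have a similar meaning, even when the words in them are different?

So far I have tried fuzzywuzzy, Levenshtein distance, and cosine similarity to match the strings, but all of them compare the words themselves rather than the meaning of the words.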

from fuzzywuzzy import fuzz

# Note: levenshtein_ratio_and_distance is a user-defined helper whose
# definition is not shown in the question.

Str1 = "what are types of negotiation"
Str2 = "what are advantages of negotiation"
Str3 = "what are categories of negotiation"

# Character-based scores for Str1 vs Str2
Ratio = fuzz.ratio(Str1.lower(),Str2.lower())
Partial_Ratio = fuzz.partial_ratio(Str1.lower(),Str2.lower())
Token_Sort_Ratio = fuzz.token_sort_ratio(Str1,Str2)

# Character-based scores for Str1 vs Str3
Ratio1 = fuzz.ratio(Str1.lower(),Str3.lower())
Partial_Ratio1 = fuzz.partial_ratio(Str1.lower(),Str3.lower())
Token_Sort_Ratio1 = fuzz.token_sort_ratio(Str1,Str3)

print("fuzzywuzzy")
print(Str1," ",Str2," ",Ratio)
print(Str1," ",Str2," ",Partial_Ratio)
print(Str1," ",Str2," ",Token_Sort_Ratio)
print(Str1," ",Str3," ",Ratio1)
print(Str1," ",Str3," ",Partial_Ratio1)
print(Str1," ",Str3," ",Token_Sort_Ratio1)

print("levenshtein ratio")
Ratio = levenshtein_ratio_and_distance(Str1,Str2,ratio_calc = True)
Ratio1 = levenshtein_ratio_and_distance(Str1,Str3,ratio_calc = True)
print(Str1," ",Str2," ",Ratio)
print(Str1," ",Str3," ",Ratio1)  # fixed: originally printed Ratio again

output:
fuzzywuzzy
what are types of negotiation   what are advantages of negotiation   86
what are types of negotiation   what are advantages of negotiation   76
what are types of negotiation   what are advantages of negotiation   73
what are types of negotiation   what are categories of negotiation   86
what are types of negotiation   what are categories of negotiation   76
what are types of negotiation   what are categories of negotiation   73
levenshtein ratio
what are types of negotiation   what are advantages of negotiation   0.8571428571428571
what are types of negotiation   what are categories of negotiation   0.8571428571428571



expected output:
"what are the types of negotiation skill?"
"what are the categories in negotiation skill?"
output:similar
"what are the types of negotiation skill?"
"what are the advantages of negotiation skill?"
output:not similar

【Comments】:

  • In short: is there any way to measure how similar the meanings of two strings are?

Tags: python nlp chatbot sentence-similarity


【Solution 1】:

You want to score the semantic similarity of two strings.

fuzzywuzzy and Levenshtein distance only score character-level differences between the strings.
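To make the point about character-level scoring concrete, here is a minimal sketch of an edit-distance similarity in plain Python. The function names `levenshtein_distance` and `normalized_similarity` are chosen for illustration only; this is not the helper used in the question, and the normalization (dividing by the longer string's length) is an assumption that differs from what fuzzywuzzy computes.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep on a match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    base = "what are types of negotiation"
    for other in ("what are advantages of negotiation",
                  "what are categories of negotiation"):
        print(other, round(normalized_similarity(base, other), 3))
```

The score depends only on which characters differ, so replacing "types" with a synonym or with an unrelated word is treated the same way: as a handful of character edits.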

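You need to capture semantic information, which means you need a semantic representation of your strings.

A simple but effective approach might be:

  1. Compute a vector for each of the two strings using pretrained word embeddings for your language (for example FastText's get_sentence_vector: https://fasttext.cc/docs/en/python-module.html#model-object).
  2. Compute the cosine similarity between the two vectors (close to 1: strings with nearly the same meaning; close to 0: unrelated strings).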
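The two steps above can be sketched as follows. To keep the example runnable without downloading a model, it uses a tiny hand-written embedding table, `TOY_EMBEDDINGS`, whose values are made up purely for illustration; in practice the sentence vectors would come from a pretrained model, for example via FastText's `get_sentence_vector`. The helper names `sentence_vector` and `cosine_similarity` are likewise hypothetical.

```python
import math

# ASSUMPTION: made-up 3-dimensional vectors, for illustration only.
# Real pretrained embeddings have hundreds of dimensions and are learned from text.
TOY_EMBEDDINGS = {
    "types":       [0.90, 0.10, 0.00],
    "categories":  [0.85, 0.15, 0.05],
    "advantages":  [0.10, 0.90, 0.20],
    "negotiation": [0.30, 0.30, 0.90],
}


def sentence_vector(sentence, embeddings):
    """Step 1: average the vectors of the words found in the embedding table.

    Words missing from the table (here "what", "are", "of") are skipped.
    """
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        raise ValueError("no known words in sentence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(u, v):
    """Step 2: cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    base = sentence_vector("what are types of negotiation", TOY_EMBEDDINGS)
    for other in ("what are categories of negotiation",
                  "what are advantages of negotiation"):
        vec = sentence_vector(other, TOY_EMBEDDINGS)
        print(other, round(cosine_similarity(base, vec), 3))
```

With these toy vectors the "categories" sentence scores higher than the "advantages" sentence. Turning the score into a "similar" / "not similar" label requires a cutoff, which would have to be chosen empirically on real data.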

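Of course, there are better and more sophisticated methods. For a deeper understanding of the topic, I recommend this article (https://medium.com/@adriensieg/text-similarities-da019229c894), which has detailed explanations and code implementations.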

【Discussion】:
