【Question title】: Is it possible with Python to reconstruct a jumbled sentence to match a full sentence?
【Posted】: 2021-04-13 07:47:30
【Question description】:

I have one CSV of sentences and another CSV in which the same sentences are broken into pieces and jumbled.

For example, one CSV has:

The quick brown fox jumps over the lazy dog.

The other CSV has:

jumps over the
The quick brown fox
lazy dog.

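Each CSV contains more than one sentence, but hopefully the example above gives you the idea.

I used fuzzy matching to check whether they match, but now I want to reconstruct the sentences.
Is it possible to use Python to reorder the jumbled CSV so that it matches the full sentences?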

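【Comments】:

  • You could simply check whether each part of the sentence appears in the full sentence.
  • Do you mean you want to rearrange the rows in the jumbled CSV so that the snippets appear in the correct order?
  • @EliasStrehle Yes, exactly! The only catch is that there will be more than one sentence to match, and the snippets of several sentences will be mixed together in the same CSV.
  • 'The quick brown fox jumps over the lazy dog.'.find('jumps over the') gives you the index position of the substring. Do this for each substring and sort by index. (It may not work as expected if a substring is ambiguous or repeated in the jumbled CSV.)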
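
The `str.find` suggestion in the last comment can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the helper name `reorder_fragments` is made up, fragments are assumed to be exact substrings of the sentence, and fragments that are not found are simply skipped. As the comment warns, a fragment that occurs more than once in the sentence is placed at its first occurrence only.

```python
def reorder_fragments(sentence, fragments):
    """Return the fragments found in `sentence`, ordered by where they occur.

    Hypothetical helper: exact substring matching only, no fuzzy matching.
    """
    positioned = []
    for fragment in fragments:
        text = fragment.strip()
        if not text:
            continue  # ignore empty rows
        position = sentence.find(text)  # -1 when the fragment is absent
        if position != -1:
            positioned.append((position, text))
    positioned.sort(key=lambda pair: pair[0])
    return [text for _, text in positioned]


sentence = "The quick brown fox jumps over the lazy dog."
fragments = ["jumps over the", "The quick brown fox", "lazy dog."]
ordered = reorder_fragments(sentence, fragments)
print(" ".join(ordered))
```

Joining with a single space only reproduces the sentence when the original pieces were split on single spaces, which is the case in the question's example.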

Tags: python csv fuzzywuzzy


【Solution 1】:

A great and challenging question!

I tried something out and explained it in the comments of the code below:

# from fuzzywuzzy import fuzz, process
from rapidfuzz import fuzz, process  # faster when many fuzzy comparisons are needed

# Original sentences
clean_sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "A wizard's job is to vex chumps quickly in fog."
]

# Jumbled CSV rows, represented as a list
jumbled_sentences = [
    "is to vex chumps ",
    "jumps over the ",
    "The quick brown fox ",
    "quickly in fog.",
    "lazy dog.",
    "A wizard's job ",
]

for clean_sentence in clean_sentences:

    # Keep only the fragments that are fully contained in the original sentence
    # (partial_ratio with a cutoff of 100). limit=None returns every match
    # instead of only the default top 5.
    fuzz_results = process.extract(
        clean_sentence,
        jumbled_sentences,
        scorer=fuzz.partial_ratio,
        score_cutoff=100,
        limit=None,
    )

    # The first item of each result tuple is the matched fragment
    sentences_found = [fuzz_result[0] for fuzz_result in fuzz_results]

    # Store each fragment under its start index in the original sentence: {index: fragment}
    index_sent_dict = {}
    for sentence_found in sentences_found:
        index = clean_sentence.find(sentence_found)
        if index != -1:  # skip anything that is not a literal substring
            index_sent_dict[index] = sentence_found

    # Sort the dictionary by index, then join its values (the fragments) in that order
    sorted_dict = dict(sorted(index_sent_dict.items()))

    final_sentence = "".join(sorted_dict.values())
    print(final_sentence)

# Output:
# The quick brown fox jumps over the lazy dog.
# A wizard's job is to vex chumps quickly in fog.
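
Since the question starts from two CSV files and the comments raise the problem of repeated fragments, here is a standard-library sketch of one way to handle both. It is an illustration under assumptions, not part of the original answer: the helper names `read_rows` and `rebuild` are made up, the CSV contents are in-memory stand-ins, each file is assumed to hold one sentence or fragment in its first column, and fragments must match exactly (no fuzzy matching). The matching is greedy and does not backtrack, so it can fail on inputs where a shorter fragment would have been the right choice.

```python
import csv
import io

# Stand-ins for the two CSV files (hypothetical contents); with real files,
# pass open(path, newline="") to read_rows instead.
clean_csv = io.StringIO(
    "The quick brown fox jumps over the lazy dog.\n"
    "A wizard's job is to vex chumps quickly in fog.\n"
)
jumbled_csv = io.StringIO(
    "is to vex chumps\n"
    "jumps over the\n"
    "The quick brown fox\n"
    "quickly in fog.\n"
    "lazy dog.\n"
    "A wizard's job\n"
)


def read_rows(handle):
    """Return the stripped first column of every non-empty CSV row."""
    return [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]


def rebuild(sentence, pool):
    """Greedily consume fragments from left to right.

    Returns (ordered_fragments, remaining_pool), or (None, pool) when the
    sentence cannot be covered. No backtracking is attempted.
    """
    remaining = list(pool)
    ordered, cursor = [], 0
    while cursor < len(sentence):
        if sentence[cursor].isspace():  # skip the gap between fragments
            cursor += 1
            continue
        candidates = [f for f in remaining if f and sentence.startswith(f, cursor)]
        if not candidates:
            return None, list(pool)
        best = max(candidates, key=len)  # prefer the longest match
        remaining.remove(best)  # each CSV row is used at most once
        ordered.append(best)
        cursor += len(best)
    return ordered, remaining


clean_sentences = read_rows(clean_csv)
pool = read_rows(jumbled_csv)

rebuilt = {}
for sentence in clean_sentences:
    ordered, pool = rebuild(sentence, pool)
    rebuilt[sentence] = ordered
    print(ordered)
```

Because every row is removed from the pool once it is used, a fragment that appears twice in the jumbled CSV can be placed twice, which the index-based sort cannot do.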
