【Question title】: Python - How to find one or multiple matching words in a sentence
【Posted】: 2019-04-21 06:58:16
【Question】:

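I am trying to write a small script for myself in which I have one or more words and want to find all of the matching words in an arbitrary sentence.

For example: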

Sentence1 = "Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow"

Sentence2 = "Is it beautiful weather"

Sentence3 = "I hope it wont be snowing here soon"

Sentence4 = "How is the weather"

Words = ['I+be', 'it+weather']

The output should look like this:

Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow

Is it beautiful weather

I hope it wont be snowing here soon

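The last sentence is not printed because it contains neither I and be nor it and weather.

So my question is basically how to use + (or any other special character) to join keywords, such as keyword1 + keyword2 + ... + keywordN (anywhere from 1 to n words), and then check whether all of those words are in the sentence.

The code I tried to write looks something like this: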

Sentence = [
    "Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow",
    "Is it beautiful weather", "I hope it wont be snowing here soon",
    "How is the weather"]

Words = ['I', 'it+weather']

for loop_word in Words:
    for loop_setence in Sentence:
        if loop_word in loop_setence:
            print(loop_setence)
            break

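However, right now it only prints the first sentence, since I have changed the first entry in Words to I.

What I want is that an entry with more than one word has a special character between the words, e.g. I+be, so that whenever a sentence contains both I and be, that sentence is printed; otherwise nothing is printed.

So my question to you is: how should I continue from here? :)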

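【Comments】:

  • 'it+weather' in sentence searches for exactly this string; 'it+weather' does not occur in any sentence.
  • The first sentence contains I and be; shouldn't it be in the output?
  • @MichaelButscher Oh yes, that's correct. I think I need to do something so that whenever there is a + in an entry, it is treated as two words. It should not be treated as itweather but as two separate words, as if I had written ['it', 'weather'].
  • @DanielMesejo Oh, my bad! Yes, since that sentence contains both I and be.
  • If this is not just a simple exercise, you should use proper NLP tools instead of hacking together a solution to a classic problem.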

Tags: python string loops if-statement


【Solution 1】:

You can do it like this:

words = ['I+be', 'it+weather']
sentences = ["Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow",
             "Is it beautiful weather", "I hope it wont be snowing here soon", "How is the weather"]

def check_all(sentence, ws):
    return all(w in sentence for w in ws)

for sentence in sentences:
    if any(check_all(sentence, word.split('+')) for word in words):
        print(sentence)

Output

Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow
Is it beautiful weather
I hope it wont be snowing here soon

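The function check_all checks whether all the words in a group (for example 'I+be') are in the sentence. Then, if any of the groups is present in the sentence, the sentence is printed. Note that you must first split on '+' to find out whether a group matches.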
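To make the two steps concrete, here is a small sketch that exposes the intermediate values for a single sentence. The helper name explain and its return shape are illustrative only and are not part of the answer's code.

```python
def explain(sentence, words):
    """Return (group, parts, hits, matched) for each '+'-joined group."""
    rows = []
    for group in words:
        parts = group.split('+')                # 'I+be' -> ['I', 'be']
        hits = [w in sentence for w in parts]   # substring test for each part
        rows.append((group, parts, hits, all(hits)))
    return rows


for row in explain("I hope it wont be snowing here soon", ['I+be', 'it+weather']):
    print(row)

# `in` on two strings is a substring test, so 'I' is also found inside "Is".
print('I' in "Is it beautiful weather")
```

The last line shows the limitation of plain substring checks: a single-letter keyword such as 'I' matches any sentence that merely contains that letter.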

Update

To match whole words only, I suggest you use a regex, for example:

import re

words = ['I+be', 'it+weather']
sentences = ["Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow",
             "Is it beautiful weather", "I hope it wont be snowing here soon", "How is the weather", "With In be"]


def check_all(sentence, ws):
    """Returns True if all the words are present in the sentence"""
    return all(re.search(r'\b{}\b'.format(w), sentence) for w in ws)


for sentence in sentences:
    if any(check_all(sentence, word.split('+')) for word in words):
        print(sentence)

Output

Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow
Is it beautiful weather
I hope it wont be snowing here soon

Note that "With In be" is not included in the output of the second example.
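One caveat with the regex version: each keyword is placed into the pattern as-is, so a keyword containing regex metacharacters (such as a dot) would be interpreted as a pattern rather than literal text. The following is a minimal sketch of one way to guard against that with re.escape. The matching helper and the ignore_case parameter are hypothetical additions, included only because the question writes "Be" while the sentences contain "be".

```python
import re


def check_all(sentence, ws, ignore_case=False):
    """True if every keyword in ws occurs in sentence as a whole word."""
    # ignore_case is a hypothetical extra, not part of the original answer.
    flags = re.IGNORECASE if ignore_case else 0
    return all(
        re.search(r'\b{}\b'.format(re.escape(w)), sentence, flags)
        for w in ws
    )


def matching(sentences, words, ignore_case=False):
    """Keep the sentences that satisfy at least one '+'-joined group."""
    return [
        s for s in sentences
        if any(check_all(s, word.split('+'), ignore_case) for word in words)
    ]


sentences = [
    "Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow",
    "Is it beautiful weather",
    "I hope it wont be snowing here soon",
    "How is the weather",
    "With In be",
]

for s in matching(sentences, ['I', 'it+weather']):
    print(s)
```

Because \b only marks a boundary next to a word character, this sketch assumes every keyword starts and ends with a letter, digit, or underscore.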

Further reading

  1. See the documentation on any and all.
  2. Python regular expression match whole word

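【Comments】:

  • I did find a problem. If you leave only a single word, e.g. words = ['I', 'it+weather'], then any sentence with a word that contains a capital I is flagged as a match, even though I does not appear as a word of its own.
  • @Hellosiroverthere Updated the answer!
  • That's odd, I just ran it with these sentences and it only prints "Stackoverflow hello testing red"
  • Oh, that's strange. I must have forgotten something. Let me look again at what mistake I made!
  • I just checked! You are right! The mistake was mine! I'm really sorry, and I appreciate your answer! I think this is the correct answer, at least for what I used it for :)

【Solution 2】:

Using filter, any, all and split: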

In [22]: Sentence1 = "Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow"
    ...:
    ...: Sentence2 = "Is it beautiful weather"
    ...:
    ...: Sentence3 = "I hope it wont be snowing here soon"
    ...:
    ...: Sentence4 = "How is the weather"
    ...:
    ...: Words = ['I+be', 'it+weather']
    ...:

In [23]: sentences = [Sentence1, Sentence2, Sentence3, Sentence4]

In [27]: list(filter(lambda s: any(all(w in s.split() for w in word.split('+')) for word in Words), sentences))
    ...:
Out[27]:
['Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow',
 'Is it beautiful weather',
 'I hope it wont be snowing here soon']

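The inner comprehension yields a generator of True or False values, one per keyword, depending on whether that keyword is among the words of the sentence. all returns True if all elements of the inner iterable are True. Conversely, any returns True if any element of the iterable it receives is True.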
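The same logic may be easier to follow when the one-liner is unpacked into a named function. This is only a sketch of the expression above; the name matches is illustrative.

```python
def matches(sentence, words):
    """True if at least one '+'-joined group is fully present in the sentence."""
    tokens = sentence.split()  # whitespace-separated words
    return any(
        all(w in tokens for w in group.split('+'))  # every word of this group
        for group in words                          # at least one group
    )


sentences = [
    "Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow",
    "Is it beautiful weather",
    "I hope it wont be snowing here soon",
    "How is the weather",
]

print([s for s in sentences if matches(s, ['I+be', 'it+weather'])])

# Edge cases of the built-ins: all() of nothing is True, any() of nothing is False,
# so an empty keyword list matches no sentence.
print(all([]), any([]))
```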

Checking for 'be' does not return Sentence2:

In [43]: Words = ['be']

In [44]: list(filter(lambda s: any(all(w in s.split() for w in word.split('+')) for word in Words), sentences))
Out[44]:
['Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow',
 'I hope it wont be snowing here soon']

Note that this does not take punctuation into account, i.e. 'Hello' != 'Hello,'.
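If punctuation matters, one possible workaround is to strip leading and trailing punctuation from each token before comparing. The sketch below is an assumption about how that could be done, not part of the answer; the helper names tokens and matches are hypothetical.

```python
import string


def tokens(sentence):
    """Split on whitespace and trim punctuation from both ends of each token."""
    return {t.strip(string.punctuation) for t in sentence.split()}


def matches(sentence, words):
    """True if at least one '+'-joined group is fully present in the sentence."""
    toks = tokens(sentence)
    return any(all(w in toks for w in group.split('+')) for group in words)


# Plain split() keeps the comma attached, so 'Hello' is not found ...
print('Hello' in "Hello, I am new here".split())
# ... while the trimmed tokens contain it.
print(matches("Hello, I am new here", ['Hello+I']))
```

Only punctuation at the edges of a token is removed, so a word with an internal apostrophe or hyphen is left unchanged.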

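【Comments】:

  • Hello! I did find a problem with this program: whenever you change your words to a single word, e.g. i, it picks up every sentence in which i is contained in any word, such as beautiful, and prints it because that word contains an i.
  • So what you want is whole-word matching?
  • Yes, exactly :) The + part may be confusing, but basically it should check whether e.g. I and be are both present. I assume what you coded is similar, but it looks for anything that contains I and be, whereas I want it to match whole words, not just individual characters :)
  • This seems to be the solution. However, whenever I try to print the result with print(list(filter(lambda s: any(all(w in s.split() for w in word.split('+')) for word in Words), sentences))), it gives me the answer inside a list. Is it possible to print it without the list, i.e. without [Is it beautiful weather]?
  • No, you cannot print a group of things without some grouping mechanism.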
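The matches still have to be collected in some container, as the last comment says, but the printed form does not have to show the list brackets. A minimal sketch, assuming the goal is one sentence per line; the function name find_matches is illustrative.

```python
def find_matches(sentences, words):
    """Sentences that contain every word of at least one '+'-joined group."""
    return [
        s for s in sentences
        if any(all(w in s.split() for w in word.split('+')) for word in words)
    ]


sentences = [
    "Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow",
    "Is it beautiful weather",
    "I hope it wont be snowing here soon",
    "How is the weather",
]

found = find_matches(sentences, ['it+weather'])

# Unpack the list so each sentence is printed on its own line, without brackets or quotes.
print(*found, sep='\n')

# Equivalent alternative: join the sentences into a single string first.
print('\n'.join(found))
```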