Python: How to use a list of keywords to search for a string in a text

【Question title】: Python: How to use list of keywords to search for a string in a text
【Posted】: 2021-03-08 02:13:46
【Question description】:

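I am writing a program that loops over several .txt files and searches for any of a number of pre-specified keywords. I am having trouble finding a way to pass in the list of keywords to search for.

The code below currently returns the following error: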

TypeError: 'in <string>' requires string as left operand, not list

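I know the error is caused by the keyword list, but I don't know how to supply a large number of keywords without getting this error.

Current code: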

from os import listdir

keywords=['Example', 'Use', 'Of', 'Keywords']
 
with open("/home/user/folder/project/result.txt", "w") as f:
    for filename in listdir("/home/user/folder/project/data"):
        with open('/home/user/folder/project/data/' + filename) as currentFile:
            text = currentFile.read()
            #Error Below
            if (keywords in text):
                f.write('Keyword found in ' + filename[:-4] + '\n')
            else:
                f.write('No keyword in ' + filename[:-4] + '\n')

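The error is on line 10 of the code above, directly under the comment. I'm not sure why I can't use the list itself to search for the keywords. Any help is appreciated, thanks!

【Question comments】:

Which part of the error message don't you understand? You cannot use the in operator to check whether a list is in a str. That is not a defined operation. Presumably you want to check whether any of the strings in the list is in the string.
Does this answer your question? Check if multiple strings exist in another string

Tags: python if-statement directory python-3.8 listdir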

【Solution 1】:

You can replace

if (keywords in text):
   ...

with

if any(keyword in text for keyword in keywords):
   ...
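
As a minimal sketch of how this check behaves on its own (the helper name `contains_any` and the sample strings are illustrative assumptions, not part of the original answer): `any()` stops at the first keyword that is found, and `keyword in text` is a case-sensitive substring test.

```python
def contains_any(text, keywords):
    """Return True if at least one keyword occurs in text as a substring."""
    # any() short-circuits: it stops at the first keyword that matches
    return any(keyword in text for keyword in keywords)


keywords = ['Example', 'Use', 'Of', 'Keywords']

print(contains_any('An Example sentence', keywords))    # True
print(contains_any('nothing relevant here', keywords))  # False
print(contains_any('an example sentence', keywords))    # False: the test is case-sensitive
```

If case should not matter, one option is to lower-case both the text and the keywords before comparing.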

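【Comments】:

Thanks, I had been searching for a while and this is the best fix!

【Solution 2】:

Try looping over the list to check whether each element is in the text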

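# Loop over the keywords and stop at the first one found in the text
for keyword in keywords:
    if keyword in text:
        f.write('Keyword found in ' + filename[:-4] + '\n')
        break
else:
    # Runs only when the loop finished without hitting break,
    # i.e. none of the keywords was found
    f.write('No keyword in ' + filename[:-4] + '\n')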

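Also, you cannot use in to check whether a list is in a string

【Comments】:

I only wish I had more breaks during the day, but maybe a single break would be enough in this code?

【Solution 3】:

I would use regular expressions, since they are built specifically for searching for substrings in text.

You only need the re.search block. I added examples of findall and finditer to demystify them.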

# let's pretend the 4 sentences in `text` are 4 different files
text = '''Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum'''.split(sep='. ')

# add more keywords
keywords=[r'publishing', r'industry']
regex = '|'.join(keywords)
import re
for t in text:
    lst = re.findall(regex, t, re.I)  # re.I makes the match case-insensitive
    for el in lst:
        print(el)

    iterator = re.finditer(regex, t, re.I)
    for el in iterator:
        print(el.span())

    if re.search(regex, t, re.I):
        print('Keyword found in `' + t + '`\n')
    else:
        print('No keyword in `' + t + '`\n')

Output:

industry
(65, 73)
Keyword found in `Lorem Ipsum is simply dummy text of the printing and typesetting industry`

industry
(25, 33)
Keyword found in `Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book`

No keyword in `It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged`

publishing
(132, 142)
Keyword found in `It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum`
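
One caveat when joining keywords with '|': a keyword that contains regex metacharacters (such as `.` or `+`) is interpreted as a pattern rather than as literal text. The following is a minimal sketch, assuming the keywords are meant to be matched literally; the helper name `build_keyword_pattern` and the sample keywords are hypothetical. It escapes each keyword with `re.escape` and compiles the pattern once so it can be reused for every file.

```python
import re


def build_keyword_pattern(keywords):
    """Compile one case-insensitive pattern that matches any keyword literally."""
    # re.escape neutralises metacharacters such as '.' or '+'
    escaped = (re.escape(keyword) for keyword in keywords)
    return re.compile('|'.join(escaped), re.IGNORECASE)


# Hypothetical keywords; 'C++' would be an invalid pattern without escaping
pattern = build_keyword_pattern(['C++', 'publishing'])

print(bool(pattern.search('Desktop Publishing software')))  # True
print(bool(pattern.search('written in c++')))               # True
print(bool(pattern.search('plain C code')))                 # False
```

Like the plain `in` test, this still matches substrings inside longer words; wrapping each escaped keyword in `\b` word boundaries is one way to restrict it to whole words when the keywords start and end with word characters.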

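【Comments】:

I have found that regular expressions caused some conflicts in code where I used them in the past, but I will give them another try. Thanks for the solution!
@StephenFlynn Hopefully you can give regular expressions another go now? Give re.search a try and let me know how it goes.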