TypeError: 'in <string>' requires string as left operand, not generator in Python

【Question title】: TypeError: 'in <string>' requires string as left operand, not generator in Python
【Posted】: 2011-10-02 16:58:29
【Question description】:

I am trying to parse tweet data.

My data looks like this:

59593936 3061025991 null null <d>2009-08-01 00:00:37</d> <s>&lt;a href="http://help.twitter.com/index.php?pg=kb.page&amp;id=75" rel="nofollow"&gt;txt&lt;/a&gt;</s> <t>honda just recalled 440k accords...traffic around here is gonna be light...win!!</t> ajc8587 15 24 158 -18000 0 0 <n>adrienne conner</n> <ud>2009-07-23 21:27:10</ud> <t>eastern time (us &amp; canada)</t> <l>ga</l>
22020233 3061032620 null null <d>2009-08-01 00:01:03</d> <s>&lt;a href="http://alexking.org/projects/wordpress" rel="nofollow"&gt;twitter tools&lt;/a&gt;</s> <t>new blog post: honda recalls 440k cars over airbag risk http://bit.ly/2wsma</t> madcitywi 294 290 9098 -21600 0 0 <n>madcity</n> <ud>2009-02-26 15:25:04</ud> <t>central time (us &amp; canada)</t> <l>madison, wi</l>

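I want to get the total number of tweets and the number of tweets related to my keywords, which I have prepared in a text file. I also want to get the tweet text, and the number of tweets that contain a mention (@), a retweet (RT), or a URL (I want to save each URL in a separate file).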

So I wrote the following code.

import time
import os

total_tweet_count = 0
related_tweet_count = 0
rt_count = 0
mention_count = 0
URLs = {}

def get_keywords(filepath, mode):
    with open(filepath, mode) as f:
        for line in f:
            yield line.split().lower()

for line in open('/nas/minsu/2009_06.txt'):
    tweet = line.strip().lower()

    total_tweet_count += 1

    with open('./related_tweets.txt', 'a') as save_file_1:
        keywords = get_keywords('./related_keywords.txt', 'r')

        if keywords in line:
            text =  line.split('<t>')[1].split('</t>')[0]

            if 'http://' in text:
                try:
                    url = text.split('http://')[1].split()[0]
                    url = 'http://' + url

                    if url not in URLs:
                        URLs[url] = []
                    URLs[url].append('\t' + text)

                    save_file_3 = open('./URLs_in_related_tweets.txt', 'a')
                    print >> save_file_3, URLs

                except:
                    pass

            if '@' in text:
                mention_count +=1

            if 'RT' in text:
                rt_count += 1

            related_tweet_count += 1

            print >> save_file_1, text

    save_file_2 = open('./info_related_tweets.txt', 'w')

print >> save_file_2, str(total_tweet_count) + '\t' + srt(related_tweet_count) + '\t' + str(mention_count) + '\t' + str(rt_count)

save_file_1.close()
save_file_2.close()
save_file_3.close()

Here are the sample keywords:

Depression
Placebo
X-rays
X-ray
HIV
Blood preasure
Flu
Fever
Oral Health
Antibiotics
Diabetes
Mellitus
Genetic disorders

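I think my code has many problems, but the first error is this:

Traceback (most recent call last):
  File "health_related_tweets.py", line 23, in <module>
    if keywords in line:
TypeError: 'in <string>' requires string as left operand, not generator

Please help me!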

【Comments】:

I think you need to use regular expressions. They are the tool to use when you want to extract data from text. See the re module.
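
As a rough illustration of the regular-expression suggestion, the sketch below pulls the first <t>...</t> field and any http:// URLs out of one line shaped like the sample data. The helper names extract_text and extract_urls and both patterns are assumptions made for illustration, not part of the original question, and the code is written for Python 3.

```python
import re

# Assumed patterns: the first <t>...</t> field holds the tweet text,
# and a URL runs from "http://" to the next whitespace character.
TEXT_RE = re.compile(r"<t>(.*?)</t>")
URL_RE = re.compile(r"http://\S+")


def extract_text(line):
    """Return the text of the first <t> field, or None if there is none."""
    match = TEXT_RE.search(line)
    return match.group(1) if match else None


def extract_urls(text):
    """Return every http:// URL found in the text."""
    return URL_RE.findall(text)


# Shortened version of the second sample line from the question.
sample = ("22020233 3061032620 null null <d>2009-08-01 00:01:03</d> "
          "<t>new blog post: honda recalls 440k cars over airbag risk "
          "http://bit.ly/2wsma</t> madcitywi <t>central time (us &amp; canada)</t>")
text = extract_text(sample)
print(text)
print(extract_urls(text))
```

The non-greedy `.*?` matters here because each line carries a second <t> field (the time zone); a greedy match would run through to the last </t>.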

Tags: python

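【Solution 1】:

The reason is that keywords = get_keywords(...) returns a generator, and the in operator cannot test a generator against a string. Logically, keywords should be a list of all the keywords, and for each keyword in that list you want to check whether it occurs in the tweet/line.

Sample code: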

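# Calling get_keywords() returns a generator that yields the keywords one by one.
keywords = get_keywords('./related_keywords.txt', 'r')
has_keyword = False
for keyword in keywords:
    # Test each keyword (a string) against the line, not the generator itself.
    if keyword in line:
        has_keyword = True
        break
if has_keyword:
    # Your code here (for the case when the line has at least one keyword)
    pass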

(The code above replaces if keywords in line:.)
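
The flag-and-break loop can also be written with the built-in any(), which stops at the first match in the same way. The sketch below additionally assumes the keywords have been collected into a list once, rather than re-reading the keyword file for every tweet. The function name has_keyword and the sample data are made up for illustration, and the code is written for Python 3.

```python
def has_keyword(line, keywords):
    """Return True if at least one keyword occurs as a substring of line."""
    return any(keyword in line for keyword in keywords)


# Hypothetical keyword list, already lowercased like the tweets.
keywords = ["flu", "fever", "diabetes"]

tweets = [
    "honda just recalled 440k accords",
    "stuck at home with the flu again",
]
related = [tweet for tweet in tweets if has_keyword(tweet, keywords)]
print(related)
```

Keeping the keywords in a list also avoids a subtle trap: a generator can be consumed only once, so reusing the same generator object for a second tweet would yield nothing.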

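【Discussion】:

I ran into another error. (Traceback (most recent call last): File "health_related_tweets.py", line 25, in <module> for keyword in keywords: File "health_related_tweets.py", line 13, in get_keywords yield line.split().lower() AttributeError: 'list' object has no attribute 'lower') I think I need to convert both the keywords and the tweets to lowercase before parsing them, so I put ".lower" in my code. But it raises an error... How can I fix it?
That makes sense again. line.split() gives you a list (of strings), whereas lower() works on a string. Can you give me a sample related_keywords.txt?
related_keywords.txt contains words like: Dentist Depression Placebo X-rays X-ray HIV Blood preasure Flu, or phrases such as Blood preasure, each written on one line. That is why I split it with ".split()".
I have added the sample keywords to the question body! Thanks for your help!
Great. Ideally you don't need the split function, because you don't want to split a phrase like "Blood pressure" into ["Blood", "pressure"]. You are looking for whole phrases in the text.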
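
Putting the last comments together: a keyword line can be stripped and lowercased as a single string, with no split(), so that a multi-word phrase stays intact. The sketch below is one possible reading of that advice, written for Python 3. The names load_keywords and find_keywords, the use of \b word boundaries from the re module, and the temporary keyword file are illustrative assumptions rather than the answerer's code.

```python
import os
import re
import tempfile


def load_keywords(filepath):
    """Read one keyword or phrase per line, lowercased, skipping blank lines."""
    with open(filepath) as f:
        # strip() and lower() both act on the whole line (a string),
        # so "Oral Health" stays one phrase instead of becoming a list.
        return [line.strip().lower() for line in f if line.strip()]


def find_keywords(text, keywords):
    """Return the keywords that occur in text as whole words or phrases."""
    found = []
    for keyword in keywords:
        # re.escape guards characters such as "-"; \b avoids matching
        # "flu" inside a longer word like "influence".
        if re.search(r"\b" + re.escape(keyword) + r"\b", text):
            found.append(keyword)
    return found


# Hypothetical keyword file, created only to make the sketch runnable.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Flu\nOral Health\nX-ray\n\n")
keywords = load_keywords(tmp.name)
os.remove(tmp.name)

print(keywords)
print(find_keywords("new x-ray machine for oral health clinics", keywords))
print(find_keywords("a big influence on sales", keywords))
```

Whether to use word boundaries or a plain substring test is a design choice: the substring test (keyword in line) is simpler but would count "influence" as a match for "flu".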