Python: applying the set constructor to a generator object (answers)

【Question title】: python set constructor to generator object
【Posted】: 2013-12-29 21:42:49
【Question description】:

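I am trying to apply the set constructor to a generator object, but it fails with the error message "expected string or buffer". If I convert the generator to a list first and then apply the set constructor, I get no error. However, I cannot see my list items, and the length shows as 1 even when I use several sentences. I don't fully understand how this works. Any explanation would be greatly appreciated. Thanks! The code is below: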

train = [({'I love this sandwich.'}, 'pos'), ({'This is an amazing place!'}, 'pos'),
({'I feel very good about these beers.'}, 'pos'), ({'This is my best work.'}, 'pos'),
({"What an awesome view"}, 'pos'),({'I do not like this restaurant'}, 'neg'),
({'I am tired of this stuff.'}, 'neg'), ({"I can't deal with this"}, 'neg'),
({'He is my sworn enemy!'}, 'neg'), ({'My boss is horrible.'}, 'neg')]
all_words =(word.lower() for passage in train for word in word_tokenize(passage[0]))
print type(all_words)
all_words = set(all_words)
t= [({word: (word in word_tokenize(x[0])) for word in all_words}, x[1]) for x in train]

The error I get is TypeError: expected string or buffer. The traceback is as follows:

Traceback (most recent call last):

File "C:/Users/5460/Desktop/train0501.py", line 18, in <module>
    all_words = set(all_words)
  File "C:/Users/5460/Desktop/train0501.py", line 15, in <genexpr>
    all_words = (word.lower() for passage in train for word in word_tokenize(passage[0]))
  File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 87, in word_tokenize
    return _word_tokenize(text)
  File "C:\Python27\lib\site-packages\nltk\tokenize\treebank.py", line 67, in tokenize
    text = re.sub(r'^\"', r'``', text)
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  TypeError: expected string or buffer

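【Question comments】:

Could you add some code?
Hi. I have just edited the question to add the code. Thanks for your time!
Try rewriting the one-line generator as a function that uses the 'yield' keyword inside two for loops, and see whether that works.
Please show the traceback. I bet that set refers to a non-built-in type, or that the TypeError comes from somewhere else (perhaps word_tokenize).
We need to see what word_tokenize is.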

Tags: python list generator

【Solution 1】:

In your code, passage[0] is something like {'I love this sandwich.'}, which is a set (that is what { ... } creates). word_tokenize expects a string, not a set, so it raises the TypeError.
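A minimal sketch of the failure, written for Python 3 and using only the standard library. The re.sub call mirrors the one shown in the traceback; the names passage and error_message are only illustrative, and the exact wording of the message differs between Python versions.

```python
import re

# Braces around a single value build a set, not a string.
passage = {'I love this sandwich.'}

print(type(passage))  # a set
print(len(passage))   # the whole sentence is the set's only element

error_message = None
try:
    # Same kind of call as in the traceback (treebank.py calls re.sub on the text).
    re.sub(r'^\"', r'``', passage)
except TypeError as exc:
    # re.sub only accepts a string (or bytes-like object) as the text to search.
    error_message = str(exc)
    print("TypeError:", error_message)
```

This probably also accounts for the length of 1 mentioned in the question: converting such a set to a list gives a one-item list that holds the whole sentence.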

You should simply keep your sentences as plain strings:

train = [('I love this sandwich.', 'pos'), ...]
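With plain strings, the rest of the pipeline could look roughly like the sketch below (Python 3). So that it runs without NLTK or its data files, it uses simple_tokenize, a hypothetical regex-based stand-in for word_tokenize, and only two of the sentences from the question. It also lowercases tokens on both sides of the membership test, which the original code does not do; that is an assumption about the intended behaviour.

```python
import re

def simple_tokenize(text):
    """Hypothetical stand-in for nltk's word_tokenize: words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def lower_tokens(text):
    """Tokenize a sentence and lowercase every token."""
    return [token.lower() for token in simple_tokenize(text)]

# Sentences are plain strings now, not one-element sets.
train = [
    ('I love this sandwich.', 'pos'),
    ('I do not like this restaurant', 'neg'),
]

# set() consumes the generator directly; no intermediate list is needed.
all_words = set(
    word
    for sentence, label in train
    for word in lower_tokens(sentence)
)

# One feature dict per sentence: does each vocabulary word occur in it?
features = [
    ({word: (word in lower_tokens(sentence)) for word in all_words}, label)
    for sentence, label in train
]

print(sorted(all_words))
print(features[0][0]['love'], features[1][0]['love'])
```

To use NLTK instead, replace simple_tokenize with word_tokenize; the set constructor accepts the generator as is.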

【Discussion】: