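【Posted】: 2013-12-29 21:42:49
【Question】:
I am trying to apply the set constructor to a generator object, but it fails with the error message: expected string or buffer. However, if I first convert it to a list and then apply the set constructor, there is no error. In that case, though, I cannot view the items of my list, and its length shows as 1 even when I use several sentences. I don't fully understand how this works. Any explanation would be appreciated. Thanks! The code is below: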
train = [({'I love this sandwich.'}, 'pos'), ({'This is an amazing place!'}, 'pos'),
({'I feel very good about these beers.'}, 'pos'), ({'This is my best work.'}, 'pos'),
({"What an awesome view"}, 'pos'),({'I do not like this restaurant'}, 'neg'),
({'I am tired of this stuff.'}, 'neg'), ({"I can't deal with this"}, 'neg'),
({'He is my sworn enemy!'}, 'neg'), ({'My boss is horrible.'}, 'neg')]
all_words =(word.lower() for passage in train for word in word_tokenize(passage[0]))
print type(all_words)
all_words = set(all_words)
t= [({word: (word in word_tokenize(x[0])) for word in all_words}, x[1]) for x in train]
The error I get is TypeError: expected string or buffer. The traceback is as follows:
Traceback (most recent call last):
  File "C:/Users/5460/Desktop/train0501.py", line 18, in <module>
    all_words = set(all_words)
  File "C:/Users/5460/Desktop/train0501.py", line 15, in <genexpr>
    all_words = (word.lower() for passage in train for word in word_tokenize(passage[0]))
  File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 87, in word_tokenize
    return _word_tokenize(text)
  File "C:\Python27\lib\site-packages\nltk\tokenize\treebank.py", line 67, in tokenize
    text = re.sub(r'^\"', r'``', text)
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer
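The innermost frames of the traceback are in `re.sub`, which suggests that `word_tokenize` was handed something other than a string. In the code above, `passage[0]` is `{'I love this sandwich.'}`, which is a set literal rather than a `str`. The following is a minimal sketch of that behaviour without NLTK: `toy_tokenize` is a hypothetical stand-in for `word_tokenize` (it only mimics the `re.sub` call visible in the traceback), and the code is written so that it should run on both Python 2.7 and Python 3. It also shows why the error surfaces at `set(all_words)`: a generator expression does no work until it is consumed.

```python
import re

def toy_tokenize(text):
    # Hypothetical stand-in for nltk's word_tokenize: like the call shown in
    # the traceback, it passes `text` to re.sub, which only accepts strings.
    text = re.sub(r'^\"', r'``', text)
    return text.split()

# Note the braces: {'...'} builds a set containing one string, not a string.
train = [({'I love this sandwich.'}, 'pos')]

# Creating the generator runs none of the loop body, so nothing fails yet.
all_words = (word.lower() for passage in train for word in toy_tokenize(passage[0]))

try:
    set(all_words)  # consuming the generator finally calls toy_tokenize
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If this diagnosis applies, writing the first element of each tuple as a plain string (`'I love this sandwich.'` without the braces) should let the tokenizer receive the type it expects.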
【Comments】:
-
Could you add some code?
-
Hi. I have just edited the question to add the code. Thanks for your time!
-
Try rewriting the one-line generator as a function that uses the 'yield' keyword inside the two for loops, and see whether that works.
-
Please show the traceback. I bet set refers to a non-built-in type, or the TypeError comes from somewhere else (possibly word_tokenize).
-
We need to see what word_tokenize is.
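-
For reference, the `yield` rewrite suggested in the comments could look roughly like the sketch below. It rests on two assumptions that are not in the original post: the first element of each tuple is a plain string, and `str.split` stands in for `word_tokenize` so the example runs without NLTK. The names `iter_words` and `features` are hypothetical. A generator function is just as lazy as a generator expression, so on its own this rewrite would not be expected to change where or whether the TypeError is raised.

```python
def iter_words(samples, tokenize):
    # Same two nested loops as the generator expression, written with yield.
    for passage in samples:
        for word in tokenize(passage[0]):
            yield word.lower()

# Assumption: plain strings as the first tuple element, and str.split as a
# simple whitespace-based stand-in for word_tokenize.
train = [('I love this sandwich.', 'pos'), ('My boss is horrible.', 'neg')]

all_words = set(iter_words(train, str.split))

# One feature dict per training sample: word -> whether it occurs in the text.
# Tokens are lowercased here so they compare equal to the lowercased vocabulary.
features = [({word: (word in x[0].lower().split()) for word in all_words}, x[1])
            for x in train]

print(len(all_words))
```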