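【Posted】: 2020-08-24 07:00:10
【Question】:
I want to convert text into a form suitable for natural language processing (NLP).
The "TEXT" column holds 3000+ books, one large text (one book) per row.
When I run the following code, I get the error shown below: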
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
corpus = []
for i in range(len(dt)):
    review = re.sub('[^a-zA-Z0-9]', ' ', dt['TEXT'][i])
    review = review.lower()
    review = review.split()
    ps = PorterStemmer()
    review = [ps.stem(word) for word in review if not word in set(stopwords.words('english'))]
    review = ' '.join(review)
    corpus.append(review)
I get the following error:
TypeError                                 Traceback (most recent call last)
<ipython-input-16-47569f8727fa> in <module>
      6 corpus = []
      7 for i in range(1000,2000):
----> 8     review = re.sub('[^a-zA-Z0-9]', ' ', dt['TEXT'][i])
      9     review = review.lower()
     10     review = review.split()

~\anaconda3\lib\re.py in sub(pattern, repl, string, count, flags)
    190     a callable, it's passed the Match object and must return
    191     a replacement string to be used."""
--> 192     return _compile(pattern, flags).sub(repl, string, count)
    193
    194 def subn(pattern, repl, string, count=0, flags=0):

TypeError: expected string or bytes-like object
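
`re.sub` raises this TypeError when its third argument is neither `str` nor `bytes`, so at least one value of `dt['TEXT']` in the failing range is presumably something else. An empty cell that pandas reads as a float `NaN` is a common cause, though that is only an assumption here. A minimal sketch for locating such values follows; the list `texts` is a hypothetical stand-in for `dt['TEXT']`, not data from the question:

```python
import re

# Hypothetical stand-in for dt['TEXT']: one real text plus values that are not str.
texts = ['some book text', float('nan'), 42, None]

# Collect the position and type name of every value re.sub would reject.
bad = [(i, type(v).__name__) for i, v in enumerate(texts)
       if not isinstance(v, (str, bytes))]
print(bad)  # [(1, 'float'), (2, 'int'), (3, 'NoneType')]

# Passing one of those values to re.sub reproduces the TypeError.
try:
    re.sub('[^a-zA-Z0-9]', ' ', texts[1])
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```

Running the same check over the real column should show which rows hold non-string values and what type they are.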
【Discussion】:
Tags: python-3.x nlp
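One possible way around the error is to replace every value that is not a `str` with an empty string before calling `re.sub`. The sketch below is not the asker's code: `clean_text`, `texts`, and the tiny `stop_words` set are hypothetical names and data, and stemming is left as an optional callable so the example runs without NLTK.

```python
import re

# Compiled once; same character class as in the question.
NON_ALNUM = re.compile(r'[^a-zA-Z0-9]')

def clean_text(value, stop_words, stem=lambda word: word):
    """Return the normalized text, or '' when value is not a str (e.g. NaN or None)."""
    if not isinstance(value, str):
        return ''
    words = NON_ALNUM.sub(' ', value).lower().split()
    return ' '.join(stem(w) for w in words if w not in stop_words)

# Hypothetical sample standing in for dt['TEXT']; float('nan') mimics an empty cell.
texts = ['The Cat sat on the mat!', float('nan'), None, 'Dogs & cats: 2 of them']
# Hypothetical stand-in for set(stopwords.words('english')).
stop_words = {'the', 'on', 'of'}

corpus = [clean_text(t, stop_words) for t in texts]
print(corpus)  # ['cat sat mat', '', '', 'dogs cats 2 them']
```

With the question's setup, one would presumably build `stop_words = set(stopwords.words('english'))` and `ps = PorterStemmer()` once, then run `corpus = [clean_text(t, stop_words, ps.stem) for t in dt['TEXT']]`. Creating both objects outside the loop also avoids rebuilding the stop-word set for every word, which matters with 3000+ book-length texts.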