【Question title】:Error in NLP "expected string or bytes-like object"
【Posted】:2020-08-24 07:00:10
【Question】:

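I want to preprocess text into a form suitable for natural language processing (NLP).

The "TEXT" column holds 3000+ books, one book (a large block of text) per row.

When I run the following code: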

import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

corpus = []
for i in range(len(dt)):
    review = re.sub('[^a-zA-Z0-9]', ' ', dt['TEXT'][i])
    review = review.lower()
    review = review.split()
    ps = PorterStemmer()
    review = [ps.stem(word) for word in review if not word in set(stopwords.words('english'))]
    review = ' '.join(review)
    corpus.append(review)

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-16-47569f8727fa> in <module>
      6 corpus = []
      7 for i in range(1000,2000):
----> 8     review = re.sub('[^a-zA-Z0-9]', ' ', dt['TEXT'][i])
      9     review = review.lower()
     10     review = review.split()

~\anaconda3\lib\re.py in sub(pattern, repl, string, count, flags)
    190     a callable, it's passed the Match object and must return
    191     a replacement string to be used."""
--> 192     return _compile(pattern, flags).sub(repl, string, count)
    193 
    194 def subn(pattern, repl, string, count=0, flags=0):

TypeError: expected string or bytes-like object

【Question discussion】:

    Tags: python-3.x nlp


    【Solution 1】:

    This means that the "TEXT" column of your DataFrame contains values that are not strings.
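To confirm which rows are affected, a minimal sketch along these lines may help. The helper name `find_non_strings` and the `sample` list are hypothetical; the list only stands in for `dt['TEXT']`, which could be passed in the same way. In pandas, empty cells are commonly loaded as NaN, which is a float and would therefore show up here.

```python
def find_non_strings(values):
    """Return (position, value) pairs for every entry that is not a str."""
    return [(i, v) for i, v in enumerate(values) if not isinstance(v, str)]


# Hypothetical sample standing in for dt['TEXT'].
sample = ["First book text", float("nan"), "Second book", None, 42]

for pos, value in find_non_strings(sample):
    # Print the position, the type name, and the offending value.
    print(pos, type(value).__name__, repr(value))
```

Note that `enumerate` reports positions (0, 1, 2, ...), which match the DataFrame's index labels only if the index is the default range index.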

    You can skip those rows by catching the error:

    for i in range(len(dt)):
        try:
            review = re.sub('[^a-zA-Z0-9]', ' ', dt['TEXT'][i])
            # the rest of your code ...
        except TypeError:
            # dt['TEXT'][i] is not a string (e.g. NaN); skip this row
            continue
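As an alternative to catching the exception, the type check can be made explicit. The following is only a sketch: the function name `normalize` and the `rows` list are hypothetical, and mapping non-string values to an empty string is an assumption (converting them with `str()` would be another option). Stop-word removal and stemming from the question are left out so the example runs without NLTK data.

```python
import re

# Same character class as in the question, compiled once for reuse.
NON_ALNUM = re.compile(r'[^a-zA-Z0-9]')


def normalize(value):
    """Lowercase `value` and replace non-alphanumeric characters with spaces.

    Non-string input (None, NaN, numbers) yields an empty string instead of
    raising TypeError. This choice is an assumption, not the only option.
    """
    if not isinstance(value, str):
        return ''
    return ' '.join(NON_ALNUM.sub(' ', value).lower().split())


# Hypothetical rows standing in for dt['TEXT'].
rows = ["Hello, World!", float("nan"), None, "Chapter 1: The End."]
corpus = [normalize(r) for r in rows]
print(corpus)  # ['hello world', '', '', 'chapter 1 the end']
```

Unlike the `try`/`except` version, this keeps one output entry per input row, so `corpus` stays aligned with the rows of the DataFrame.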
    

    【Discussion】:

    • TypeError: expected string or bytes-like object