【Question title】: AttributeError: 'str' object has no attribute 'read'
【Posted】: 2015-03-02 02:25:40
【Question description】:
list=[]
ct = 1      
import numpy as np

import os, os.path
isfile = os.path.isfile
join = os.path.join
fn = 'C:\\Users\\Keshav\\Desktop\\xyz\\data1\\black_and_white\\'
target = np.array([1, 2, 3, 4, 5])
num = sum(1 for item in os.listdir(fn) if isfile(join(fn, item)))



for ct in range(1,num+1):
    f = open(fn+"1_"+str(ct)+".dat","r") 
    list.append(f)
    ct = ct + 1

from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer(input="file")
X_train_counts = count_vect.fit_transform(list)

from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
#print X_train_tfidf.shape

from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(X_train_tfidf, target)
#clf.fit(X, y)
docs_new = ['10 years of marriage and now divorce. I just wasted my entire life too with her.']
X_new_counts = count_vect.transform(docs_new)
print X_new_counts

X_new_tfidf = tfidf_transformer.transform(X_new_counts)
predicted = clf.predict(X_new_tfidf)
print predicted

I am trying to build a multi-class classifier with sklearn, following this link.

The classifier used here is multinomial naive Bayes.

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Keshav\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
    execfile(filename, namespace)
  File "C:/Users/Keshav/Desktop/iHeal/mturk-distortions/main1.py", line 40, in <module>
    X_new_counts = count_vect.transform(docs_new)
  File "C:\Users\Keshav\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 867, in transform
    _, X = self._count_vocab(raw_documents, fixed_vocab=True)
  File "C:\Users\Keshav\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 748, in _count_vocab
    for feature in analyze(doc):
  File "C:\Users\Keshav\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 234, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "C:\Users\Keshav\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 109, in decode
    doc = doc.read()
AttributeError: 'str' object has no attribute 'read'

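Any idea how to fix this?

【Comments】:

  • Can you post the full traceback? For a start, it will show the line of code that causes the error.
  • Yes, I have posted the full traceback!

Tags: python scikit-learn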


【Solution 1】:
docs_new = ['10 years of marriage and now divorce. I just wasted my entire life too with her.']

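is a list of strings, and that is not what count_vect.transform wants: since count_vect was created with input="file", it wants a list of file-like objects that have a read method.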
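To see why a plain string fails here, the following is a minimal sketch of the step shown at the bottom of the traceback (doc = doc.read()). The decode function below is a hypothetical stand-in written for illustration, not scikit-learn's actual code, and it uses Python 3's io.StringIO as the file-like object.

```python
import io


def decode(doc, input_mode="content"):
    """Hypothetical stand-in for the vectorizer's decode step in the traceback."""
    if input_mode == "file":
        # In "file" mode every document is expected to be a file-like object.
        doc = doc.read()
    return doc


docs_new = ["10 years of marriage and now divorce."]

# A plain str has no read() method, so "file" mode raises AttributeError.
try:
    decode(docs_new[0], input_mode="file")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# A file-like wrapper around the same text works in "file" mode.
wrapped = [io.StringIO(d) for d in docs_new]
decoded = [decode(f, input_mode="file") for f in wrapped]
print(decoded)
```

The same string is accepted once it is wrapped in an object that provides read.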

So import StringIO at the top of your module and add

docs_new = [ StringIO.StringIO(x) for x in docs_new ]

right after your first assignment to docs_new, and you'll be fine...
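The StringIO module only exists on Python 2, which the question's code uses. On Python 3 the equivalent class is io.StringIO. Below is a rough Python 3 sketch of the same fix. The in-memory training texts are made up for illustration and stand in for the .dat files from the question.

```python
import io

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training documents standing in for the question's .dat files.
train_texts = ["I wasted my life", "a happy marriage", "divorce after ten years"]
train_files = [io.StringIO(text) for text in train_texts]

# With input="file", both fit_transform and transform expect file-like objects.
count_vect = CountVectorizer(input="file")
X_train_counts = count_vect.fit_transform(train_files)

docs_new = ["10 years of marriage and now divorce."]
# Wrap each new string so that it offers a read() method.
docs_new_files = [io.StringIO(doc) for doc in docs_new]
X_new_counts = count_vect.transform(docs_new_files)
print(X_new_counts.shape)
```

Another option, if the training files are small enough to read into memory first, is to leave input at its default value of "content" and pass plain strings to both fit_transform and transform.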

【Comments】:
