Posted: 2020-09-23 08:38:06
Question:
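I'm trying to plot the most important features of a linear SVM in sklearn. I found some lines of code online, but when I apply them to my code they raise an error. What should I do?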
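For a linear SVM, each learned weight in `coef_` belongs to one input feature (in a text pipeline, one vocabulary term). A common heuristic treats the weights with the largest absolute value as the most important, while the sign says which class the term pushes towards. A minimal pure-Python sketch of that ranking, assuming the weights are already a flat list with exactly one entry per feature name; `top_k_by_magnitude` is a hypothetical helper, not an sklearn function:

```python
def top_k_by_magnitude(weights, names, k):
    """Return the k (weight, name) pairs with the largest |weight|, largest first.

    Assumes `weights` is 1-D (one number per feature), unlike the raw
    2-D coef_ array that sklearn stores on a fitted linear model.
    """
    if len(weights) != len(names):
        raise ValueError("expected one weight per feature name")
    pairs = list(zip(weights, names))
    # Sort by magnitude only; the sign is kept in the returned pairs.
    pairs.sort(key=lambda pair: abs(pair[0]), reverse=True)
    return pairs[:k]


print(top_k_by_magnitude([0.1, -2.0, 1.5, 0.0], ["a", "b", "c", "d"], 2))
# → [(-2.0, 'b'), (1.5, 'c')]
```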
My code:
class SVMSentiment(Base):
    """Predict sentiment scores using a linear Support Vector Machine (SVM).
    Uses a sklearn pipeline.
    """
    def __init__(self, model_file: str=None) -> None:
        super().__init__()
        # pip install scikit-learn
        from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer
        from sklearn.linear_model import SGDClassifier
        from sklearn.svm import SVC, LinearSVC
        from sklearn.pipeline import Pipeline
        self.pipeline = Pipeline(
            [
                # ('vect', CountVectorizer()),
                # ('tfidf', TfidfTransformer()),
                ('tfidf', TfidfVectorizer()),
                ('clf', LinearSVC(loss='hinge')),
            ]
        )

    def predict(self, train_file: str, test_file: str, lower_case: bool) -> pd.DataFrame:
        "Train model using sklearn pipeline"
        from sklearn.model_selection import GridSearchCV
        from sklearn.svm import SVC
        from sklearn.linear_model import SGDClassifier
        from sklearn import svm
        from sklearn import preprocessing
        from sklearn.preprocessing import LabelEncoder, OneHotEncoder
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import LinearSVC
        train_df = self.read_data(train_file, lower_case)
        learner = self.pipeline.fit(train_df['text'], train_df['truth'])
        # Fit the learner to the test data
        test_df = self.read_data(test_file, lower_case)
        test_df['pred'] = learner.predict(test_df['text'])
        return test_df

    def f_importances(coef, names):
        imp = coef
        imp, names = zip(*sorted(zip(imp, names)))
        plt.barh(range(len(names)), imp, align='center')
        plt.yticks(range(len(names)), names)
        plt.show()

    features_names = [train_df['text'], train_df['truth']]
    learner = self.pipeline.fit(train_df['text'], train_df['truth'])
    print(f_importances(self.pipeline.coef_, features_names))
Error message:
NameError: name 'train_df' is not defined
But I have already defined train_df, so I don't understand how to fix this.
Tags: python python-3.x machine-learning scikit-learn