[Posted]: 2021-11-16 11:46:26
[Question]:
DataFrame:
id | review | name | label
1 | it is a great product for turning lights on. | Ashley |
2 | plays music and have a good sound. | Alex |
3 | I love it, lots of fun. | Peter |
I want to use a probabilistic classifier (linear_svc) to predict the label from the review, i.e. the probability that the label is 1. My code:
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
# Load dataset (training and predict_data are DataFrames with a 'review' column)
X = training['review']
y = training['label']
linear_svc = LinearSVC()  # the base estimator
# Calibrated classifier that wraps LinearSVC so it can output probabilities
calibrated_svc = CalibratedClassifierCV(linear_svc,
                                        method='sigmoid',  # 'sigmoid' uses Platt scaling; see the documentation for other methods
                                        cv=3)
calibrated_svc.fit(X, y)
# Predict
prediction_data = predict_data['review']
predicted_probs = calibrated_svc.predict_proba(prediction_data)
calibrated_svc.fit(X, y) raises the following error:
ValueError: could not convert string to float: 'it is a great product for turning…'
Thanks for your help.
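A minimal sketch of one common fix, assuming the goal is just to get numeric features in front of LinearSVC: LinearSVC only accepts numeric arrays, so passing the raw review strings makes scikit-learn try to convert them to floats, which is what produces the ValueError above. Here a TfidfVectorizer (a swapped-in choice; CountVectorizer or word embeddings would also work) turns each review into a numeric vector inside a Pipeline, and CalibratedClassifierCV wraps LinearSVC as in the original code. The extra reviews and all 0/1 labels are hypothetical, because the question's label column is empty. Note that with cv=3, CalibratedClassifierCV needs at least three examples of each class.

```python
import pandas as pd
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical labelled training data: the question's label column is empty,
# so these reviews (beyond the first three) and the 0/1 labels are invented.
training = pd.DataFrame({
    "review": [
        "it is a great product for turning lights on.",
        "plays music and have a good sound.",
        "I love it, lots of fun.",
        "works perfectly, very happy with it.",
        "stopped working after two days.",
        "the sound is terrible and distorted.",
        "waste of money, very disappointed.",
        "it never connects to my phone.",
    ],
    "label": [1, 1, 1, 1, 0, 0, 0, 0],
})
# Hypothetical reviews to score.
predict_data = pd.DataFrame({
    "review": ["great sound and lots of fun.", "terrible, it stopped working."],
})

# TfidfVectorizer converts each raw review string into a sparse numeric vector,
# which is what LinearSVC expects instead of the strings themselves.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3)),
])

model.fit(training["review"], training["label"])

# predict_proba returns one column per class, ordered as in model.classes_.
predicted_probs = model.predict_proba(predict_data["review"])
prob_label_1 = predicted_probs[:, list(model.classes_).index(1)]
print(prob_label_1)
```

Because the vectorizer lives in the same Pipeline, predict_proba reuses the vocabulary learned from the training reviews when scoring new ones, so the fit/predict_proba calls keep the same shape as in the question.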
[Discussion]:
-
Text data needs to be encoded somehow, e.g. one-hot encoding, word embeddings, etc.
-
Thanks, @tdy. I just ran one-hot encoding and it still doesn't work.
-
Why is your label column empty?
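One possible reason one-hot encoding "still doesn't work", assuming it was applied to the whole review string (for example with pd.get_dummies): every distinct review becomes its own category, so a new review shares no columns with the training data. The sketch below contrasts that with word-level one-hot encoding via CountVectorizer(binary=True), which builds one column per vocabulary word from the training reviews and reuses that vocabulary for new ones. The new review string is hypothetical. Either way, fitting also requires the label column to be filled in.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# The three reviews from the question's DataFrame.
train_reviews = pd.Series([
    "it is a great product for turning lights on.",
    "plays music and have a good sound.",
    "I love it, lots of fun.",
])
# Hypothetical new review to encode.
new_reviews = pd.Series(["great sound, lots of fun."])

# One-hot encoding the whole review treats every review as its own category,
# so a new review shares no column with the training data.
whole_review_onehot = pd.get_dummies(train_reviews)
print(whole_review_onehot.shape)  # one column per distinct review string

# Word-level one-hot (binary bag of words): one column per vocabulary word,
# 1 if the word appears in the review, 0 otherwise.
vectorizer = CountVectorizer(binary=True)
X_train = vectorizer.fit_transform(train_reviews)
X_new = vectorizer.transform(new_reviews)  # reuses the training vocabulary
print(X_train.shape, X_new.shape)

# Words of the new review that map onto existing training columns.
shared_words = sorted(w for w, i in vectorizer.vocabulary_.items() if X_new[0, i])
print(shared_words)
```

With the default tokenizer, single-character tokens such as "a" and "I" are dropped, so the binary matrices have one column per remaining word, and the new review lights up the columns it shares with the training reviews.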
Tags: scikit-learn text-classification valueerror