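【Posted】: 2020-09-10 22:51:14
【Question】:
I am working on a multi-class text classification problem with many different classes (more than 15). I have trained a LinearSVC model (the method is just an example), but it outputs only the single class with the highest probability. Is there an algorithm that can output two classes at once?
Sample code I am using: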
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Turn the raw text into n-gram counts (unigrams up to trigrams).
count_vect = CountVectorizer(max_df=.9, min_df=.002,
                             encoding='latin-1',
                             ngram_range=(1, 3))
X_train_counts = count_vect.fit_transform(df_upsampled['text'])

# Re-weight the counts with TF-IDF.
tfidf_transformer = TfidfTransformer(sublinear_tf=True, norm='l2')
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

clf = LinearSVC().fit(X_train_tfidf, df_upsampled['reason'])
# X_test is assumed to be the test text passed through the same fitted
# count_vect and tfidf_transformer.
y_pred = clf.predict(X_test)
Current output:
   source    user  time  text            reason
0  hi        neha  0     0:neha:hi       1
1  there     ram   1     1:ram:there     1
2  ball      neha  2     2:neha:ball     3
3  item      neha  3     3:neha:item     6
4  go there  ram   4     4:ram:go there  7
5  kk        ram   5     5:ram:kk        1
6  hshs      neha  6     6:neha:hshs     2
7  ggsgs     neha  7     7:neha:ggsgs    15
Desired output:
   source    user  time  text            reason  reason2
0  hi        neha  0     0:neha:hi       1       2
1  there     ram   1     1:ram:there     1       6
2  ball      neha  2     2:neha:ball     3       7
3  item      neha  3     3:neha:item     6       4
4  go there  ram   4     4:ram:go there  7       9
5  kk        ram   5     5:ram:kk        1       2
6  hshs      neha  6     6:neha:hshs     2       3
7  ggsgs     neha  7     7:neha:ggsgs    15      1
It is also fine if I get the output in a single column, since I can split it into two columns myself.
【Discussion】:
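One possible approach, offered as a minimal sketch rather than a definitive answer: LinearSVC has no predict_proba, but its decision_function returns one score per class, and those scores can be ranked to pick the two best classes for each row. The toy training data, the helper name top_k_classes, and the use of TfidfVectorizer in place of the CountVectorizer + TfidfTransformer pair are assumptions made for illustration. The scores are margins, not probabilities; if calibrated probabilities are needed, wrapping the model in CalibratedClassifierCV may be worth considering.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy data invented purely for illustration (not the asker's data).
train_text = [
    "hi hello hey", "hello hi there",
    "ball game play", "play ball match",
    "item order buy", "buy item shop",
]
train_reason = [1, 1, 3, 3, 6, 6]

vectorizer = TfidfVectorizer(sublinear_tf=True, norm="l2")
X_train = vectorizer.fit_transform(train_text)
clf = LinearSVC().fit(X_train, train_reason)


def top_k_classes(clf, X, k=2):
    """Hypothetical helper: return an (n_samples, k) array of class labels,
    highest decision score first. Assumes more than two classes, so that
    decision_function returns a 2-D (n_samples, n_classes) array."""
    scores = clf.decision_function(X)
    # Sort class indices by score, descending, and keep the first k.
    order = np.argsort(scores, axis=1)[:, ::-1][:, :k]
    # Map column indices back to the original class labels.
    return clf.classes_[order]


X_test = vectorizer.transform(["hi there", "play ball"])
top2 = top_k_classes(clf, X_test, k=2)
reason, reason2 = top2[:, 0], top2[:, 1]
```

The two arrays can then be assigned to the data frame as the reason and reason2 columns. The first column should match what clf.predict returns, since predict picks the class with the highest decision score.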
Tags: python-3.x scikit-learn text-classification multiclass-classification