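Which classifier would increase accuracy for text classification

【Question title】: Which classifier would increase accuracy for text classification
【Posted】: 2018-04-09 23:06:33
【Question description】:

I used the code below for classification. My accuracy varies between 55% and 60%, and I would like to raise it to 85%-90%. I am classifying into 8 different categories. What steps should I take to improve accuracy?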

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from nltk import word_tokenize
from textblob import TextBlob


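cov = pd.read_csv("F:/kipro/ml/dataset.csv",
                  names=["Complaint", "target"])
# dropna() returns a new DataFrame, so the result has to be assigned back
cov = cov.dropna()
# Encode the string labels as integer codes
s = pd.factorize(cov['target'])
cov['tarname'] = s[0]
# Random split, roughly 80% train / 20% test (differs on every run)
msk = np.random.rand(len(cov)) < 0.8
train = cov[msk]
test = cov[~msk]
y_train, y_test = train.tarname, test.tarname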

def tokens(message):
   return TextBlob(message).words

def lemmas(message):
   message=message.lower()
   words = TextBlob(message).words
   return [word.lemma for word in words]

text_clf = Pipeline([('vect', CountVectorizer(analyzer=lemmas)),
                     ('tfidf', TfidfTransformer()),
                     ('clf-svm', SGDClassifier()),
                     ])
text_clf = text_clf.fit(train['Complaint'].values.astype('U'), train['tarname'])
predicted = text_clf.predict(test['Complaint'].values.astype('U'))
# Accuracy on the held-out rows, in percent
x = np.mean(y_test == predicted) * 100
print(x)

Here is a screenshot of my dataset: [image]

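【Comments】:

If you want a reasonably good answer, please don't post a snapshot of your dataset. Do you expect us to retype the text for you as well? See here for some guidance: Good Question
For a start, there are other ways to extract features from text besides tf-idf.
I am trying GridSearchCV, but it takes a long time to fit. Could you suggest which feature extraction method I should use, @VivekKumar?

Tags: python machine-learning classification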

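【Solution 1】:

Regarding the classifier: try a multinomial naive Bayes classifier. An SVM is somewhat slower, but it may raise accuracy by a few percentage points. Also use word-level n-grams (2-grams and 3-grams). Beyond that, XGBoost (extreme gradient boosting) is another very good option: keep the learning rate low. Training will take longer, but the model is less likely to overfit and should give good accuracy. In addition, remove stop words using the list from the nltk library. I think these steps will raise your accuracy by some margin. I am not sure about reaching 85%+, but they should improve it noticeably.

【Discussion】: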
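A minimal sketch of the first suggestions above (word-level n-grams, stop-word removal, multinomial naive Bayes versus a linear SVM), using scikit-learn only. Several things here are assumptions rather than part of the answer: the toy `texts` and `labels` are made up, `build_pipeline` is a hypothetical helper name, the stop-word list is scikit-learn's built-in English list rather than nltk's, the n-gram range is widened to include unigrams, and `alpha` and `C` are placeholder values that would need tuning on the real data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_pipeline(classifier):
    """TF-IDF over word 1- to 3-grams, stop words removed, then a classifier."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(
            lowercase=True,
            stop_words="english",  # assumption: built-in list instead of nltk's
            ngram_range=(1, 3),    # unigrams plus the suggested 2- and 3-grams
        )),
        ("clf", classifier),
    ])


# Hypothetical toy data, only to show the call pattern.
texts = [
    "my bill was charged twice this month",
    "wrong amount charged on the invoice",
    "refund for the double charge not received",
    "internet connection keeps dropping every evening",
    "router loses connection and speed is slow",
    "no internet connection since yesterday",
]
labels = ["billing", "billing", "billing", "network", "network", "network"]

# Placeholder hyperparameters; tune them on the real dataset.
candidates = {
    "multinomial_nb": MultinomialNB(alpha=0.1),
    "linear_svm": LinearSVC(C=1.0),
}

# Fit one pipeline per candidate classifier on the same features.
fitted = {name: build_pipeline(clf).fit(texts, labels)
          for name, clf in candidates.items()}

for name, model in fitted.items():
    print(name, model.predict(["charged twice on my invoice"]))
```

On the real complaints data, each candidate would be fitted on the training split and scored on the held-out split, so that the two classifiers are compared on identical features.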

【Solution 2】:

Try a naive Bayes classifier. It generally works well for most text classification problems.

【Discussion】:
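As a rough sketch of such a baseline: the snippet below scores a plain bag-of-words multinomial naive Bayes model with stratified cross-validation, which gives a steadier accuracy estimate than a single random split. The toy `texts` and `labels`, the fold count, and the choice of `MultinomialNB` as the naive Bayes variant are assumptions for illustration, not something the answer specifies.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the complaints and their categories.
texts = [
    "my bill was charged twice this month",
    "wrong amount charged on the invoice",
    "refund for the double charge not received",
    "late fee added to my bill by mistake",
    "internet connection keeps dropping every evening",
    "router loses connection and speed is slow",
    "no internet connection since yesterday",
    "very slow download speed on my connection",
]
labels = ["billing"] * 4 + ["network"] * 4

# Word counts fed directly into multinomial naive Bayes.
baseline = make_pipeline(CountVectorizer(), MultinomialNB())

# Stratified folds keep the class proportions the same in every fold;
# 2 folds only because the toy set is tiny (5 or 10 is more usual).
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
scores = cross_val_score(baseline, texts, labels, cv=cv, scoring="accuracy")

print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```

The mean over folds is the number to compare against other classifiers, since a single 80/20 split drawn with `np.random.rand` changes on every run.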