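【Posted】: 2021-07-24 18:48:15
【Question】:
This is the warning I'm getting:
ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn("Liblinear failed to converge, increase "
I've been working with the romance and news categories of the Brown corpus from nltk.corpus, and until now I hadn't run into any problems. Here is the code I'm running: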
import nltk
from nltk.corpus import brown
from nltk import pos_tag_sents
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import sklearn
for cat in brown.categories():
    print(cat)
news_sent = brown.sents(categories=["news"])
romance_sent = brown.sents(categories=["romance"])
ndf = pd.DataFrame({'label':'news', 'sentence':news_sent})
rdf = pd.DataFrame({'label':'romance', 'sentence':romance_sent})
df = pd.concat([ndf, rdf])
df.head()
df['label'].value_counts()
fig, ax = plt.subplots()
_ = df['label'].value_counts().plot.bar(ax=ax, rot=0)
fig.savefig("categories_counts.png", bbox_inches = 'tight', pad_inches = 0)
pos_all = pos_tag_sents(df['sentence'])
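def countPOS(pos_tag_sent, POS):
    # Return one count per sentence of tags whose first two characters equal POS.
    pos_count = 0
    all_pos_counts = []
    for sentence in pos_tag_sent:
        for word in sentence:
            tag = word[1]  # each word is a (token, tag) pair
            if tag[:2] == POS:
                pos_count = pos_count + 1
        all_pos_counts.append(pos_count)
        pos_count = 0  # reset for the next sentence
    return all_pos_counts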
df['NN'] = countPOS(pos_all, 'NN')
df['JJ'] = countPOS(pos_all, 'JJ')
df.groupby('label').sum()
df.to_csv("df_news_romance.csv", index=False)
df = pd.read_csv("df_news_romance.csv")
fv = df[["NN", "JJ"]]
df['label'].value_counts()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(fv, df['label'],
                                                    stratify=df['label'],
                                                    test_size=0.25,
                                                    random_state=42)
print(X_train.shape)
print(X_test.shape)
from sklearn.svm import LinearSVC
classifier = LinearSVC()
classifier.fit(X_train, y_train)
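At this point I get the warning shown above. To add more detail to the original post: I have tried increasing max_iter and using LinearSVC(dual=False), but neither made any difference. Any help would be greatly appreciated!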
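For reference, here is a minimal sketch of how the two attempts mentioned above could be checked side by side. It runs on synthetic count features rather than the Brown data; the `fit_and_check` helper and the value `max_iter=10000` are illustrative assumptions, not part of my original code.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.svm import LinearSVC

# Hypothetical stand-in for the NN/JJ count features (NOT the Brown corpus data).
rng = np.random.default_rng(42)
X = rng.poisson(lam=[5.0, 2.0], size=(400, 2)).astype(float)
y = np.where(rng.random(400) < 0.5, "news", "romance")


def fit_and_check(clf, X, y):
    """Fit clf and report whether a ConvergenceWarning was emitted (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, including repeats
        clf.fit(X, y)
    return any(issubclass(w.category, ConvergenceWarning) for w in caught)


# The two variants mentioned in the post; max_iter=10000 is an illustrative value.
candidates = {
    "max_iter": LinearSVC(max_iter=10000),
    "dual=False": LinearSVC(dual=False),
}

# Map each variant name to True if the warning fired, False otherwise.
results = {name: fit_and_check(clf, X, y) for name, clf in candidates.items()}
print(results)
```

Swapping `X` and `y` for `X_train` and `y_train` from the code above would show which variant, if any, still triggers the warning on the real features.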
【Discussion】:
Tags: python scikit-learn