Convergence Warning Linear SVC: increase the number of iterations? (Answers)

【Question title】: Convergence Warning Linear SVC: increase the number of iterations?
【Posted】: 2021-07-24 18:48:15
【Question description】:

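This is the error I am getting:

ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. warnings.warn("Liblinear failed to converge, increase "

I have been working with the romance and news categories of the Brown corpus from nltk.corpus, and had no problems up to this point. This is the code I am running: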

import nltk
from nltk.corpus import brown
from nltk import pos_tag_sents
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import sklearn

for cat in brown.categories():
    print(cat)

news_sent = brown.sents(categories=["news"])
romance_sent = brown.sents(categories=["romance"])

ndf = pd.DataFrame({'label':'news', 'sentence':news_sent})
rdf = pd.DataFrame({'label':'romance', 'sentence':romance_sent})

df = pd.concat([ndf, rdf])

df.head()

df['label'].value_counts()

fig, ax = plt.subplots()
_ = df['label'].value_counts().plot.bar(ax=ax, rot=0)
fig.savefig("categories_counts.png", bbox_inches = 'tight', pad_inches = 0)

pos_all = pos_tag_sents(df['sentence'])

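def countPOS(pos_tag_sent, POS):
    # For each tagged sentence, count the tokens whose tag starts with
    # the two-letter prefix POS (e.g. 'NN' also matches 'NNS' and 'NNP').
    all_pos_counts = []
    for sentence in pos_tag_sent:
        pos_count = 0
        for word in sentence:
            tag = word[1]  # each item is a (token, tag) pair
            if tag[:2] == POS:
                pos_count = pos_count + 1
        all_pos_counts.append(pos_count)
    return all_pos_counts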

df['NN'] = countPOS(pos_all, 'NN')
df['JJ'] = countPOS(pos_all, 'JJ')

df.groupby('label').sum()

df.to_csv("df_news_romance.csv", index=False)

df = pd.read_csv("df_news_romance.csv")

fv = df[["NN", "JJ"]]

df['label'].value_counts()

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(fv, df['label'],
                                                    stratify=df['label'],
                                                    test_size=0.25,
                                                    random_state=42)

print(X_train.shape)
print(X_test.shape)

from sklearn.svm import LinearSVC
classifier = LinearSVC()

classifier.fit(X_train, y_train)

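At this point I get the error above. To add more information to the original post: I have tried increasing max_iter and setting LinearSVC(dual=False), but neither made any difference. Any help would be much appreciated!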

【Question comments】:

Tags: python scikit-learn

【Solution 1】:

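You may need to set LinearSVC(dual=False) when the number of samples in your data exceeds the number of features. LinearSVC's default configuration sets dual to True, which means it solves the dual optimization problem. You can also try increasing the maximum number of iterations (e.g. max_iter=10000).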
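As a rough sketch of both suggestions together, something like the following could be tried. The data here is synthetic (Poisson counts standing in for the NN and JJ columns), which is an assumption made so the snippet runs on its own; it is not the Brown corpus data from the question. The snippet also records whether a ConvergenceWarning was raised during fitting, so you can check the result rather than rely on console output.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.svm import LinearSVC

# Synthetic stand-in for the two count features (NN, JJ).
# Assumption: these numbers are made up and are NOT the Brown corpus counts.
rng = np.random.default_rng(42)
X_news = rng.poisson(lam=[6.0, 2.0], size=(300, 2))
X_romance = rng.poisson(lam=[3.0, 1.0], size=(300, 2))
X = np.vstack([X_news, X_romance]).astype(float)
y = np.array(["news"] * 300 + ["romance"] * 300)

# Far more samples than features, so solve the primal problem (dual=False)
# and allow more iterations than the default.
classifier = LinearSVC(dual=False, max_iter=10000)

# Record warnings raised during fit so convergence can be checked in code.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    classifier.fit(X, y)

converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("converged:", converged)
print("training accuracy:", classifier.score(X, y))
```

If the warning still appears with these settings on your own data, the feature scale is a likely next thing to look at.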

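【Comments】:

Thanks for the suggestions. I had tried both of these solutions before without success. I tried them again, but there is still no improvement. I'm not sure what else to try.
Scaling (standardization) may also help: scikit-learn.org/stable/modules/generated/…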
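A minimal sketch of the scaling suggestion, assuming StandardScaler is the scaler meant (the link in the comment is truncated, so this is a guess). It again uses synthetic count data in place of the NN/JJ features. Putting the scaler and the classifier in one pipeline means the scaling statistics are learned from the training split only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the NN/JJ counts (assumption: not the real data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.poisson(lam=[6.0, 2.0], size=(300, 2)),  # "news"-like sentences
    rng.poisson(lam=[3.0, 1.0], size=(300, 2)),  # "romance"-like sentences
]).astype(float)
y = np.array(["news"] * 300 + ["romance"] * 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=42
)

# Standardize each feature to zero mean and unit variance, then fit the SVM.
model = make_pipeline(
    StandardScaler(),
    LinearSVC(dual=False, max_iter=10000),
)
model.fit(X_train, y_train)

print("test accuracy:", model.score(X_test, y_test))
```

With the question's variables, the same pipeline would be fitted on X_train and y_train from the existing train_test_split call.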