【Question title】: Convergence Warning Linear SVC: increase the number of iterations?
【Posted】: 2021-07-24 18:48:15
【Question】:

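This is the warning I am getting:

ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn("Liblinear failed to converge, increase "

I have been working with the romance and news categories of the Brown corpus from nltk.corpus and had not run into any problems until now. Here is the code I am running: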

import nltk
from nltk.corpus import brown
from nltk import pos_tag_sents
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import sklearn

for cat in brown.categories():
    print(cat)

news_sent = brown.sents(categories=["news"])
romance_sent = brown.sents(categories=["romance"])

ndf = pd.DataFrame({'label':'news', 'sentence':news_sent})
rdf = pd.DataFrame({'label':'romance', 'sentence':romance_sent})

df = pd.concat([ndf, rdf])

df.head()

df['label'].value_counts()

fig, ax = plt.subplots()
_ = df['label'].value_counts().plot.bar(ax=ax, rot=0)
fig.savefig("categories_counts.png", bbox_inches = 'tight', pad_inches = 0)

pos_all = pos_tag_sents(df['sentence'])

def countPOS(pos_tag_sent, POS):
    pos_count = 0
    all_pos_counts = []
    for sentence in pos_tag_sent:
        for word in sentence:
            tag = word[1]
            if tag[:2] == POS:
                pos_count = pos_count+1
        all_pos_counts.append(pos_count)
        pos_count = 0
    return all_pos_counts

df['NN'] = countPOS(pos_all, 'NN')
df['JJ'] = countPOS(pos_all, 'JJ')

df.groupby('label').sum()

df.to_csv("df_news_romance.csv", index=False)

df = pd.read_csv("df_news_romance.csv")

fv = df[["NN", "JJ"]]

df['label'].value_counts()

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(fv, df['label'],
                                                    stratify=df['label'],
                                                    test_size=0.25,
                                                    random_state=42)

print(X_train.shape)
print(X_test.shape)

from sklearn.svm import LinearSVC
classifier = LinearSVC()

classifier.fit(X_train, y_train)

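At this point I get the warning shown above. To add more information to the original post: I have tried increasing max_iter and using LinearSVC(dual=False), but neither made any difference. Any help would be greatly appreciated!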

【Question comments】:

    Tags: python scikit-learn


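    【Answer 1】:

    You may need to set LinearSVC(dual=False) when the number of samples in your data exceeds the number of features, which is the case here (thousands of sentences, two features). The default configuration of LinearSVC sets dual to True, meaning it solves the dual optimization problem; the scikit-learn documentation recommends dual=False when n_samples > n_features. You can also try increasing the maximum number of iterations (e.g. max_iter=10000).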
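A minimal sketch of the two settings mentioned above, assuming synthetic Poisson counts as a hypothetical stand-in for the NN/JJ features (the real Brown data is not loaded here). The helper `fit_and_check` is an illustrative name, not part of scikit-learn; it records whether a ConvergenceWarning was emitted so the effect of each setting can be checked programmatically rather than by eye.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.svm import LinearSVC

# Hypothetical stand-in for the NN/JJ count features in the question.
rng = np.random.default_rng(42)
n = 2000
X = np.vstack([
    rng.poisson(lam=[6.0, 2.0], size=(n, 2)),  # "news"-like sentences
    rng.poisson(lam=[3.0, 1.0], size=(n, 2)),  # "romance"-like sentences
]).astype(float)
y = np.array(["news"] * n + ["romance"] * n)


def fit_and_check(**kwargs):
    """Fit a LinearSVC and report whether a ConvergenceWarning was raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        clf = LinearSVC(**kwargs).fit(X, y)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return clf, warned


# Solve the primal problem (n_samples > n_features) with a generous iteration cap.
clf_primal, warned_primal = fit_and_check(dual=False, max_iter=10000)
print("ConvergenceWarning raised:", warned_primal)
print("iterations used:", clf_primal.n_iter_)
```

If the warning persists on the real data even with these settings, as the asker reports, the iteration cap is probably not the bottleneck and the feature scale is worth looking at.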

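    【Comments】:

    • Thanks for the suggestion. I had tried both of these before without success. I tried them again and there is still no improvement. I'm not sure what else to try.
    • Scaling (standardizing) the features may also help: scikit-learn.org/stable/modules/generated/…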
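The scaling suggestion in the last comment could look roughly like the sketch below. It assumes the comment refers to StandardScaler (the link is truncated, so this is a guess) and again uses synthetic counts as a hypothetical stand-in for the NN/JJ columns. Wrapping the scaler and the classifier in a pipeline means the scaling statistics are learned from the training data only and reapplied at prediction time.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical stand-in for the NN/JJ count features in the question.
rng = np.random.default_rng(0)
n = 1000
X = np.vstack([
    rng.poisson(lam=[6.0, 2.0], size=(n, 2)),
    rng.poisson(lam=[3.0, 1.0], size=(n, 2)),
]).astype(float)
y = np.array(["news"] * n + ["romance"] * n)

# Standardize each feature to zero mean and unit variance, then fit the SVM.
model = make_pipeline(
    StandardScaler(),
    LinearSVC(dual=False, max_iter=10000),
)
model.fit(X, y)

# Inspect the scaled features that the classifier actually saw.
scaled = model.named_steps["standardscaler"].transform(X)
print("feature means after scaling:", scaled.mean(axis=0).round(3))
print("feature stds after scaling:", scaled.std(axis=0).round(3))
print("training accuracy:", model.score(X, y))
```

With the question's variables, the same pipeline would be fitted with `model.fit(X_train, y_train)` and evaluated with `model.score(X_test, y_test)`.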