Wrong plot in logistic regression: answers

【Question title】: Wrong plot in logistic regression
【Posted】: 2016-12-13 03:35:49
【Question】:

I am trying to implement logistic regression, but the plot I get is wrong.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
sns.set()

# 400 random integer features and 400 random binary labels (unrelated to x)
x = (np.random.randint(2000, size=400)).reshape((400,1))
y = (np.random.randint(2, size=400)).reshape((400,1)).ravel()

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=0)

logistic_regr = LogisticRegression()
logistic_regr.fit(x_train, y_train)

fig, ax = plt.subplots()

ax.set(xlabel='x', ylabel='y')
# predict_proba returns two columns, one per class, so this draws two lines
ax.plot(x_test, logistic_regr.predict_proba(x_test), label='Logistic regr')
#ax.plot(x_test,logistic_regr.predict(x_test), label='Logistic regr')
ax.legend()

I get the following plot:

If I use:

ax.plot(x_test,logistic_regr.predict(x_test), label='Logistic regr')

I get:

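【Comments on the question】:

Your regression always predicts 0, which is why you get this plot. Your training data is completely random, your targets consist only of 0s and 1s, and you want it to be a linear regression. So the regression is a line that predicts either always 0 or always 1.
@MMF: Hmm, right! My target must lie in [0,1] because it is a probability. If I try np.linspace(0,1,400).ravel() as the target, it throws Unknown label type
But the problem is that you only have 0 or 1, not values in between. np.random.randint() only returns integers
@MMF: I updated my comment
@MMF: Shouldn't logistic_regr.predict_proba find probabilities in [0,1], regardless of what my target is?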
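To make the last comment concrete, here is a minimal sketch (not the asker's exact script) that fits the model on random labels and inspects what predict_proba returns. The fixed seed and the names rng, model, and proba are assumptions added for illustration. The outputs are valid probabilities in [0,1], but because the labels carry no information about x, they would be expected to stay close to the share of 1s in the labels rather than sweep from 0 to 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fixed seed: an assumption added here so the run is repeatable.
rng = np.random.default_rng(0)

# Random integer feature and random binary labels, unrelated to each other.
x = rng.integers(2000, size=(400, 1))
y = rng.integers(2, size=400)

model = LogisticRegression().fit(x, y)

# One row per sample, one column per class: [P(class 0), P(class 1)].
proba = model.predict_proba(x)

print("min / max of P(class 1):", proba[:, 1].min(), proba[:, 1].max())
print("fraction of 1s in y:", y.mean())
```
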

Tags: python scikit-learn

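【Answer 1】:

Well, you won't get a plot of the sigmoid function with your particular choice of data. With random input the algorithm can only find a weak separation between the classes, so it predicts probabilities close to 0.5 that vary with the randomness of the input. You can get a sigmoid by using an evenly spaced range of values, half of which belong to the first class and the other half to the second. That way, predict_proba() will output probabilities for a given class that range from 0 to 1 (I assume the rest of your code stays the same):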

x = np.linspace(-2, 2, 400).reshape((400,1))
# 1-D label vector: the first 200 samples are class 0, the last 200 are class 1
y = np.concatenate((np.zeros(200), np.ones(200)))

Then generate your plot:

ax.plot(x_test, logistic_regr.predict_proba(x_test)[:,1], '.', label='Logistic regr')

You will get an S-shaped plot showing the predicted probability of one of the classes:
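For reference, the pieces of this answer can be put together as one self-contained sketch. It assumes a scikit-learn version that provides sklearn.model_selection. The helper plot_class1_probability and the variables proba_class1 and accuracy are names introduced here for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Evenly spaced feature; first half labelled 0, second half labelled 1.
x = np.linspace(-2, 2, 400).reshape(-1, 1)
y = np.concatenate((np.zeros(200), np.ones(200)))

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, random_state=0
)

logistic_regr = LogisticRegression()
logistic_regr.fit(x_train, y_train)

# Column 1 of predict_proba holds P(class 1 | x).
proba_class1 = logistic_regr.predict_proba(x_test)[:, 1]
accuracy = logistic_regr.score(x_test, y_test)
print("test accuracy:", accuracy)


def plot_class1_probability():
    """Draw the S-shaped curve (hypothetical helper for this sketch).

    matplotlib is imported here so the numeric part above also runs
    in environments without a plotting backend.
    """
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.set(xlabel='x', ylabel='P(class 1)')
    # '.' markers avoid connecting the unsorted test points with lines.
    ax.plot(x_test, proba_class1, '.', label='Logistic regr')
    ax.legend()
    plt.show()
```

Calling plot_class1_probability() draws the figure. The point markers matter because train_test_split shuffles the samples, so x_test is not in increasing order.
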

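【Comments】:

OK, thanks! It seems that without the '.' in the plot command, the plot is a mess as well! (It would be great if you could also help with this question: stackoverflow.com/questions/41043348/… Thanks!)
If you sort the x_test array by calling x_test.sort(axis=0) before passing it to predict_proba(), you will get a smooth plot.
Hmm, that works! So I guess it is good practice to sort things before plotting
One question: if we plot logistic_regr.predict_proba(x_test) instead of [:,1], we get 2 sigmoids. Is that how logistic regression generates data? Thanks! (Upvoted)
Logistic regression uses the sigmoid to map the output into a range of values that we can conveniently interpret as probability estimates. The sigmoid is just a tool; it is not the purpose of logistic regression. We may not always get a proper sigmoid-shaped output. My example is a very specific case in which the predicted probability of one particular class ranges from 0 to 1. That gives a clean sigmoid shape, but real examples may have very different plots of predicted values.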
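The two points raised in these comments (sorting before a line plot, and the "2 sigmoids") can be checked numerically with a small sketch. It is illustrative only: the names model, x_shuffled, x_sorted, p0, p1, and sigmoid are assumptions, and the shuffle just stands in for the unordered x_test. For a binary problem with default settings, the two columns of predict_proba are complementary, and the class-1 column is the sigmoid of decision_function.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.linspace(-2, 2, 400).reshape(-1, 1)
y = np.concatenate((np.zeros(200), np.ones(200)))
model = LogisticRegression().fit(x, y)

# Shuffle the rows to imitate an unordered test set, then sort a copy.
rng = np.random.default_rng(0)
x_shuffled = rng.permutation(x)
x_sorted = np.sort(x_shuffled, axis=0)

# Two columns: P(class 0) and P(class 1); these are the "2 sigmoids".
proba = model.predict_proba(x_sorted)
p0, p1 = proba[:, 0], proba[:, 1]

# The class-1 column is the sigmoid of the linear score w*x + b.
z = model.decision_function(x_sorted)
sigmoid = 1.0 / (1.0 + np.exp(-z))

print("columns sum to 1:", np.allclose(p0 + p1, 1.0))
print("column 1 matches sigmoid:", np.allclose(p1, sigmoid))
```

With x_sorted in increasing order, ax.plot(x_sorted, p1) can connect the points with a line without zigzagging, which is what sorting x_test achieves in the comment above.
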