Why don't LogisticRegression and MLPClassifier produce the same results? (Answers)

【Question title】: Why don't LogisticRegression and MLPClassifier produce the same results?
【Posted】: 2021-09-09 04:06:04
【Question description】:

A neural network with no hidden layers and a sigmoid/softmax activation is just logistic regression:

from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
X, y = load_iris(return_X_y=True)
nn = MLPClassifier(hidden_layer_sizes=(), solver = 'lbfgs', activation='logistic', alpha = 0).fit(X,y)
# scikit-learn >= 1.2 spells this penalty=None; releases before 1.2 use penalty='none'
l = LogisticRegression(penalty=None, solver = 'lbfgs',  fit_intercept = False).fit(X,y)
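
The claimed equivalence can be checked by hand at the level of the model function. The following is a minimal sketch in plain Python, not scikit-learn code: the weights, bias, and input are hypothetical values chosen only for illustration. It shows that the forward pass of a network with zero hidden layers and one sigmoid output unit returns the same probability as binary logistic regression with the same parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def network_forward(x, weights, bias):
    """Forward pass of a network with no hidden layers and one sigmoid output."""
    pre_activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(pre_activation)

def logistic_regression_proba(x, coef, intercept):
    """P(y = 1 | x) for binary logistic regression, written via the odds."""
    log_odds = intercept + sum(c * xi for c, xi in zip(coef, x))
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)

# Hypothetical parameters and input (not fitted to any data).
weights, bias = [0.5, -1.25, 2.0], 0.3
x = [1.0, 2.0, -0.5]

p_network = network_forward(x, weights, bias)
p_logistic = logistic_regression_proba(x, weights, bias)
print(p_network, p_logistic)  # the two probabilities agree up to rounding
```

Since the two models describe the same family of functions, any difference in fitted coefficients has to come from how the parameters are estimated, not from the model itself.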

So why don't the two models produce the same coefficients? Most of them are close, but some differ:

print("NN")
print(nn.coefs_[0].T)
print("\nLogistic")
print(l.coef_)
NN
[[  5.40104629  11.39328515 -16.50698752  -7.86329804]
 [ -1.06741383  -2.48638863   3.37921506  -5.29842503]
 [ -3.55724865  -9.11027371  12.79749019  12.9357708 ]]

Logistic
[[  5.10297361  11.87381176 -16.50600209  -7.70449685]
 [  0.61357365  -2.6277241    4.03442742  -1.28869255]
 [ -5.71654726  -9.24608766  12.47157468   8.9931894 ]]

【Question discussion】:

Tags: scikit-learn logistic-regression mlp

【Solution 1】:

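There are a few problems with your comparison, but correcting them does not resolve the discrepancy, so this is only a partial answer.

First, MLPClassifier always includes a bias (intercept) node; unlike in LogisticRegression, its presence cannot be configured. You therefore need fit_intercept = True in the LR model.

Second, although both models use the same solver, their default max_iter values differ (200 for MLPClassifier, 100 for LogisticRegression), so we should set them to be equal.

Third, to keep the problem as simple as possible, it is arguably a good idea to stay in a binary rather than a multi-class setting.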
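
As background on why the multi-class case is harder to compare (this is a general property of softmax, not something the answer itself states): adding the same vector to every class's weights, and the same constant to every class's bias, leaves the softmax probabilities unchanged, so unpenalized multi-class coefficients are not uniquely determined. A minimal sketch in plain Python, with hypothetical parameters and a hypothetical input:

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(x, weights, biases):
    """One linear score per class: w_k . x + b_k."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

# Hypothetical 3-class, 2-feature parameters (not fitted to any data).
weights = [[1.0, -2.0], [0.5, 0.25], [-1.5, 1.0]]
biases = [0.1, -0.3, 0.2]

# Add the same vector to every class's weights and the same constant to every bias.
shift_w, shift_b = [4.0, -7.0], 2.5
shifted_weights = [[w + s for w, s in zip(row, shift_w)] for row in weights]
shifted_biases = [b + shift_b for b in biases]

x = [0.8, -1.2]
p_original = softmax(class_scores(x, weights, biases))
p_shifted = softmax(class_scores(x, shifted_weights, shifted_biases))
print(p_original)
print(p_shifted)  # same probabilities, different coefficients
```

In the binary case with a single sigmoid output there is only one weight vector, so this particular ambiguity does not arise.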

Here is your code, modified accordingly:

from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.utils import shuffle

X, y = load_iris(return_X_y=True)

X, y = shuffle(X[:100,], y[:100], random_state=42) # keep only classes 0/1 (binary problem)

nn = MLPClassifier(hidden_layer_sizes=(), solver = 'lbfgs', activation='logistic', alpha = 0, max_iter=100).fit(X,y)
# penalty=None needs scikit-learn >= 1.2; older releases use penalty='none'
l = LogisticRegression(penalty=None, solver = 'lbfgs',  fit_intercept = True).fit(X,y)

print("NN coefficients & intercept")
print(nn.coefs_[0].T)
print(nn.intercepts_)
print("\nLR coefficients & intercept")
print(l.coef_)
print(l.intercept_)

Result:

NN coefficients & intercept
[[-1.34230329 -4.29615611  7.14868389  2.66752688]]
[array([-0.90035086])]

LR coefficients & intercept
[[-2.07247339 -6.90694692 10.97006745  5.64543091]]
[-1.05932537]

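The catch is that if you run the code above several times (I have set no random state other than the one used for shuffling the data), you will see that the LR results are identical every time, while the MLP results vary from run to run. Here is another short experiment to demonstrate and quantify this: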

nn_coef = []
nn_intercept = []
lr_coef = []
lr_inter = []

for i in range(0,20):
  nn = MLPClassifier(hidden_layer_sizes=(), solver = 'lbfgs', activation='logistic', alpha = 0, max_iter=100).fit(X,y)
  l = LogisticRegression(penalty=None, solver = 'lbfgs',  fit_intercept = True).fit(X,y)  # penalty='none' before scikit-learn 1.2

  nn_coef.append(nn.coefs_[0].T)
  nn_intercept.append(nn.intercepts_)
  lr_coef.append(l.coef_)
  lr_inter.append(l.intercept_)

import numpy as np

# get the standard deviations of coefficients & intercepts between runs:

print(np.std(nn_coef, axis=0))
print(np.std(lr_coef, axis=0))
print()
print(np.std(nn_intercept))
print(np.std(lr_inter))

Result:

[[0.14334883 0.42125216 0.46115555 0.4488226 ]]
[[0.00000000e+00 8.88178420e-16 1.77635684e-15 8.88178420e-16]]

0.3393994986547498
0.0

So clearly, while the standard deviations of the LR coefficients and intercept are practically zero, those of the MLP parameters are quite large.
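
A note on the numbers above: np.std defaults to ddof=0, i.e. the population standard deviation, which divides by n; passing ddof=1 gives the sample standard deviation, which divides by n - 1. The following small sketch uses Python's statistics module as a stand-in for the two conventions; the values are hypothetical, not the coefficients from the experiment.

```python
import math
import statistics

# Hypothetical estimates of one coefficient across four refits.
values = [1.0, 2.0, 3.0, 4.0]
n = len(values)

population_std = statistics.pstdev(values)  # divides by n, like np.std(values)
sample_std = statistics.stdev(values)       # divides by n - 1, like np.std(values, ddof=1)

print(population_std, sample_std)
# The two differ only by the factor sqrt(n / (n - 1)).
print(sample_std / population_std, math.sqrt(n / (n - 1)))
```

With 20 runs the factor is about 1.026, so the choice of convention does not change the conclusion here.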

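It seems that the MLP algorithm, at least with the L-BFGS solver, is very sensitive to the initialization of the weights and biases, whereas LR is not. This also appears to be the implicit assumption in a related Github thread. But I agree with your implicit expectation that this should not be the case.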
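
To illustrate how the starting point of an iterative fit can matter, here is a minimal sketch that uses plain gradient descent as a stand-in. It is not scikit-learn's L-BFGS and does not reproduce how MLPClassifier initializes its weights; the toy data, the fit_gd helper, and the starting values are all hypothetical. On data whose classes overlap, both starts reach the same optimum given enough iterations. On perfectly separable data the unpenalized log-loss has no finite minimizer, so the weight keeps growing and the reported parameters depend on where the optimizer happens to stop.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_gd(xs, ys, w, b, n_iter, lr=0.5):
    """Gradient descent on the mean log-loss for p = sigmoid(w * x + b)."""
    n = len(xs)
    for _ in range(n_iter):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
overlapping = [0, 0, 1, 0, 1, 1]  # classes overlap: a finite optimum exists
separable = [0, 0, 0, 1, 1, 1]    # perfectly separable: no finite optimum

starts = [(0.0, 0.0), (3.0, -2.0)]  # hypothetical initial (w, b) pairs

# Overlapping classes, long run: both starts end at the same (w, b).
overlap_fits = [fit_gd(xs, overlapping, w0, b0, 5000) for w0, b0 in starts]
print("overlapping:", overlap_fits)

# Separable classes, short run: compare where each start has got to.
sep_fits = [fit_gd(xs, separable, w0, b0, 100) for w0, b0 in starts]
print("separable, 100 iterations:", sep_fits)

# Separable classes, same start, larger budget: the weight is still growing.
sep_short = fit_gd(xs, separable, 0.0, 0.0, 100)
sep_long = fit_gd(xs, separable, 0.0, 0.0, 5000)
print("separable, 100 vs 5000 iterations:", sep_short, sep_long)
```

Whether this mechanism accounts for the spread measured above is not established here. It may be relevant, though, that the first 100 iris rows contain only setosa and versicolor, which are linearly separable.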

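If nobody else comes up with a satisfactory answer, I think this is a good candidate for opening a Github issue.

【Discussion】:

【Solution 2】:

As @desertnaut pointed out, the MLP initialization does seem to be the issue, since the difference between the MLP and LR coefficients appears to shrink as the sample size increases.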

from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification

random_state = 100
n_samples = 1000

X, y = make_classification(n_samples=n_samples, n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1, random_state=random_state)
X = StandardScaler().fit_transform(X)

nn = MLPClassifier(hidden_layer_sizes=(), solver='lbfgs', activation='logistic', alpha=0, max_iter=1000, tol=0, random_state=random_state).fit(X,y)
# penalty=None needs scikit-learn >= 1.2; older releases use penalty='none'
lr = LogisticRegression(penalty=None, solver='lbfgs', fit_intercept=True, max_iter=1000, tol=0, random_state=random_state).fit(X,y)

print(nn.intercepts_[0])
print(lr.intercept_)
# [-1.08397244]
# [-1.08397505]

print(nn.coefs_[0].T)
print(lr.coef_)
# [[ 2.90716947 -3.08525711]]
# [[ 2.90718263 -3.08525826]]

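The code below shows that as the sample size increases, the variance of the MLP coefficients decreases and both the MLP and LR coefficients converge to the true coefficients, although the exact cut-off point depends on the dataset.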

import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# sample sizes
n_samples = [25, 50, 75, 100, 250, 500, 750, 1000, 5000, 10000]

# number of refits of the MLP and LR
# models for each sample size
n_repetitions = 100

# synthetic data
true_intercept = 10
true_weights = [20, 30]
X = np.random.multivariate_normal(np.zeros(2), np.eye(2), np.max(n_samples))
Z = true_intercept + np.dot(X, true_weights) + np.random.normal(0, 1, np.max(n_samples))
p = 1 / (1 + np.exp(- Z))
y = np.random.binomial(1, p, np.max(n_samples))

# data frame for storing the results for each sample size
output = pd.DataFrame(columns=['sample size', 'label avg.', 'LR intercept avg.', 'LR intercept std.', 'NN intercept avg.',
'NN intercept std.', 'LR first weight avg.', 'LR first weight std.', 'NN first weight avg.', 'NN first weight std.',
'LR second weight avg.', 'LR second weight std.', 'NN second weight avg.', 'NN second weight std.'])

# loop across the different
# sample sizes "n"
for n in n_samples:

    lr_intercept, lr_coef = [], []
    nn_intercept, nn_coef = [], []

    # refit the MLP and LR models multiple times
    # using the first "n" samples
    for k in range(n_repetitions):

        nn = MLPClassifier(hidden_layer_sizes=(), solver='lbfgs', activation='logistic', alpha=0, max_iter=1000, tol=0)
        # penalty=None needs scikit-learn >= 1.2; older releases use penalty='none'
        lr = LogisticRegression(penalty=None, solver='lbfgs', fit_intercept=True, max_iter=1000, tol=0)

        nn.fit(X[:n, :], y[:n])
        lr.fit(X[:n, :], y[:n])

        lr_intercept.append(lr.intercept_)
        nn_intercept.append(nn.intercepts_[0])

        lr_coef.append(lr.coef_)
        nn_coef.append(nn.coefs_[0].T)

    # save the sample mean and sample standard deviations
    # of the MLP and LR estimated coefficients for the
    # considered sample size "n"
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    output = pd.concat([output, pd.DataFrame({
        'sample size': [n],
        'label avg.': [np.mean(y[:n])],
        'LR intercept avg.': [np.mean(lr_intercept)],
        'LR intercept std.': [np.std(lr_intercept, ddof=1)],
        'NN intercept avg.': [np.mean(nn_intercept)],
        'NN intercept std.': [np.std(nn_intercept, ddof=1)],
        'LR first weight avg.': [np.mean(lr_coef, axis=0)[0][0]],
        'LR first weight std.': [np.std(lr_coef, ddof=1, axis=0)[0][0]],
        'NN first weight avg.': [np.mean(nn_coef, axis=0)[0][0]],
        'NN first weight std.': [np.std(nn_coef, ddof=1, axis=0)[0][0]],
        'LR second weight avg.': [np.mean(lr_coef, axis=0)[0][1]],
        'LR second weight std.': [np.std(lr_coef, ddof=1, axis=0)[0][1]],
        'NN second weight avg.': [np.mean(nn_coef, axis=0)[0][1]],
        'NN second weight std.': [np.std(nn_coef, ddof=1, axis=0)[0][1]],
    })], ignore_index=True)

# plot the results
fig = make_subplots(rows=3, cols=1, subplot_titles=['Intercept', 'First Weight', 'Second Weight'])

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=[true_intercept] * output.shape[0],
    mode='lines',
    line=dict(color='rgb(82, 188, 163)', dash='dot', width=1),
    legendgroup='True Value',
    name='True Value',
    showlegend=True,
), row=1, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['LR intercept avg.'] + output['LR intercept std.'],
    mode='lines',
    line=dict(color='rgba(229, 134, 6, 0.2)'),
    legendgroup='Logistic Regression',
    showlegend=False,
), row=1, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['LR intercept avg.'] - output['LR intercept std.'],
    mode='lines',
    fill='tonexty',
    fillcolor='rgba(229, 134, 6, 0.2)',
    line=dict(color='rgba(229, 134, 6, 0.2)'),
    legendgroup='Logistic Regression',
    showlegend=False,
), row=1, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['LR intercept avg.'],
    mode='lines',
    line=dict(color='rgb(229, 134, 6)', dash='dot', width=1),
    legendgroup='Logistic Regression',
    name='Logistic Regression',
    showlegend=True,
), row=1, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['NN intercept avg.'] + output['NN intercept std.'],
    mode='lines',
    line=dict(color='rgba(93, 105, 177, 0.2)'),
    legendgroup='MLP Regression',
    showlegend=False,
), row=1, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['NN intercept avg.'] - output['NN intercept std.'],
    mode='lines',
    fill='tonexty',
    fillcolor='rgba(93, 105, 177, 0.2)',
    line=dict(color='rgba(93, 105, 177, 0.2)'),
    legendgroup='MLP Regression',
    showlegend=False,
), row=1, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['NN intercept avg.'],
    mode='lines',
    line=dict(color='rgb(93, 105, 177)', dash='dot', width=1),
    legendgroup='MLP Regression',
    name='MLP Regression',
    showlegend=True,
), row=1, col=1)

fig.update_xaxes(
    title='Sample Size',
    type='category',
    mirror=True,
    linecolor='#d9d9d9',
    showgrid=False,
    zeroline=False,
    row=1, col=1
)

fig.update_yaxes(
    title='Estimate',
    mirror=True,
    linecolor='#d9d9d9',
    showgrid=False,
    zeroline=False,
    row=1, col=1
)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=[true_weights[0]] * output.shape[0],
    mode='lines',
    line=dict(color='rgb(82, 188, 163)', dash='dot', width=1),
    legendgroup='True Value',
    showlegend=False,
), row=2, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['LR first weight avg.'] + output['LR first weight std.'],
    mode='lines',
    line=dict(color='rgba(229, 134, 6, 0.2)'),
    legendgroup='Logistic Regression',
    showlegend=False,
), row=2, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['LR first weight avg.'] - output['LR first weight std.'],
    mode='lines',
    fill='tonexty',
    fillcolor='rgba(229, 134, 6, 0.2)',
    line=dict(color='rgba(229, 134, 6, 0.2)'),
    legendgroup='Logistic Regression',
    showlegend=False,
), row=2, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['LR first weight avg.'],
    mode='lines',
    line=dict(color='rgb(229, 134, 6)', dash='dot', width=1),
    legendgroup='Logistic Regression',
    showlegend=False,
), row=2, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['NN first weight avg.'] + output['NN first weight std.'],
    mode='lines',
    line=dict(color='rgba(93, 105, 177, 0.2)'),
    legendgroup='MLP Regression',
    showlegend=False,
), row=2, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['NN first weight avg.'] - output['NN first weight std.'],
    mode='lines',
    fill='tonexty',
    fillcolor='rgba(93, 105, 177, 0.2)',
    line=dict(color='rgba(93, 105, 177, 0.2)'),
    legendgroup='MLP Regression',
    showlegend=False,
), row=2, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['NN first weight avg.'],
    mode='lines',
    line=dict(color='rgb(93, 105, 177)', dash='dot', width=1),
    legendgroup='MLP Regression',
    showlegend=False,
), row=2, col=1)

fig.update_xaxes(
    title='Sample Size',
    type='category',
    mirror=True,
    linecolor='#d9d9d9',
    showgrid=False,
    zeroline=False,
    row=2, col=1
)

fig.update_yaxes(
    title='Estimate',
    mirror=True,
    linecolor='#d9d9d9',
    showgrid=False,
    zeroline=False,
    row=2, col=1
)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=[true_weights[1]] * output.shape[0],
    mode='lines',
    line=dict(color='rgb(82, 188, 163)', dash='dot', width=1),
    legendgroup='True Value',
    showlegend=False,
), row=3, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['LR second weight avg.'] + output['LR second weight std.'],
    mode='lines',
    line=dict(color='rgba(229, 134, 6, 0.2)'),
    legendgroup='Logistic Regression',
    showlegend=False,
), row=3, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['LR second weight avg.'] - output['LR second weight std.'],
    mode='lines',
    fill='tonexty',
    fillcolor='rgba(229, 134, 6, 0.2)',
    line=dict(color='rgba(229, 134, 6, 0.2)'),
    legendgroup='Logistic Regression',
    showlegend=False,
), row=3, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['LR second weight avg.'],
    mode='lines',
    line=dict(color='rgb(229, 134, 6)', dash='dot', width=1),
    legendgroup='Logistic Regression',
    showlegend=False,
), row=3, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['NN second weight avg.'] + output['NN second weight std.'],
    mode='lines',
    line=dict(color='rgba(93, 105, 177, 0.2)'),
    legendgroup='MLP Regression',
    showlegend=False,
), row=3, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['NN second weight avg.'] - output['NN second weight std.'],
    mode='lines',
    fill='tonexty',
    fillcolor='rgba(93, 105, 177, 0.2)',
    line=dict(color='rgba(93, 105, 177, 0.2)'),
    legendgroup='MLP Regression',
    showlegend=False,
), row=3, col=1)

fig.add_trace(go.Scatter(
    x=output['sample size'],
    y=output['NN second weight avg.'],
    mode='lines',
    line=dict(color='rgb(93, 105, 177)', dash='dot', width=1),
    legendgroup='MLP Regression',
    showlegend=False,
), row=3, col=1)

fig.update_xaxes(
    title='Sample Size',
    type='category',
    mirror=True,
    linecolor='#d9d9d9',
    showgrid=False,
    zeroline=False,
    row=3, col=1
)

fig.update_yaxes(
    title='Estimate',
    mirror=True,
    linecolor='#d9d9d9',
    showgrid=False,
    zeroline=False,
    row=3, col=1
)

fig.update_layout(
    plot_bgcolor='white',
    paper_bgcolor='white',
    legend=dict(x=0, y=1.125, orientation='h'),
    font=dict(family='Arial', size=6),
    margin=dict(t=40, l=20, r=20, b=20)
)

fig.update_annotations(
    font=dict(family='Arial', size=8)
)

# fig.write_image('LR_MLP_comparison.png', engine='orca', scale=4, height=500, width=400)
fig.write_image('LR_MLP_comparison.png', engine='kaleido', scale=4, height=500, width=400)

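【Discussion】:

Nice catch; have you tried running several experiments without fixing the random state, to see how stable the MLP results are?
I have just updated my answer. There are exceptions, though: for some datasets the MLP and LR coefficients are almost identical even with small samples of only 25 - 50 data points.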