Linear regression plot not giving me meaningful visualization

【Question title】: Linear regression plot not giving me meaningful visualization
【Posted】: 2020-11-12 15:59:34
【Question】:

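I'm working with some time-series power consumption data and trying to run a linear regression analysis on it.

The data has the following columns:

Date, Denmark_consumption, Germany_consumption, Czech_consumption, Austria_consumption.

It is time-series data with an hourly frequency.

However, every column contains some NaN values. My goal is to create a linear regression model that trains and tests on a subset of the data with no null values, and then to predict the values that are currently NaN, for example in the Denmark consumption column.

As my training/test features I plan to use one country's consumption column plus the date as an ordinal value, and try to predict a second country's consumption values.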
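The plan described above could be sketched roughly as follows. This is only an illustration on synthetic data: the generated values, the `Date_ordinal` column name, and the choice of which rows to blank out are assumptions, not part of the original dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic hourly data standing in for the real file (assumption).
rng = np.random.default_rng(0)
n = 200
dates = pd.Timestamp("2018-01-01") + pd.to_timedelta(np.arange(n), unit="h")
germany = 40000 + 5000 * rng.standard_normal(n)
denmark = 0.03 * germany + 300 + 20 * rng.standard_normal(n)
df = pd.DataFrame(
    {"Germany_consumption": germany, "Denmark_consumption": denmark},
    index=dates,
)
# Blank out every tenth Denmark value to imitate the missing data.
df.iloc[::10, df.columns.get_loc("Denmark_consumption")] = np.nan

# Features: the other country's consumption plus the date as an ordinal integer.
features = pd.DataFrame(
    {
        "Germany_consumption": df["Germany_consumption"],
        "Date_ordinal": [d.toordinal() for d in df.index],
    },
    index=df.index,
)

# Fit only on rows where the target is known.
known = df["Denmark_consumption"].notna()
model = LinearRegression().fit(features[known], df.loc[known, "Denmark_consumption"])

# Predict the target for the rows where it is missing.
filled = df["Denmark_consumption"].copy()
filled.loc[~known] = model.predict(features[~known])
```

Note that with hourly rows the ordinal date is identical for all 24 hours of a day, so in this sketch it only carries day-level information.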

Here is a sample of the data.

Date                   Denmark    Germany    Czech   Austria
2018-01-01 00:00:00     1607.0    42303.0     5520    6234.0
2018-01-01 01:00:00     1566.0    41108.0     5495    6060.0
2018-01-01 02:00:00     1460.0    40554.0     5461    5872.0
2018-01-01 03:00:00     1424.0    38533.0     5302    5564.0
2018-01-01 04:00:00     1380.0    38494.0     5258    5331.0

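I did a few things.

I dropped every row containing null values to create my training and test datasets.
I set the date column as the dataframe index.
I resampled the data from hourly to weekly, using the default 'mean' aggregation function.
I added the date back as a column in the training and test data and converted it to an ordinal value.
Because the various consumption values are highly correlated, I used only the Germany consumption column for the X_train and X_test datasets.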
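The steps listed above might look roughly like this in pandas. It is a sketch on synthetic data, and it assumes the split was done with `train_test_split`, which the question does not actually state. The `Date_ordinal` column name is also made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Twelve weeks of synthetic hourly rows (assumption: stands in for the real data).
rng = np.random.default_rng(1)
n = 24 * 7 * 12
dates = pd.Timestamp("2018-01-01") + pd.to_timedelta(np.arange(n), unit="h")
raw = pd.DataFrame(
    {
        "Date": dates,
        "Denmark_consumption": 1500 + 100 * rng.standard_normal(n),
        "Germany_consumption": 42000 + 3000 * rng.standard_normal(n),
    }
)
raw.loc[raw.index[::50], "Denmark_consumption"] = np.nan

clean = raw.dropna()                  # 1. drop rows containing nulls
clean = clean.set_index("Date")       # 2. use the date as the index
weekly = clean.resample("W").mean()   # 3. hourly -> weekly means
weekly["Date_ordinal"] = [d.toordinal() for d in weekly.index]  # 4. ordinal date column

# 5. keep only Germany's consumption (plus the date) as features
X = weekly[["Germany_consumption", "Date_ordinal"]]
y = weekly["Denmark_consumption"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
```

`train_test_split` shuffles rows by default, so `X_train` and `X_test` come back in non-chronological order unless `shuffle=False` is passed. The out-of-order dates in the x_train sample in the question would be consistent with a shuffled split, though that is a guess.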

I created a linear regression model with sklearn and fit it using Germany's consumption and the ordinal date as my "X" and Denmark's consumption as my "Y".

I tried to plot the output with a scatter plot and a line, but the graph I get looks like this:

Why does my plot look like someone scribbled all over it? I was expecting a single line of some sort.

Here is a sample of my x_train dataset.

                        Germany    Date
                      consumption
Date                                   
2018-07-08         44394.125000  736883
2019-01-16         66148.125000  737075
2019-08-03         45718.083333  737274
2019-06-09         41955.250000  737219
2020-03-04         61843.958333  737488

Here is a sample of my y_train dataset.

Date
2018-01-01    1511.083333
2018-01-02    1698.625000
2018-01-03    1781.291667
2018-01-04    1793.458333
2018-01-05    1796.875000
Name: Denmark_consumption, dtype: float64

Here is the actual relevant code.

lin_model = LinearRegression()
lin_model.fit(X_train,y_train)
y_pred = lin_model.predict(X_test)
plt.scatter(X_test['Date'].map(dt.datetime.fromordinal),y_pred,color='black')
plt.plot(X_test['Date'],y_pred)

The coefficients, mean squared error, and R-squared are:

Coefficients: 
 [0.01941453 0.01574128]
Mean squared error: 14735.12
Coefficient of determination: 0.51
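For reference, numbers in this format are typically produced with `sklearn.metrics`. The snippet below is a sketch on synthetic data, not the asker's actual code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic two-feature regression problem (assumption, for illustration only).
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

lin_model = LinearRegression().fit(X_train, y_train)
y_pred = lin_model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)  # mean of squared residuals
r2 = r2_score(y_test, y_pred)             # coefficient of determination

print("Coefficients:\n", lin_model.coef_)
print("Mean squared error: %.2f" % mse)
print("Coefficient of determination: %.2f" % r2)
```

The mean squared error is in squared units of the target, so the value of 14735.12 above corresponds to a root mean squared error of about 121 in the units of the Denmark column.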

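Can someone tell me what I'm doing wrong? Also, is my approach sound? Does it make sense to try to predict consumption values from a combination of a second country's consumption + the date?

Any help is appreciated.

【Comments】:

Is X_test sorted?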
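The point behind this comment is that `plt.plot` joins points in the order they are given, so unsorted dates produce a zigzag. A minimal sketch, using synthetic stand-ins for `X_test` and `y_pred`, of sorting by date before drawing:

```python
import datetime as dt

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for X_test and y_pred with shuffled dates (assumption).
rng = np.random.default_rng(3)
start = dt.date(2018, 1, 1).toordinal()
ordinals = rng.permutation(np.arange(start, start + 30))
X_test = pd.DataFrame({"Date": ordinals})
y_pred = 1500 + 0.5 * (ordinals - start) + 20 * rng.standard_normal(30)

# Sort both x and y by date so the line is drawn left to right.
order = np.argsort(X_test["Date"].to_numpy())
x_sorted = [dt.datetime.fromordinal(int(o)) for o in X_test["Date"].to_numpy()[order]]
y_sorted = y_pred[order]

fig, ax = plt.subplots()
ax.scatter(x_sorted, y_sorted, color="black")
ax.plot(x_sorted, y_sorted)  # same x values and units as the scatter call
```

Call `plt.show()` to display the figure. The two plotting calls in the question also pass x in different units (datetime objects for the scatter, raw ordinal integers for the line), which may add to the confusion. Even after sorting, the line need not be straight, because the prediction also depends on the consumption feature.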

Tags: python pandas linear-regression sklearn-pandas

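【Solution 1】:

Your approach is complicated, but doable. Personally, I think it might be easier to create a linear mapping between Germany's dates and Germany's consumption, and then try to predict Denmark's consumption from its dates in the same way.

However, sticking with your approach, keep in mind that there are two independent variables (Germany's date converted to an integer, and Germany's consumption), and Denmark's consumption depends on both of them. So by plotting the test dates against the predictions in a 2D plot, as you are doing now, you are leaving out the consumption variable. What you should plot is Germany's date and Germany's consumption against Denmark's consumption, in a 3D plot.

You also shouldn't expect to get a line: with multiple linear regression on two independent variables, you are predicting a plane.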
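To make the "plane" concrete: with two features, every prediction equals `intercept + b1 * consumption + b2 * date`. The sketch below checks this on synthetic data. The variable names and generated values are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic features: consumption and ordinal date (assumption).
rng = np.random.default_rng(4)
consumption = rng.uniform(40000, 60000, size=50)
date_ordinal = rng.integers(736695, 737060, size=50).astype(float)
X = np.column_stack([consumption, date_ordinal])
y = 0.02 * consumption + 0.015 * date_ordinal + rng.normal(scale=50, size=50)

model = LinearRegression().fit(X, y)
b0 = model.intercept_
b1, b2 = model.coef_  # same order as the columns of X

# Evaluate the plane by hand and compare with model.predict.
plane = b0 + b1 * X[:, 0] + b2 * X[:, 1]
matches = np.allclose(plane, model.predict(X))

# Holding the date fixed reduces the plane to a straight line in consumption.
fixed_date = 736900.0
grid = np.linspace(40000, 60000, 5)
line = model.predict(np.column_stack([grid, np.full(5, fixed_date)]))
slopes = np.diff(line) / np.diff(grid)  # each entry is approximately b1
```

A 2D plot of prediction against date shows a straight line only if the other feature is held constant. Otherwise each point sits at a different height on the plane.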

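Here is a short example I put together that is similar to what you are probably trying to achieve. Feel free to change the date format as needed.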

import pandas as pd
import numpy as np
import datetime as dt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the '3d' projection on older Matplotlib)
import matplotlib.pyplot as plt
from matplotlib import cm
from sklearn.linear_model import LinearRegression

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

# starts 2018/11/02
df_germany = pd.DataFrame({
    'Germany consumption': [45000, 47000, 48000, 42000, 50000],
    'Date': [737000, 737001, 737002, 737003, 737004]})
df_germany_test = pd.DataFrame({
    'Germany consumption': [42050, 42000, 57000, 30000, 52000, 53000],
    'Date': [737000, 737001, 737002, 737003, 737004, 737005]})
df_denmark = pd.DataFrame({
    'Denmark consumption':  [1500, 1600, 1700, 1800, 2000]
    })

X_train = df_germany  # keep the DataFrame so fit and predict see the same feature names
y_train = df_denmark['Denmark consumption']

# make X_test the same as X_train to make sure all points are on the plane
# X_test = df_germany

# make X_test slightly different
X_test = df_germany_test

lin_model = LinearRegression()
lin_model.fit(X_train,y_train)
y_pred = lin_model.predict(X_test)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')  # fig.gca(projection='3d') fails on recent Matplotlib

# Build a grid over the observed date and consumption ranges
x_surf = np.linspace(X_test['Date'].min(), X_test['Date'].max(), num=20)
y_surf = np.linspace(X_test['Germany consumption'].min(), X_test['Germany consumption'].max(), num=20)
x_surf, y_surf = np.meshgrid(x_surf, y_surf)
b0 = lin_model.intercept_
b1, b2 = lin_model.coef_  # same order as the columns: consumption, then date
z_surf = b0 + b2 * x_surf + b1 * y_surf  # fitted plane evaluated on the grid
ax.plot_surface(x_surf, y_surf, z_surf, cmap=cm.cool, alpha=0.2)  # semi-transparent regression plane

ax.scatter(X_test['Date'].values, X_test['Germany consumption'].values, y_pred, alpha=1.0)
plt.show()

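【Comments】:

Thank you so much. This is really helpful.
I'm glad to hear that! If it helped, please consider accepting the answer, so that others with a similar problem know where to start :)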