Matplotlib: draw regressor line in 3D (answers)

[Question title]: Matplotlib draw regressor line in 3d
[Posted]: 2022-01-15 18:08:17
[Question]:

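I'm trying to draw a line in 3D, without success. I was able to get the scatter plot right, as you can see in this picture.

When I actually draw a line through it, I get the following error: 'ValueError: input operand has more dimensions than allowed by the axis remapping'

Any idea how to fix this?

My code: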

from pathlib import Path

import pandas as pd
from matplotlib import pyplot as plt
from sklearn import linear_model
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Path of file
path = Path("data/houses.csv")

df = pd.read_csv(path)


# Assign X and Y axis
X = df[['GarageArea', 'YearBuilt']].apply(pd.to_numeric)
y = df[['SalePrice']].apply(pd.to_numeric)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

model = linear_model.LinearRegression(fit_intercept=1)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print(r2_score(y_test, y_pred))
print(X_test['GarageArea'])

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X['GarageArea'], X['YearBuilt'], y, c='blue', marker='o')
ax.plot(X_test['GarageArea'], X_test['YearBuilt'], y_pred, color='blue', linewidth=3)
# set your labels
ax.set_xlabel('Garage Area')
ax.set_ylabel('Year Built')
ax.set_zlabel('Price')

plt.show()

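[Comments]:

Tags: python pandas matplotlib

[Answer 1]:

You should provide a minimal working example by uploading data/houses.csv, and include the full stack trace of the error.

The error message you see should come from ax.plot, more specifically from the line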

zs = np.broadcast_to(zs, np.shape(xs))

which is line 1570 of mpl_toolkits/mplot3d/axes3d.py.

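X_test['GarageArea'] and X_test['YearBuilt'] are two pandas Series, that is, one-dimensional arrays of shape (N,). y_pred is the output of model.predict. sklearn.linear_model.LinearRegression is able to train on multiple targets simultaneously. In your example there is only one target, SalePrice, but because y is passed as a one-column DataFrame (df[['SalePrice']]), the output is still a matrix of shape (n_samples, n_targets), here (N,1), which looks like [[1],[2],[3],...].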
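The shape difference is easy to check on made-up data. The sketch below is only an illustration: the random data and the column names a, b and t are invented here, and it assumes pandas and scikit-learn are installed. It fits the same model twice, once with the target as a one-column DataFrame (as in the question) and once with the target as a Series, and compares the shapes returned by predict.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data, for illustration only (not the asker's houses.csv).
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.random(50), "b": rng.random(50)})
df["t"] = 2.0 * df["a"] + 3.0 * df["b"] + rng.normal(0.0, 0.1, 50)

X = df[["a", "b"]]

# Target as a one-column DataFrame, shape (N, 1), as in the question.
pred_2d = LinearRegression().fit(X, df[["t"]]).predict(X)

# Target as a Series, shape (N,).
pred_1d = LinearRegression().fit(X, df["t"]).predict(X)

print(pred_2d.shape)  # (50, 1)
print(pred_1d.shape)  # (50,)
```

Both calls produce the same numbers; only the array layout differs.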

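Axes3D.plot takes (xs, ys, zs, ...). For some reason, the authors of Axes3D.plot decided that zs should be either

an array of the same shape as xs and ys,
or a single value, in which case the same z is associated with every (xs, ys) pair.

np.broadcast_to is used to handle both cases. In the first case broadcast_to has no effect; in the second it creates an array with the same shape as xs and ys, filled with the value zs.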

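In your code, numpy tries to broadcast (N,1) to (N,). A shape of (N,1) means a length-N array of length-1 arrays, e.g. [[1],[2],[3]]. broadcast_to works like repeat: it can add or stretch axes but never remove one, so there is no way to repeat an (N,1) array into an (N,) array.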
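This can be reproduced without matplotlib or scikit-learn. The following is a minimal sketch with invented stand-in arrays (xs for a pandas Series, zs_column for the model output); it only calls np.broadcast_to the same way the quoted line from axes3d.py does.

```python
import numpy as np

N = 5
xs = np.arange(N, dtype=float)                        # shape (N,), like a Series
zs_column = np.arange(N, dtype=float).reshape(N, 1)   # shape (N, 1), like y_pred

# Broadcasting the (N, 1) column to shape (N,) raises ValueError.
try:
    np.broadcast_to(zs_column, np.shape(xs))
    failed = False
except ValueError as exc:
    failed = True
    print("broadcast failed:", exc)

# Dropping the trailing length-1 axis gives a shape that matches xs.
zs_flat = zs_column.ravel()
result = np.broadcast_to(zs_flat, np.shape(xs))
print(result.shape)  # (5,)
```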

You can try a few more tests to understand the rules of np.broadcast_to:

no	test	status
01	`np.broadcast_to(1,np.shape([1,2,3,4,5,6]))`	ok
02	`np.broadcast_to([1],np.shape([1,2,3,4,5,6]))`	ok
03	`np.broadcast_to([[1]],np.shape([1,2,3,4,5,6]))`	error
04	`np.broadcast_to([1,2,3],np.shape([1,2,3,4,5,6]))`	error
05	`np.broadcast_to([1,2,3,4,5,6],np.shape([1,2,3,4,5,6]))`	ok
06	`np.broadcast_to([[1,2,3,4,5,6]],np.shape([1,2,3,4,5,6]))`	error
07	`np.broadcast_to(1,np.shape([[1,2,3],[4,5,6]]))`	ok
08	`np.broadcast_to([1],np.shape([[1,2,3],[4,5,6]]))`	ok
09	`np.broadcast_to([[1]],np.shape([[1,2,3],[4,5,6]]))`	ok
10	`np.broadcast_to([1,2,3],np.shape([[1,2,3],[4,5,6]]))`	ok
11	`np.broadcast_to([[1,2,3]],np.shape([[1,2,3],[4,5,6]]))`	ok
12	`np.broadcast_to([[1],[2],[3]],np.shape([[1,2,3],[4,5,6]]))`	error
13	`np.broadcast_to([1,2],np.shape([[1,2,3],[4,5,6]]))`	error
14	`np.broadcast_to([1,2,3,4],np.shape([[1,2,3],[4,5,6]]))`	error
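If you would rather run the table than read it, something like the following should work. The helper name broadcast_status is made up for this sketch; it just wraps np.broadcast_to in a try/except and reports the same ok/error status as the table.

```python
import numpy as np


def broadcast_status(value, target):
    """Return 'ok' if value broadcasts to the shape of target, else 'error'."""
    try:
        np.broadcast_to(value, np.shape(target))
        return "ok"
    except ValueError:
        return "error"


flat = [1, 2, 3, 4, 5, 6]        # shape (6,)
grid = [[1, 2, 3], [4, 5, 6]]    # shape (2, 3)

# Same inputs as rows 01 to 14 of the table, in order.
cases = [
    (1, flat), ([1], flat), ([[1]], flat), ([1, 2, 3], flat),
    (flat, flat), ([flat], flat),
    (1, grid), ([1], grid), ([[1]], grid), ([1, 2, 3], grid),
    ([[1, 2, 3]], grid), ([[1], [2], [3]], grid),
    ([1, 2], grid), ([1, 2, 3, 4], grid),
]

statuses = [broadcast_status(value, target) for value, target in cases]
for number, status in enumerate(statuses, start=1):
    print(f"{number:02d}", status)
```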

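To fix your problem, change y_pred in ax.plot to y_pred.flatten(). The original call fails for the same reason as case 06 (a 2-D input cannot be broadcast to a 1-D shape), while the flattened array corresponds to case 05.

MWE attached: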

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt  # needed for plt.figure and plt.show below
from sklearn import linear_model
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng()
N = 2000
yearBuilt = rng.random(N)*(2020-1860)+1860
garageArea = rng.random(N)*1000
salePrice = rng.normal(yearBuilt*1000 + garageArea*500 + (yearBuilt-1860)*garageArea/(2020-1860)/1000*300000,50000,N)
data = {
    'YearBuilt': yearBuilt,
    'GarageArea': garageArea,
    'SalePrice': salePrice
}
df = pd.DataFrame(data)

# Assign X and Y axis
X = df[['GarageArea', 'YearBuilt']].apply(pd.to_numeric)
y = df[['SalePrice']].apply(pd.to_numeric)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

model = linear_model.LinearRegression(fit_intercept=True)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X['GarageArea'], X['YearBuilt'], y, c='blue', marker='o')
ax.plot(X_test['GarageArea'], X_test['YearBuilt'], y_pred.flatten(), color='red', linewidth=3)
# set your labels
ax.set_xlabel('Garage Area')
ax.set_ylabel('Year Built')
ax.set_zlabel('Price')

plt.show()

See output image of the MWE

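Suppose you are plotting objects in a Cartesian x-y-z coordinate system. model.predict can be thought of as a function that maps an array of (GarageArea, YearBuilt) pairs to another array. X_test is a set of randomly selected, randomly ordered points in the x-y plane, and model.predict computes their z coordinates. What you see is therefore a set of randomly ordered points connected to one another like a ball of yarn.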

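To draw a line that reflects the linearity of the model, you need to give model.predict the projection of that line onto the x-y plane as input, instead of X_test. You can get that line from a linear regression of GarageArea as a function of YearBuilt. Another idea is to draw the linear prediction plane, or trend plane, which is more accurate than a trend line.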
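Here is one possible way to do both, as a rough sketch rather than a definitive recipe. The data is synthetic, and the helper name draw, the grid sizes and the colors are arbitrary choices made for this example. The line in the x-y plane comes from np.polyfit of GarageArea against YearBuilt, and the plane comes from predicting on a regular grid. The model is fitted on a 1-D target so that predict returns 1-D arrays.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the housing data (illustration only).
rng = np.random.default_rng(0)
N = 200
year = rng.uniform(1860, 2020, N)
garage = 5.0 * (year - 1860) + rng.normal(0.0, 100.0, N)
price = rng.normal(1000.0 * year + 500.0 * garage, 50000.0)
X = pd.DataFrame({"GarageArea": garage, "YearBuilt": year})

model = LinearRegression().fit(X, price)  # 1-D target, so predict() is 1-D

# Trend line: regress GarageArea on YearBuilt to get an ordered,
# straight path in the x-y plane, then predict z along that path.
slope, intercept = np.polyfit(year, garage, 1)
line_year = np.linspace(year.min(), year.max(), 50)
line_garage = slope * line_year + intercept
line_X = pd.DataFrame({"GarageArea": line_garage, "YearBuilt": line_year})
line_price = model.predict(line_X)

# Trend plane: predict on a regular grid covering the data range.
grid_garage, grid_year = np.meshgrid(
    np.linspace(garage.min(), garage.max(), 20),
    np.linspace(year.min(), year.max(), 20),
)
grid_X = pd.DataFrame(
    {"GarageArea": grid_garage.ravel(), "YearBuilt": grid_year.ravel()}
)
grid_price = model.predict(grid_X).reshape(grid_garage.shape)


def draw(ax):
    """Draw the scatter, trend line and trend plane on an existing 3D axes."""
    ax.scatter(garage, year, price, c="blue", marker="o", alpha=0.3)
    ax.plot(line_garage, line_year, line_price, color="red", linewidth=3)
    ax.plot_surface(grid_garage, grid_year, grid_price, color="orange", alpha=0.3)
```

With ax created by fig.add_subplot(111, projection='3d') as in the MWE, calling draw(ax) and then plt.show() displays the result. Because the input points lie on a straight, ordered path, the red curve is a straight line instead of a tangle.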

[Comments]: