【Question title】: Matplotlib draw regressor line in 3d
【Posted】: 2022-01-15 18:08:17
【Question】:

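I am trying to draw a line in 3D, without success. I was able to get the scatter plot right.

When I actually try to draw a line through it, I get the following error: 'ValueError: input operand has more dimensions than allowed by the axis remapping'

Any idea how to fix this?

My code: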

from pathlib import Path

import pandas as pd
from matplotlib import pyplot as plt
from sklearn import linear_model
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Path of file
path = Path("data/houses.csv")

df = pd.read_csv(path)


# Assign X and Y axis
X = df[['GarageArea', 'YearBuilt']].apply(pd.to_numeric)
y = df[['SalePrice']].apply(pd.to_numeric)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

model = linear_model.LinearRegression(fit_intercept=1)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print(r2_score(y_test, y_pred))
print(X_test['GarageArea'])

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X['GarageArea'], X['YearBuilt'], y, c='blue', marker='o')
ax.plot(X_test['GarageArea'], X_test['YearBuilt'], y_pred, color='blue', linewidth=3)
# set your labels
ax.set_xlabel('Garage Area')
ax.set_ylabel('Year Built')
ax.set_zlabel('Price')

plt.show()

【Comments】:

    Tags: python pandas matplotlib


    【Solution 1】:

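    You should provide a minimal working example by uploading data/houses.csv, along with the full stack trace of the error.

    The error message you see should come from ax.plot, more specifically from the line

    zs = np.broadcast_to(zs, np.shape(xs))
    

    This is line 1570 of mpl_toolkits/mplot3d/axes3d.py (the exact line number depends on the Matplotlib version).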

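    X_test['GarageArea'] and X_test['YearBuilt'] are two pandas Series, that is, one-dimensional arrays of shape (N,). y_pred is the output of model.predict. sklearn.linear_model.LinearRegression is able to train on multiple targets simultaneously. In your example there is only one target, SalePrice, but because y was built as a one-column DataFrame (df[['SalePrice']]), the output is still a matrix of shape (n_samples, n_targets) = (N, 1), something like [[1],[2],[3],...].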
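The shape difference is easy to check on its own. The following is a minimal sketch on random data (the arrays and variable names are made up for illustration, they are not the housing data): fitting on a column-shaped target gives column-shaped predictions, while fitting on a flat target gives flat predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((20, 2))        # 20 samples, 2 features
y_2d = rng.random((20, 1))     # column target, like df[['SalePrice']]
y_1d = y_2d.ravel()            # flat target, like df['SalePrice']

pred_2d = LinearRegression().fit(X, y_2d).predict(X)
pred_1d = LinearRegression().fit(X, y_1d).predict(X)

print(pred_2d.shape)  # (20, 1)
print(pred_1d.shape)  # (20,)
```

So selecting the target as df['SalePrice'] instead of df[['SalePrice']] would be another way to end up with a one-dimensional y_pred.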

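    Axes3D.plot accepts (xs, ys, zs, ...) as input. For some reason, the authors of Axes3D.plot decided that zs should be either

    • an array of the same shape as xs and ys,
    • or a single value,

    in which case the same z is associated with every (xs, ys) pair. np.broadcast_to is used to handle both cases. In the first case broadcast_to has no effect; in the second it creates an array with the same shape as xs and ys, filled with the value of zs.

    In your code, numpy tries to broadcast (N,1) to (N,). (N,1) means a length-N array of length-1 arrays, e.g. [[1],[2],[3]]. broadcast_to works like repeat: it can add or stretch dimensions but never drop one, so there is no way to repeat an (N,1) array and get an (N,) array.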
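The failure can be reproduced without Matplotlib or scikit-learn. This sketch uses small stand-in arrays (xs and zs here are illustrative, not the real data) with the same shapes as X_test['GarageArea'] and y_pred:

```python
import numpy as np

xs = np.arange(5.0)                 # shape (5,), like X_test['GarageArea']
zs = np.arange(5.0).reshape(-1, 1)  # shape (5, 1), like y_pred

try:
    np.broadcast_to(zs, np.shape(xs))
    outcome = "ok"
except ValueError as exc:
    outcome = f"ValueError: {exc}"
print(outcome)

# Flattening zs first makes the shapes match, so broadcasting succeeds.
flat = np.broadcast_to(zs.ravel(), np.shape(xs))
print(flat.shape)  # (5,)
```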

    You can try a few more tests to understand the rules of np.broadcast_to:

    No.  Test                                                        Status
    01   np.broadcast_to(1,np.shape([1,2,3,4,5,6]))                  ok
    02   np.broadcast_to([1],np.shape([1,2,3,4,5,6]))                ok
    03   np.broadcast_to([[1]],np.shape([1,2,3,4,5,6]))              error
    04   np.broadcast_to([1,2,3],np.shape([1,2,3,4,5,6]))            error
    05   np.broadcast_to([1,2,3,4,5,6],np.shape([1,2,3,4,5,6]))      ok
    06   np.broadcast_to([[1,2,3,4,5,6]],np.shape([1,2,3,4,5,6]))    error
    07   np.broadcast_to(1,np.shape([[1,2,3],[4,5,6]]))              ok
    08   np.broadcast_to([1],np.shape([[1,2,3],[4,5,6]]))            ok
    09   np.broadcast_to([[1]],np.shape([[1,2,3],[4,5,6]]))          ok
    10   np.broadcast_to([1,2,3],np.shape([[1,2,3],[4,5,6]]))        ok
    11   np.broadcast_to([[1,2,3]],np.shape([[1,2,3],[4,5,6]]))      ok
    12   np.broadcast_to([[1],[2],[3]],np.shape([[1,2,3],[4,5,6]]))  error
    13   np.broadcast_to([1,2],np.shape([[1,2,3],[4,5,6]]))          error
    14   np.broadcast_to([1,2,3,4],np.shape([[1,2,3],[4,5,6]]))      error
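If you would rather run these cases than read them, a small script along the following lines should work. The helper name status and the cases list are just illustrative choices:

```python
import numpy as np

target_1d = np.shape([1, 2, 3, 4, 5, 6])      # (6,)
target_2d = np.shape([[1, 2, 3], [4, 5, 6]])  # (2, 3)

# The 14 cases above, in the same order: (value to broadcast, target shape).
cases = [
    (1, target_1d),
    ([1], target_1d),
    ([[1]], target_1d),
    ([1, 2, 3], target_1d),
    ([1, 2, 3, 4, 5, 6], target_1d),
    ([[1, 2, 3, 4, 5, 6]], target_1d),
    (1, target_2d),
    ([1], target_2d),
    ([[1]], target_2d),
    ([1, 2, 3], target_2d),
    ([[1, 2, 3]], target_2d),
    ([[1], [2], [3]], target_2d),
    ([1, 2], target_2d),
    ([1, 2, 3, 4], target_2d),
]


def status(value, shape):
    """Return 'ok' if value can be broadcast to shape, otherwise 'error'."""
    try:
        np.broadcast_to(value, shape)
        return "ok"
    except ValueError:
        return "error"


results = [status(value, shape) for value, shape in cases]
for number, ((value, shape), result) in enumerate(zip(cases, results), start=1):
    print(f"{number:02d} {np.shape(value)} -> {shape}: {result}")
```

Printing the input shape next to the target shape makes the pattern visible: a case fails when the input has more dimensions than the target, or when a trailing dimension is neither 1 nor equal to the target's.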

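    To fix your problem, change y_pred in ax.plot to y_pred.flatten(). This turns the (N,1) array, which fails for the same reason as case 06 (the input has more dimensions than the target), into an (N,) array, as in case 05.

    MWE attached: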

    import numpy as np
    import pandas as pd
    from matplotlib import pyplot as plt
    from sklearn import linear_model
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split
    
    rng = np.random.default_rng()
    N = 2000
    yearBuilt = rng.random(N)*(2020-1860)+1860
    garageArea = rng.random(N)*1000
    salePrice = rng.normal(yearBuilt*1000 + garageArea*500 + (yearBuilt-1860)*garageArea/(2020-1860)/1000*300000,50000,N)
    data = {
        'YearBuilt': yearBuilt,
        'GarageArea': garageArea,
        'SalePrice': salePrice
    }
    df = pd.DataFrame(data)
    
    # Assign X and Y axis
    X = df[['GarageArea', 'YearBuilt']].apply(pd.to_numeric)
    y = df[['SalePrice']].apply(pd.to_numeric)
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
    
    model = linear_model.LinearRegression(fit_intercept=True)
    model.fit(X_train, y_train)
    
    y_pred = model.predict(X_test)
    
    fig = plt.figure(figsize=(10, 8))
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(X['GarageArea'], X['YearBuilt'], y, c='blue', marker='o')
    ax.plot(X_test['GarageArea'], X_test['YearBuilt'], y_pred.flatten(), color='red', linewidth=3)
    # set your labels
    ax.set_xlabel('Garage Area')
    ax.set_ylabel('Year Built')
    ax.set_zlabel('Price')
    
    plt.show()
    

    See output image of the MWE

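    Suppose you are plotting objects in a Cartesian x-y-z coordinate system. model.predict can be thought of as a function that maps an array of (GarageArea, YearBuilt) pairs to another array. X_test is a set of randomly selected, randomly ordered points in the x-y plane, and model.predict computes their z coordinates. What you see is therefore a set of randomly ordered points connected to each other like a ball of yarn.

    To draw a line that reflects the linearity of the model, you need to give model.predict the projection of that line onto the x-y plane as input, rather than X_test. You could obtain that line from a linear regression of GarageArea as a function of YearBuilt. Another idea is to draw the linear prediction plane, or trend plane, which is more accurate than a trend line.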
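Both ideas can be sketched as follows. This is only one possible approach under stated assumptions: the data is synthetic (made-up values, not real house prices), the trend line simply follows the diagonal of the data range in the x-y plane rather than a regression of GarageArea on YearBuilt, and the Agg backend is selected only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show()
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the housing data (assumed values).
rng = np.random.default_rng(0)
N = 500
df = pd.DataFrame({
    "GarageArea": rng.random(N) * 1000,
    "YearBuilt": rng.random(N) * (2020 - 1860) + 1860,
})
df["SalePrice"] = rng.normal(df["YearBuilt"] * 1000 + df["GarageArea"] * 500, 50000)

X = df[["GarageArea", "YearBuilt"]]
y = df["SalePrice"]  # 1-D target, so predict() returns shape (n_samples,)
model = LinearRegression().fit(X, y)

# Trend line: an ordered straight path in the x-y plane (here the diagonal
# of the data range), passed to predict() instead of X_test.
path = pd.DataFrame({
    "GarageArea": np.linspace(X["GarageArea"].min(), X["GarageArea"].max(), 50),
    "YearBuilt": np.linspace(X["YearBuilt"].min(), X["YearBuilt"].max(), 50),
})
z_line = model.predict(path)

# Trend plane: predictions on a regular grid covering the data range.
ga, yb = np.meshgrid(
    np.linspace(X["GarageArea"].min(), X["GarageArea"].max(), 20),
    np.linspace(X["YearBuilt"].min(), X["YearBuilt"].max(), 20),
)
grid = pd.DataFrame({"GarageArea": ga.ravel(), "YearBuilt": yb.ravel()})
z_plane = model.predict(grid).reshape(ga.shape)

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection="3d")
ax.scatter(X["GarageArea"], X["YearBuilt"], y, c="blue", marker="o", alpha=0.3)
ax.plot(path["GarageArea"], path["YearBuilt"], z_line, color="red", linewidth=3)
ax.plot_surface(ga, yb, z_plane, color="orange", alpha=0.3)
ax.set_xlabel("Garage Area")
ax.set_ylabel("Year Built")
ax.set_zlabel("Price")
fig.canvas.draw()  # render; use fig.savefig(...) or plt.show() to view it
```

Because the path points are ordered along a straight segment, the predicted z values form a straight line in 3D instead of a tangle, and the surface shows the whole plane the model has fitted.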

    【Comments】:
