【Question title】:ValueError: x and y must be the same size (Linear regression)
【Posted】:2022-01-17 16:42:04
【Question】:

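I'm trying to visualize my linear regression model, but when I run the code below it raises a ValueError. I've tried several fixes and read other threads about the same error, without success.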

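import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv('housingmonthly.csv', sep=',')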

X = df[['date', 'area', 'code','houses_sold', 'no_of_crimes']]
y = df['average_price']

X = pd.get_dummies(df[['date', 'area', 'code', 'houses_sold', 'no_of_crimes']])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

print("Xtrain", X_train.shape, "y_train", 
      y_train.shape, "Xtest", X_test.shape, "y_test", y_test.shape)  


regr = linear_model.LinearRegression()

lr = LinearRegression()
lr.fit(X_train,y_train)
print("Score on training set: {:.3f}".format(lr.score(X_train, y_train)))
print("Score on test set: {:.3f}".format(lr.score(X_test, y_test)))

regr.fit(X_train, y_train)

y_pred = regr.predict(X_test)

plt.scatter(X_test, y_test, color="black")
plt.plot(X_test, y_pred, color="blue", linewidth=3)
plt.xticks(())
plt.yticks(())

plt.show()

Stack trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/tl/80zdv_rx5sv1t7d5dgz86bzc0000gn/T/ipykernel_29101/3394670003.py in <module>
     15 print("Coefficient of determination: %.2f" % r2_score(y_test, y_pred))
     16 
---> 17 plt.scatter(X_test, y_test, color="black")
     18 plt.plot(X_test, y_pred, color="blue", linewidth=3)
     19 plt.xticks(())

/opt/anaconda3/lib/python3.8/site-packages/matplotlib/pyplot.py in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, plotnonfinite, data, **kwargs)
   2888         verts=cbook.deprecation._deprecated_parameter,
   2889         edgecolors=None, *, plotnonfinite=False, data=None, **kwargs):
-> 2890     __ret = gca().scatter(
   2891         x, y, s=s, c=c, marker=marker, cmap=cmap, norm=norm,
   2892         vmin=vmin, vmax=vmax, alpha=alpha, linewidths=linewidths,

/opt/anaconda3/lib/python3.8/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1445     def inner(ax, *args, data=None, **kwargs):
   1446         if data is None:
-> 1447             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1448 
   1449         bound = new_sig.bind(ax, *args, **kwargs)

/opt/anaconda3/lib/python3.8/site-packages/matplotlib/cbook/deprecation.py in wrapper(*inner_args, **inner_kwargs)
    409                          else deprecation_addendum,
    410                 **kwargs)
--> 411         return func(*inner_args, **inner_kwargs)
    412 
    413     return wrapper

/opt/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, plotnonfinite, **kwargs)
   4439         y = np.ma.ravel(y)
   4440         if x.size != y.size:
-> 4441             raise ValueError("x and y must be the same size")
   4442 
   4443         if s is None:

That is the error. At this point I really don't know how to fix it.

Thanks a lot.

【Comments】:

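  • Can you show the full traceback of the error? Also, it seems clear that there is a dimension problem between X and y. Print them and their shapes. Finally, are you using x or X? That might be the bug, since the error you copied mentions x.
  • I just added the traceback. I'm using X, not x. As you said, I'm fairly sure it is the dimensions of X and y, but I don't know what code I need to reshape them. Thanks a lot.
  • Well, this has nothing to do with scikit-learn. It is a matplotlib issue. plt.scatter is for plotting two-dimensional data, so it expects one set of values for the x axis and one for the y axis.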
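
The point in the last comment can be checked without the dataset. Below is a minimal sketch, assuming a made-up feature matrix with 365 rows and 12 columns (the shapes are illustrative, not the asker's actual data). The helper `same_size_for_scatter` is hypothetical; it only mimics the flatten-and-compare check that appears in the traceback.

```python
import numpy as np

# Hypothetical stand-ins for X_test and y_test; shapes are for illustration only.
X_test = np.zeros((365, 12))   # e.g. 12 feature columns after one-hot encoding
y_test = np.zeros(365)         # one target value per row

def same_size_for_scatter(x, y):
    """Mimic the check in the traceback: flatten both inputs, compare sizes."""
    return np.ma.ravel(x).size == np.ma.ravel(y).size

print(same_size_for_scatter(X_test, y_test))        # False: 4380 values vs. 365
print(same_size_for_scatter(X_test[:, 0], y_test))  # True: 365 values vs. 365
```

So the error does not mean the rows are misaligned; it means x carries several columns where scatter wants a single sequence.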

Tags: python matplotlib


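【Answer 1】:

I reproduced your code using the housing competition data (just to have a working example). Here is my code (I commented out the lines of yours that don't match my data):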

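import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv('data/train.csv')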

#X = df[['date', 'area', 'code','houses_sold', 'no_of_crimes']]
#y = df['average_price']
X = df[['GarageType', 'Alley', 'LotShape']]
y = df['SalePrice']

X = pd.get_dummies(df[['GarageType', 'Alley', 'LotShape']])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

print("Xtrain", X_train.shape, "y_train", 
      y_train.shape, "Xtest", X_test.shape, "y_test", y_test.shape)  


#regr = linear_model.LinearRegression()
regr = LinearRegression()


lr = LinearRegression()
lr.fit(X_train,y_train)
print("Score on training set: {:.3f}".format(lr.score(X_train, y_train)))
print("Score on test set: {:.3f}".format(lr.score(X_test, y_test)))

regr.fit(X_train, y_train)

y_pred = regr.predict(X_test)

plt.scatter(X_test, y_test, color="black")
plt.plot(X_test, y_pred, color="blue", linewidth=3)
plt.xticks(())
plt.yticks(())

plt.show()

If I check the shapes, I get

In [6]: X_test.shape
Out[6]: (365, 12)

In [7]: y_test.shape
Out[7]: (365,)

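These are clearly not the same: X_test holds 365 × 12 = 4380 values, while y_test holds 365. plt.scatter needs x and y to have the same number of elements, so both X_test and y_test need to be one-dimensional. I guess you want to select a single column, like this: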

plt.scatter(X_test[X_test.columns[0]], y_test, color="black")
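
Because the model uses several features, no single 2-D scatter shows the whole fit. Two common options are sketched below under stated assumptions: synthetic data stands in for the CSV, the column names `f0` to `f2` and the output file name are hypothetical, and the `Agg` backend is selected only so the script runs without a display. The first panel plots one feature column against the target, as in the line above; the second plots predicted against actual values, which works for any number of features.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (hypothetical columns f0..f2).
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f0", "f1", "f2"])
y = pd.Series(3.0 * X["f0"] - 2.0 * X["f1"] + rng.normal(scale=0.1, size=200))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
regr = LinearRegression().fit(X_train, y_train)
y_pred = regr.predict(X_test)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: a single feature column on the x axis, so x and y have the same size.
col = X_test.columns[0]
ax1.scatter(X_test[col], y_test, color="black", label="actual")
ax1.scatter(X_test[col], y_pred, color="blue", marker="x", label="predicted")
ax1.set_xlabel(col)
ax1.set_ylabel("target")
ax1.legend()

# Option 2: predicted vs. actual; points close to the diagonal are good predictions.
ax2.scatter(y_test, y_pred, color="black")
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax2.plot(lims, lims, color="blue", linewidth=3)
ax2.set_xlabel("actual")
ax2.set_ylabel("predicted")

fig.savefig("regression_check.png")  # hypothetical output file name
```

Predictions are drawn as points rather than with `plt.plot` in the first panel because, with more than one feature, the predicted value also depends on the columns that are not on the x axis, so connecting the points would give a zigzag instead of a straight line.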

