ValueError: x and y must be the same size

[Question title]: ValueError: x and y must be the same size
[Posted]: 2020-02-23 16:11:35
[Question description]:

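I have a dataset on which I am trying to compute a linear regression using sklearn. The dataset is ready-made, so it should not be the source of the problem. I used train_test_split to split my data into a training set and a test set. When I try to use matplotlib to draw a scatter plot of my test values against the predictions, I get the following error:

ValueError: x and y must be the same size

Here is my code: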

y=data['Yearly Amount Spent']
x=data[['Avg. Session Length','Time on App','Time on Website','Length of Membership','Yearly Amount Spent']]
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=101)

#training the model

from sklearn.linear_model import LinearRegression
lm=LinearRegression()
lm.fit(x_train,y_train)
lm.coef_

predictions=lm.predict(X_test)

#here the problem starts:

plt.scatter(y_test,predictions)

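Why does this error occur? I have looked at earlier posts here, and the suggestion was to use x.shape and y.shape, but I am not sure what the purpose of that is.

Thanks

[Comments]: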

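The variable you pass to lm.predict is capitalized, whereas the one you assigned at the top is not. Are you sure you are passing the right data to lm.predict?
Check the sizes of y_test and predictions by calling len(y_test) and len(predictions). They probably do not match. Also make sure you are predicting on the right variable, since x_test and X_test are different things.
This happens because y.shape[0] and x.shape[0] are not equal.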
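
To make the point of the shape check concrete, here is a minimal sketch. The helper name check_same_length and the two demo arrays are hypothetical stand-ins, not taken from the question; the second array imitates predictions computed from some other, larger X_test (for example a variable left over from an earlier notebook cell, which is only a guess about the asker's session).

```python
import numpy as np

def check_same_length(a, b):
    """Return (rows_in_a, rows_in_b, match) compared along the first axis."""
    a = np.asarray(a)
    b = np.asarray(b)
    return a.shape[0], b.shape[0], a.shape[0] == b.shape[0]

# Hypothetical stand-ins for y_test and predictions
y_test_demo = np.arange(150)
good_predictions = np.zeros(150)
stale_predictions = np.zeros(500)  # imitates predictions made from a different X_test

print(check_same_length(y_test_demo, good_predictions))
print(check_same_length(y_test_demo, stale_predictions))
```

If the two row counts differ, plt.scatter has no way to pair each x value with a y value, which is what the error message is reporting.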

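Tags: python matplotlib scikit-learn linear-regression

[Solution 1]:

You appear to be using the EcommerceCustomers.csv dataset.

In your original post, the 'Yearly Amount Spent' column is included in both y and x, which is wrong: the target must not also be one of the features.

The following should work: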

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = pd.read_csv("EcommerceCustomers.csv")

y = data['Yearly Amount Spent']
X = data[['Avg. Session Length', 'Time on App','Time on Website', 'Length of Membership']]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)


# ## Training the Model
lm = LinearRegression()
lm.fit(X_train,y_train)

# The coefficients
print('Coefficients: \n', lm.coef_)

# ## Predicting Test Data
predictions = lm.predict(X_test)
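
The code above stops before the scatter plot that triggered the error in the question. The sketch below carries the same steps through to the plot. Because EcommerceCustomers.csv is not available here, it uses synthetic data: the column names are copied from the post, but the values and the linear relationship are made up purely for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (values are invented, not real data)
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    'Avg. Session Length': rng.normal(33, 1, n),
    'Time on App': rng.normal(12, 1, n),
    'Time on Website': rng.normal(37, 1, n),
    'Length of Membership': rng.normal(3.5, 1, n),
})
# Made-up linear relationship plus noise; the target is NOT a column of X
y = (25 * X['Avg. Session Length']
     + 38 * X['Time on App']
     + 61 * X['Length of Membership']
     + rng.normal(0, 10, n)).rename('Yearly Amount Spent')

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

lm = LinearRegression()
lm.fit(X_train, y_train)

# Predict from the same X_test that came out of this split
predictions = lm.predict(X_test)

# Both must have the same number of rows before plotting
print(y_test.shape, predictions.shape)

fig, ax = plt.subplots()
ax.scatter(y_test, predictions)
ax.set_xlabel('y_test')
ax.set_ylabel('predictions')
```

Call plt.show() afterwards if you are running this as a script rather than in a notebook. The important detail is that predictions is built from the X_test returned by the same train_test_split call as y_test, so the two always have the same length.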


[Comments]: