[Posted]: 2019-10-17 05:07:33
[Question]:
I am using sklearn with numpy arrays. I have 2 arrays (x, y) that should be split as follows:
test_size=0.2
train_size=0.8
This is my current code:
def predict():
    sample_data = pd.read_csv("includes\\csv.csv")
    x = np.array(sample_data["day"])
    y = np.array(sample_data["balance"])
    x = x.reshape(1, -1)
    y = y.reshape(1, -1)
    print(x)
    print(y)
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    clf = LinearRegression()
    clf.fit(x_train, y_train)
    clf.score(x_test, y_test)
The error is:
ValueError: With n_samples=1, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
It is raised on the following line:
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
Any idea why this happens?
[Comments]:
-
Have you tried:
train_test_split(x, y, test_size=0.2, train_size=0.8)?
-
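It splits along the first axis, which has size 1 in your sample. See the sklearn documentation for its input shape convention: number of samples vs. number of features.
-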
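To illustrate the comment above, here is a minimal sketch of how the two reshape calls differ. The `day` and `balance` arrays are made-up stand-ins for the CSV columns (10 rows assumed), and `random_state=0` is only there to make the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the "day" and "balance" columns (10 rows).
day = np.arange(10)
balance = 100.0 + 5.0 * day

# reshape(1, -1) yields a single row: 1 sample with 10 features.
wide = day.reshape(1, -1)
print(wide.shape)  # (1, 10)

# reshape(-1, 1) yields a single column: 10 samples with 1 feature each.
tall = day.reshape(-1, 1)
print(tall.shape)  # (10, 1)

# Splitting the single-row array fails: with one sample, the train set would be empty.
try:
    train_test_split(wide, balance.reshape(1, -1), test_size=0.2)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)

# Splitting the column array works: 8 rows for training, 2 for testing.
X_train, X_test, y_train, y_test = train_test_split(
    tall, balance, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)  # (8, 1) (2, 1)
```

-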
Yes, I tried that and it shows the same error.
-
How can I prevent this from happening, @hpaulj?
-
You could try:
x = sample_data["day"].values
y = sample_data["balance"].values
and remove the reshape calls.
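Putting the suggestions together, one possible end-to-end version might look like the sketch below. It is not the asker's final code: the DataFrame is synthetic data standing in for the CSV file, and `fit_and_score` is a hypothetical helper name. One caveat is that `LinearRegression.fit` expects a 2-D `X`, so the feature column is kept 2-D here (selecting `data[["day"]]` has the same effect as `reshape(-1, 1)`), while `y` stays 1-D.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the CSV file from the question (assumption).
sample_data = pd.DataFrame({"day": np.arange(1, 21)})
sample_data["balance"] = 50.0 + 3.0 * sample_data["day"]


def fit_and_score(data, test_size=0.2, random_state=0):
    # X must have shape (n_samples, n_features); y can stay 1-D.
    X = data[["day"]].to_numpy()
    y = data["balance"].to_numpy()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = LinearRegression()
    model.fit(X_train, y_train)
    # score() returns the R^2 value computed on the held-out rows.
    return model, model.score(X_test, y_test)


model, r2 = fit_and_score(sample_data)
print(model.coef_, model.intercept_, r2)
```

Note that the variable names returned by `train_test_split` must match the ones passed to `fit` and `score`; the original snippet mixes `X_train` and `x_train`, which would raise a `NameError` once the shape problem is fixed.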
Tags: python pandas numpy sklearn-pandas