How to split the data set without train_test_split()? Answers

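【Question title】: How to split the data set without train_test_split()?
【Posted】: 2018-08-09 19:22:32
【Question description】:

I need to split my data set into training and test sets. I need the last 20% of the rows for testing and the first 80% for training. I currently use train_test_split(), but it picks rows at random rather than taking the last 20%. How can I get the last 20% for testing and the first 80% for training? My code is as follows: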

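from sklearn.model_selection import train_test_split

numpy_array = df.to_numpy()  # as_matrix() was removed in pandas 1.0
X = numpy_array[:, 1:26]
y = numpy_array[:, 0]
# test_size=0.2 requests 20% of the rows; the integer 20 would mean 20 rows
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)  # I do not want the data to be random.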

Thanks.

【Comments】:

Possible duplicate of How to get a non-shuffled train_test_split in sklearn

Tags: python arrays numpy scikit-learn

【Solution 1】:

train_pct_index = int(0.8 * len(X))
X_train, X_test = X[:train_pct_index], X[train_pct_index:]
y_train, y_test = y[:train_pct_index], y[train_pct_index:]

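This is one of those cases where it is better not to involve the sklearn helpers. Plain slicing is simple and readable, and it does not depend on knowing the helper's options, which readers of the code may not be familiar with.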
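For anyone who wants to try the slicing approach end to end, here is a minimal sketch. The helper name ordered_split, its train_fraction parameter, and the toy array are assumptions made for illustration; they are not part of the original answer or of any library.

```python
import numpy as np


def ordered_split(X, y, train_fraction=0.8):
    """Split by position: the first rows go to training, the rest to testing."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    split_index = int(train_fraction * len(X))
    return X[:split_index], X[split_index:], y[:split_index], y[split_index:]


# Toy data standing in for the asker's DataFrame: 10 rows, 26 columns
# (column 0 is the target, columns 1..25 are the features).
data = np.arange(10 * 26).reshape(10, 26)
X, y = data[:, 1:26], data[:, 0]

X_train, X_test, y_train, y_test = ordered_split(X, y)
print(X_train.shape, X_test.shape)
print(y_test)  # targets of the last two rows, in their original order
```

Because nothing is shuffled, the test set is always the tail of the array, which is usually what you want for time-ordered data.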

【Discussion】:

【Solution 2】:

I think this Stack Overflow thread answers your question:

How to get a non-shuffled train_test_split in sklearn

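In particular, this passage:

In scikit-learn version 0.19 and above, you can pass the parameter shuffle=False to train_test_split to obtain a non-random split.

From the documentation:

shuffle : boolean, optional (default=True)

Whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.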
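As a rough illustration of the quoted option, the sketch below calls train_test_split with shuffle=False on a small made-up array (the data is an assumption, chosen only so the ordering is easy to see). It assumes scikit-learn 0.19 or later is installed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 rows whose targets are simply 0..9, so order is visible.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# shuffle=False keeps the original row order; test_size=0.2 takes the last 20%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print(y_train)  # the first 8 targets
print(y_test)   # the last 2 targets
```

Note that test_size is given as the float 0.2 here; an integer value is interpreted as an absolute number of test rows rather than a percentage.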

Let me know if I have misunderstood your question.

【Discussion】: