How do I select only a specific number of samples from the MNIST dataset provided by Keras? (Answers)

【Question title】: How do I select only a specific number of samples from the MNIST dataset provided by Keras?
【Posted】: 2017-03-06 04:44:34
【Question】:

I am currently training a convolutional neural network on the MNIST dataset using Keras. I load the dataset with

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

But to avoid iterating over all of the data, I only want to select the first 10000 samples from each class 0-9 for X_train and Y_train. How do I do this?

【Comments】:

Tags: python deep-learning keras

【Solution 1】:

The documentation for the MNIST dataset says that it returns:

Return:

    2 tuples:
        X_train, X_test: uint8 array of grayscale image data with shape (nb_samples, 28, 28).
        y_train, y_test: uint8 array of digit labels (integers in range 0-9) with shape (nb_samples,).

So you just need to slice out the part you want to keep. I believe the pandas/NumPy syntax is something like:

X_train = X_train[:10000,:,:]
X_test = X_test[:10000,:,:]
y_train = y_train[:10000]
y_test  = y_test[:10000]
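
To see what a plain slice like this keeps, here is a minimal sketch. It uses a small random array as a stand-in for the real data (an assumption, so that it runs without downloading MNIST); the names `X_small`, `y_small` and `n_keep` are only for illustration. It also counts how many samples of each digit end up in the subset, which a plain slice does not balance.

```python
import numpy as np

# Synthetic stand-in for the arrays returned by mnist.load_data()
# (assumption: same layout, uint8 images of shape (nb_samples, 28, 28)).
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(5000, 28, 28), dtype=np.uint8)
y_train = rng.integers(0, 10, size=5000, dtype=np.uint8)

n_keep = 1000
X_small = X_train[:n_keep]   # equivalent to X_train[:n_keep, :, :]
y_small = y_train[:n_keep]   # labels must be sliced the same way

print(X_small.shape)
print(y_small.shape)

# Number of samples per digit in the subset; these counts are whatever
# the first n_keep rows happen to contain, not an equal split.
class_counts = np.bincount(y_small, minlength=10)
print(class_counts)
```

Slicing only the first axis is enough because the remaining axes are kept whole by default.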

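【Comments】:

Thanks! I'm trying to understand, though: does this give me the first 10000 samples, or the first 1000 from each of the 10 classes (~10000 in total)?
The documentation says X_train has shape (nb_samples, 28, 28), so the first index is the sample, which means :10000 takes the first 10000 samples.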
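
If what you actually want is the first N samples of each digit, as asked in the comment above, one possible approach is to build the indices per label with NumPy. This is only a sketch: `first_n_per_class` is a hypothetical helper (not part of Keras), and the random arrays below stand in for the real MNIST data so the example runs without a download.

```python
import numpy as np

def first_n_per_class(x, y, n_per_class, num_classes=10):
    """Hypothetical helper: keep the first n_per_class samples of each label.

    The kept samples stay in their original relative order. A class with
    fewer than n_per_class samples contributes all of its samples.
    """
    keep = []
    for label in range(num_classes):
        # Positions of this label, truncated to the first n_per_class
        keep.append(np.flatnonzero(y == label)[:n_per_class])
    keep = np.sort(np.concatenate(keep))
    return x[keep], y[keep]

# Synthetic stand-in for (X_train, y_train) (assumption: same layout as MNIST).
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 256, size=(5000, 28, 28), dtype=np.uint8)
y_demo = rng.integers(0, 10, size=5000, dtype=np.uint8)

x_sub, y_sub = first_n_per_class(x_demo, y_demo, n_per_class=100)
counts = np.bincount(y_sub, minlength=10)
print(x_sub.shape)
print(counts)
```

With the real data you would pass `X_train` and `y_train` instead of the demo arrays. Images and labels are indexed with the same `keep` array, so they stay aligned.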

【Solution 2】:

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Keep the first 1000 training samples and the first 500 test samples
x_train = x_train[:1000,:,:]
x_test = x_test[:500,:,:]
y_train = y_train[:1000]
y_test  = y_test[:500]

print(len(x_train))
print(len(y_train))
print(len(x_test))
print(len(y_test))

# Output:
# 1000
# 1000
# 500
# 500

【Comments】: