Taking a sample of the image dataset [closed]

Posted: 2021-06-06 18:12:30
Question:

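For example, I want to develop a deep learning model for image classification, and I have thousands of images. Training the model on the entire dataset takes a long time, so I want to take a sample (10%) of the original dataset for initial training. How can I do this?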

Comments:

Please set aside the hypothetical ("for example") and describe your exact problem precisely.

Tags: machine-learning deep-learning computer-vision

Answer 1:

If the dataset is stored in a folder, I would try the following:

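import os
import numpy as np

dataset_dir = 'Path to your dataset'  # folder that contains the images
images = os.listdir(dataset_dir)  # list of all the file names in the folder
n_test_images = int(len(images) * 0.1)  # 10% of the total images

# replace=False samples without replacement, so no image is picked twice
subset_images = np.random.choice(images, size=n_test_images, replace=False)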

I use replace=False to avoid selecting the same element twice. After selecting 10% of the images, I load them.
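A slightly more robust variant might look like the sketch below. It is an assumption-laden illustration, not the answerer's code: it swaps numpy for the standard library's random.sample (which also draws without replacement), keeps only files whose extension is in a hypothetical IMAGE_EXTENSIONS set (so subfolders or stray files like .DS_Store are skipped), sorts the listing so a fixed seed gives the same subset every run (os.listdir order is arbitrary), and copies the chosen files into a separate folder. The helper names sample_image_files and copy_subset are made up for this example.

```python
import os
import random
import shutil

# Hypothetical set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def sample_image_files(dataset_dir, fraction=0.1, seed=0):
    """Return a reproducible random subset of image file paths in dataset_dir."""
    # Keep only regular files with an image extension; sort so the seed is reproducible.
    files = sorted(
        os.path.join(dataset_dir, name)
        for name in os.listdir(dataset_dir)
        if os.path.isfile(os.path.join(dataset_dir, name))
        and os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    )
    n = int(len(files) * fraction)
    # random.sample draws without replacement, so no file is picked twice.
    return random.Random(seed).sample(files, n)


def copy_subset(paths, target_dir):
    """Copy the sampled files into target_dir so training code can point at it."""
    os.makedirs(target_dir, exist_ok=True)
    for path in paths:
        shutil.copy2(path, target_dir)
```

Copying the subset into its own folder lets an existing training pipeline read it unchanged; if disk space matters, you could instead keep the returned list of paths and load only those files.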

I'm not sure this is the best approach, but it may be a good starting point.
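For classification, a plain random 10% can under-represent small classes. One possible refinement, sketched here under the assumption that the images are organized as one subfolder per class (e.g. dataset/cat/..., dataset/dog/...), is to sample 10% within each class separately (stratified sampling). The function name stratified_sample and the "at least one image per class" rule are illustrative choices, not part of the original answer.

```python
import os
import random


def stratified_sample(dataset_dir, fraction=0.1, seed=0):
    """Sample `fraction` of the files in each class subfolder of dataset_dir.

    Assumes a layout like dataset_dir/<class_name>/<image files>.
    Returns a dict mapping class name -> list of sampled file paths.
    """
    rng = random.Random(seed)
    subset = {}
    for class_name in sorted(os.listdir(dataset_dir)):
        class_dir = os.path.join(dataset_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        files = sorted(
            os.path.join(class_dir, name)
            for name in os.listdir(class_dir)
            if os.path.isfile(os.path.join(class_dir, name))
        )
        if not files:
            continue  # nothing to sample from an empty class folder
        # Take at least one image per class so small classes are not dropped entirely.
        n = max(1, int(len(files) * fraction))
        subset[class_name] = rng.sample(files, n)  # without replacement
    return subset
```

Sampling per class keeps the class proportions of the subset close to those of the full dataset, which makes results from the initial training runs more representative.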

Comments: