python.data.ops.dataset_ops.BatchDataset - how to use it to create training and test datasets (answers)

【Question title】: python.data.ops.dataset_ops.BatchDataset - how to use it to create training and test datasets
【Posted】: 2020-09-24 10:37:28
【Question description】:

I am using TensorFlow to walk a directory and load the images I want to use to train a NN.

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    wk_dir,
    labels="inferred",
    label_mode="int",
    class_names=None,
    color_mode="grayscale",
    batch_size=batches,
    image_size=image_dim,
    shuffle=True,
    seed=1968,
    validation_split=0.2,
    subset="training",
    interpolation="bilinear",
    follow_links=False,
)

Found 127561 files belonging to 3 classes.
Using 102049 files for training.

So it works. Now I am trying to feed this into a model, but I don't know how to handle it.

print(train_ds)
<BatchDataset shapes: ((None, 576, 432, None), (None,)), types: (tf.float32, tf.int32)>

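So do I have an array of 2 elements, where the first has 4 elements (2 of them empty) and the second is its class label?
I tried to split the BatchDataset and got the error TypeError: 'BatchDataset' object is not subscriptable

How do I work with a TF object of type python.data.ops.dataset_ops.BatchDataset?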

【Comments on the question】:

Tags: python tensorflow split

【Solution 1】:

If you want to see what this BatchDataset looks like, you can try:
```
print(list(train_ds.as_numpy_iterator()))
```
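Note that turning the whole dataset into a list loads every batch into memory, which is heavy for roughly 100,000 images; looking at a single batch is usually enough. The `None` entries in the printed shapes are dimensions that are only known at runtime (such as the batch size), not empty elements, and each element of the dataset is an `(images, labels)` tuple. A minimal sketch, assuming a small synthetic dataset as a stand-in for `train_ds` (the names `images`, `labels`, `demo_ds` and the 8x6 image size are made up for illustration):

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for train_ds: 10 grayscale "images" of 8x6 pixels, 3 classes.
images = np.zeros((10, 8, 6, 1), dtype=np.float32)
labels = np.arange(10, dtype=np.int32) % 3
demo_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)

# A batched dataset is an iterable of (images, labels) tuples, not a list,
# so indexing such as demo_ds[0] raises TypeError. Use take() or a for loop.
for image_batch, label_batch in demo_ds.take(1):
    first_images_shape = tuple(image_batch.shape)   # (batch, height, width, channels)
    first_labels = label_batch.numpy().tolist()     # one integer label per image

# Counting the batches requires iterating over the dataset.
num_batches = sum(1 for _ in demo_ds)
```

With the real `train_ds`, the same `for image_batch, label_batch in train_ds.take(1):` loop should give one batch of images and their integer labels.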
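More about TensorFlow Data and BatchDataset: https://www.tensorflow.org/guide/data#batching_dataset_elements
There is not enough information in the question to say exactly how you should build the model, but I can recommend this course to see how a model is built with a BatchDataset as its input: https://www.coursera.org/projects/fine-tune-bert-tensorflow

【Discussion】:

【Solution 2】: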

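You have one dataset, named train_ds. If you want a validation dataset, you need to write one more statement, assigned to a separate name so it does not overwrite train_ds; the only other difference is the subset name:

val_ds = tf.keras.preprocessing.image_dataset_from_directory(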
    wk_dir,
    labels="inferred",
    label_mode="int",
    class_names=None,
    color_mode="grayscale",
    batch_size=batches,
    image_size=image_dim,
    shuffle=True,
    seed=1968,
    validation_split=0.2,
    subset="validation",
    interpolation="bilinear",
    follow_links=False,
)
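Once both datasets exist they can be passed straight to `model.fit`, since each element already contains the images and their labels. A rough sketch, assuming tiny synthetic datasets in place of the real training and validation sets and a deliberately trivial model (the helper `make_ds`, the names `train_demo` and `val_demo`, and the layer choices are illustrative assumptions, not a recommended architecture):

```python
import numpy as np
import tensorflow as tf

# Hypothetical helper: builds a small batched dataset of random grayscale images
# with integer labels, mimicking what label_mode="int" produces.
def make_ds(n):
    images = np.random.rand(n, 8, 6, 1).astype(np.float32)
    labels = (np.arange(n) % 3).astype(np.int32)
    return tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)

train_demo = make_ds(16)  # stand-in for the subset="training" dataset
val_demo = make_ds(8)     # stand-in for the subset="validation" dataset

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 6, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3),  # one logit per class
])
model.compile(
    optimizer="adam",
    # Integer labels call for the sparse variant of categorical cross-entropy.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# No separate x/y arguments and no batch_size: the dataset supplies both.
history = model.fit(train_demo, validation_data=val_demo, epochs=1, verbose=0)
```

When creating the two real datasets, keep `seed` and `validation_split` identical in both calls so the training and validation subsets do not overlap.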

【Discussion】: