Read multiple TFRecord files with keras::tfrecord_dataset

【Question title】: Read multiple TFRecord files with keras::tfrecord_dataset
【Posted】: 2019-08-17 09:05:15
【Question】:

I am getting started with Keras and want to create a dataset from multiple TFRecord files.

Here is the code I have so far.

dataset <- tfrecord_dataset(filenames) %>%
  dataset_map(function(example_proto) {
    # Feature spec: every feature listed here is required in every record.
    features <- list(
      label = tf$FixedLenFeature(shape(), tf$float32),
      a = tf$FixedLenFeature(shape(), tf$float32),
      b = tf$FixedLenFeature(shape(), tf$float32),
      c = tf$FixedLenFeature(shape(), tf$float32),
      d = tf$FixedLenFeature(shape(), tf$float32)
    )

    # Parse one serialized example into a named list of tensors.
    features <- tf$parse_single_example(example_proto, features)
    x <- list(features$a,
              features$b,
              features$c,
              features$d
    )
    # One-hot encode the label into 3 classes.
    y <- tf$one_hot(tf$cast(features$label, tf$int32), 3L)
    list(x, y)
  }) %>%
  dataset_shuffle(150) %>%
  dataset_batch(16)

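With a single TFRecord file (for example file1, which has features a, b and c), everything works.

If I add a second TFRecord file, file2, which has feature d, so that filenames = c(file1, file2), I get the following error: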

Error in py_call_impl(callable, dots$args, dots$keywords) : 
InvalidArgumentError: Feature: d (data type: float) is required but could not be found.

Is there an effective way to read multiple TFRecord files with tfrecord_dataset?

Thanks!

Tags: r tensorflow keras tensorflow-datasets

【Solution 1】:

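As of at least TensorFlow 1.14, you can pass a default_value argument to tf$io$FixedLenFeature. That value is used whenever a record does not contain the feature, so parsing no longer fails on files that lack it.

From the documentation:

default_value: Value to be used if an example is missing this feature. It must be compatible with dtype and of the specified shape.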
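
Applied to the parsing function from the question, this might look like the sketch below. It rests on a few assumptions that are not in the original post: the tensorflow and tfdatasets R packages are installed, 0 is an acceptable fill value for a missing feature, label is present in every file, and any of a, b, c, d may be absent from some file. The helper names optional_float, parse_example and make_dataset are hypothetical and used only for illustration.

```r
library(tensorflow)
library(tfdatasets)

# Hypothetical helper: a scalar float feature that falls back to 0
# when a record does not contain it (assumed fill value).
optional_float <- function() {
  tf$io$FixedLenFeature(shape(), tf$float32, default_value = 0)
}

feature_spec <- list(
  # Assumed to be present in every file, so no default is given.
  label = tf$io$FixedLenFeature(shape(), tf$float32),
  a = optional_float(),
  b = optional_float(),
  c = optional_float(),
  d = optional_float()
)

# Same parsing logic as in the question, using the spec above.
parse_example <- function(example_proto) {
  features <- tf$io$parse_single_example(example_proto, feature_spec)
  x <- list(features$a, features$b, features$c, features$d)
  y <- tf$one_hot(tf$cast(features$label, tf$int32), 3L)
  list(x, y)
}

# `filenames` is a character vector of TFRecord paths, e.g. c(file1, file2).
make_dataset <- function(filenames) {
  tfrecord_dataset(filenames) %>%
    dataset_map(parse_example) %>%
    dataset_shuffle(150) %>%
    dataset_batch(16)
}
```

With this spec, records from file1 would get d = 0 and records from file2 would get 0 for whichever of a, b, c they lack. Whether a constant fill value makes sense for the model is a separate question that depends on the data.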
