TensorFlow TFRecord with many images crashes during read

【Question title】: TensorFlow TFRecord with many images crashes during read
【Posted】: 2017-07-12 14:24:33
【Question】:

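I'm having trouble reading a TFRecord file that contains "many" (more than ~500) events. If I create a file with 500 events, everything is fine, but with more than 500, reading and parsing the file fails with this error: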

W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: Could not parse example input, value:
...
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb5 in position 40: invalid start byte

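The images are floats with shape (N, 2, 127, 50), reshaped to (N, 127, 50, 2) during the read. I've tried writing them in two different ways, as byte lists and as float lists, and both fail in the same way.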

For the "byte method", the business part of the code is:

import tensorflow as tf  # TensorFlow 1.x API

def write_to_tfrecord(data_dict, tfrecord_file):
    writer = tf.python_io.TFRecordWriter(tfrecord_file)
    # One bytes feature per key, holding that array's raw bytes.
    features_dict = {}
    for k in data_dict.keys():
        features_dict[k] = tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[data_dict[k]['byte_data']])
        )
    # All features go into a single Example, written as one record.
    example = tf.train.Example(
        features=tf.train.Features(feature=features_dict)
    )
    writer.write(example.SerializeToString())
    writer.close()

And then, to read:

def tfrecord_to_graph_ops_xtxutuvtv(filenames):
    def process_hitimes(inp, shape):
        # Raw bytes -> flat float32 tensor -> (N, 2, 127, 50) -> (N, 127, 50, 2)
        hitimes = tf.decode_raw(inp, tf.float32)
        hitimes = tf.reshape(hitimes, shape)
        hitimes = tf.transpose(hitimes, [0, 2, 3, 1])
        return hitimes

    # Queue of input files; the reader yields one serialized record at a time.
    file_queue = tf.train.string_input_producer(filenames, name='file_queue')
    reader = tf.TFRecordReader()
    _, tfrecord = reader.read(file_queue)

    tfrecord_features = tf.parse_single_example(
        tfrecord,
        features={
            'hitimes-x': tf.FixedLenFeature([], tf.string),
        },
        name='data'
    )
    hitimesx = process_hitimes(
        tfrecord_features['hitimes-x'], [-1, 2, 127, 50]
    )
    return hitimesx

(Normally I read and write other tensors as well, but the problem shows up with just this one.)

For the "float method", the code looks like this:

def write_to_tfrecord(data_dict, tfrecord_file):
    writer = tf.python_io.TFRecordWriter(tfrecord_file)
    features_dict = {}
    # Store the flattened float32 array as a single float-list feature.
    features_dict['hitimes-x'] = tf.train.Feature(
        float_list=tf.train.FloatList(
            value=data_dict['hitimes-x']['data'].flatten()
        )
    )
    example = tf.train.Example(
        features=tf.train.Features(feature=features_dict)
    )
    writer.write(example.SerializeToString())
    writer.close()

And when reading:

def tfrecord_to_graph_ops_xtxutuvtv(filenames):
    def process_hitimes(inp, shape):
        # VarLenFeature parses to a SparseTensor, so densify before reshaping.
        hitimes = tf.sparse_tensor_to_dense(inp)
        hitimes = tf.reshape(hitimes, shape)
        hitimes = tf.transpose(hitimes, [0, 2, 3, 1])
        return hitimes

    file_queue = tf.train.string_input_producer(filenames, name='file_queue')
    reader = tf.TFRecordReader()
    _, tfrecord = reader.read(file_queue)

    tfrecord_features = tf.parse_single_example(
        tfrecord,
        features={
            'hitimes-x': tf.VarLenFeature(tf.float32),
        },
        name='data'
    )
    hitimesx = process_hitimes(
        tfrecord_features['hitimes-x'], [-1, 2, 127, 50]
    )
    return hitimesx

The data being written are NumPy ndarrays of type float32.
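As a sanity check independent of TensorFlow, the byte round trip that the "byte method" relies on can be reproduced in plain NumPy. This is only a sketch: the array contents and the event count are made up, and it assumes `byte_data` was produced with something like `ndarray.tobytes()`, which the question does not show.

```python
import numpy as np

# Hypothetical stand-in for the real data: 4 events of shape (2, 127, 50).
n_events = 4
data = np.arange(n_events * 2 * 127 * 50, dtype=np.float32)
data = data.reshape(n_events, 2, 127, 50)

# "Byte method": serialize the whole array to one raw byte string.
byte_data = data.tobytes()

# Mirror the reader's tf.decode_raw -> tf.reshape -> tf.transpose steps.
decoded = np.frombuffer(byte_data, dtype=np.float32)
restored = decoded.reshape(-1, 2, 127, 50).transpose(0, 2, 3, 1)

print(restored.shape)
```

If the shapes and values survive this round trip, the layout logic in the readers is probably not what breaks the parse.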

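I'm tempted to call this a bug (I'm using TensorFlow 1.0), because both methods work for up to ~500 images but break when I try to use more. I looked through the docs for a parameter I should add so that the readers and writers can handle larger files, but I didn't find anything (besides, 500 images is not a lot; I need to write tens of millions).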

Any ideas? I plan to try TensorFlow 1.2 today, but I haven't had a chance yet.

【Comments】:

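I highly doubt this has anything to do with the number of events. I'm using tfrecord files with 10 million events each and everything works fine. I suggest you take one image and save it 1k times to confirm that it has nothing to do with the number 500. Then find which image breaks your reader and see how it differs from the ones that work.
It's not event 500; I tried that. I think this is a bug in TF 1.0.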
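One way to follow the suggestion above without going through the TensorFlow parser is to walk the file's record framing directly. The sketch below assumes the documented TFRecord layout (a little-endian uint64 length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload); it skips the CRCs rather than verifying them, and `tfrecord_lengths` is a hypothetical helper name, not a TensorFlow function.

```python
import struct

def tfrecord_lengths(path):
    """Return the payload length of every record in a TFRecord file.

    Assumed framing per record: uint64 length, uint32 CRC of the length,
    payload bytes, uint32 CRC of the payload. CRCs are not checked here.
    """
    lengths = []
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte length CRC
            if not header:
                break  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated header after %d records" % len(lengths))
            (length,) = struct.unpack("<Q", header[:8])
            f.seek(length, 1)    # skip the payload without loading it
            footer = f.read(4)   # payload CRC
            if len(footer) < 4:
                raise ValueError("truncated record %d" % len(lengths))
            lengths.append(length)
    return lengths
```

Calling `tfrecord_lengths("some_file.tfrecord")` (the path is a placeholder) shows how many records the file holds and how large each one is. With the writers as posted, which call `writer.write` once per file, it would be expected to report a single large record.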

Tags: python image-processing tensorflow deep-learning

【Solution 1】:

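I upgraded to TF 1.2.1 and the problem above went away (at least when using BytesList; I'm not sure which approach is more idiomatic TensorFlow, but treating everything as a BytesList of byte data makes for simpler code for me here).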

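I think a new problem shows up when reading a large file (I can now write more than 25k events to a TFRecord file, perhaps more): TF opens the whole file at once and loads all of it into memory, which is more than my data-processing test machine can handle. I don't directly blame TensorFlow for that, although I need to come up with some convenient compression or chunking scheme.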
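A chunking scheme of the kind mentioned here could start from a small helper that maps event ranges to shard files. This is only a sketch under assumptions: `shard_slices`, the `events` prefix, and the file naming pattern are all hypothetical, and the shard size would need tuning to the memory of the machine.

```python
def shard_slices(n_events, events_per_shard, prefix="events"):
    """Yield (filename, start, stop) triples covering n_events in fixed-size chunks."""
    if events_per_shard <= 0:
        raise ValueError("events_per_shard must be positive")
    # Ceiling division: the last shard may hold fewer events.
    n_shards = (n_events + events_per_shard - 1) // events_per_shard
    for i in range(n_shards):
        start = i * events_per_shard
        stop = min(start + events_per_shard, n_events)
        name = "%s-%05d-of-%05d.tfrecord" % (prefix, i, n_shards)
        yield name, start, stop

for name, start, stop in shard_slices(25000, 10000):
    print(name, start, stop)
```

Each triple can drive one call to a writer on `data[start:stop]`, and the resulting list of filenames can be handed to `tf.train.string_input_producer(filenames)` as in the question's reader, so that no single file has to hold every event.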

【Discussion】: