【Posted】: 2018-11-06 13:39:35
【Question】:
I am using the SequenceExample protobuf to read and write time-series data to a TFRecord file.
I serialized a pair of NumPy arrays as follows:
import numpy as np
import tensorflow as tf
writer = tf.python_io.TFRecordWriter(file_name)
context = tf.train.Features( ... Feature( ... ) ... )
feature_data = tf.train.FeatureList(feature=[
    tf.train.Feature(float_list=tf.train.FloatList(
        value=np.random.normal(size=([4065000,]))))])
labels = tf.train.FeatureList(feature=[
    tf.train.Feature(int64_list=tf.train.Int64List(
        value=np.random.random_integers(0, 10, size=([1084,]))))])
## feature_data and labels are of similar, but varying lengths
feature_list = {"feature_data": feature_data,
                "labels": labels}
feature_lists = tf.train.FeatureLists(feature_list=feature_list)
example = tf.train.SequenceExample(context=context,
                                   feature_lists=feature_lists)
## serialize and close
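For reference, here is a scaled-down, self-contained sketch of the same write path, including the "serialize and close" step that the snippet leaves out. It assumes TensorFlow 2.x (`tf.io.TFRecordWriter` instead of `tf.python_io.TFRecordWriter`), tiny array sizes, a temporary file path, and a hypothetical context entry named `n_samples`, since the original context features are elided. It is only meant to show what ends up in the proto.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def make_sequence_example(features, labels):
    """Pack two 1-D arrays into a SequenceExample, one Feature per array."""
    # Hypothetical context entry: the question elides its real context features.
    context = tf.train.Features(feature={
        "n_samples": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[len(features)])),
    })
    feature_lists = tf.train.FeatureLists(feature_list={
        "feature_data": tf.train.FeatureList(feature=[
            tf.train.Feature(
                float_list=tf.train.FloatList(value=features.tolist())),
        ]),
        "labels": tf.train.FeatureList(feature=[
            tf.train.Feature(
                int64_list=tf.train.Int64List(value=labels.tolist())),
        ]),
    })
    return tf.train.SequenceExample(context=context,
                                    feature_lists=feature_lists)


example = make_sequence_example(
    np.random.normal(size=12), np.random.randint(0, 11, size=5))

# Serialize and close; the context manager closes the writer on exit.
path = os.path.join(tempfile.mkdtemp(), "tf_record.tfrecords")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())

# Each FeatureList holds a single Feature (one "step") carrying all values.
steps = example.feature_lists.feature_list["feature_data"].feature
print(len(steps), len(steps[0].float_list.value))  # prints: 1 12
```

With this layout the whole series sits inside one `Feature`, which would explain why the text form of the proto shows a long run of `value:` entries.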
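When trying to read the .tfrecords file I ran into many errors, mainly because the SequenceExample protobuf writes the time-series data as a series of values (e.g. value: -12.2549, value: -18.1372, ..., value: 13.1234). My code for reading the .tfrecords file is as follows: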
dataset = tf.data.TFRecordDataset("data/tf_record.tfrecords")
dataset = dataset.map(decode)
dataset = dataset.make_one_shot_iterator().get_next()
### reshape tensors and feed to estimator###
My decode() function is defined as follows:
def decode(serialized_proto):
    context_features = {...}
    sequence_features = {"feature_data": tf.FixedLenSequenceFeature((None,),
                                                                     tf.float32),
                         "labels": tf.FixedLenSequenceFeature((None,),
                                                               tf.int64)}
    context, sequence = tf.parse_single_sequence_example(
        serialized_proto,
        context_features=context_features,
        sequence_features=sequence_features)
    return context, sequence
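As a point of comparison, `FixedLenSequenceFeature` takes the shape of a single time step, while the number of steps is left dynamic. The sketch below assumes a different write layout from the one in the question (one scalar `Feature` per time step) and TensorFlow 2.x names under `tf.io`. The helper names `serialize_per_step` and `decode_per_step` are made up for illustration, and context features are skipped because the original ones are elided.

```python
import tensorflow as tf


def serialize_per_step(features, labels):
    """Write one scalar Feature per time step (an assumed alternative layout)."""
    feature_lists = tf.train.FeatureLists(feature_list={
        "feature_data": tf.train.FeatureList(feature=[
            tf.train.Feature(float_list=tf.train.FloatList(value=[x]))
            for x in features]),
        "labels": tf.train.FeatureList(feature=[
            tf.train.Feature(int64_list=tf.train.Int64List(value=[y]))
            for y in labels]),
    })
    example = tf.train.SequenceExample(feature_lists=feature_lists)
    return example.SerializeToString()


def decode_per_step(serialized_proto):
    sequence_features = {
        # shape=[] describes ONE step (a scalar); the step count stays dynamic.
        "feature_data": tf.io.FixedLenSequenceFeature([], tf.float32),
        "labels": tf.io.FixedLenSequenceFeature([], tf.int64),
    }
    _, sequence = tf.io.parse_single_sequence_example(
        serialized_proto, sequence_features=sequence_features)
    return sequence


serialized = serialize_per_step([0.5, -1.5, 2.0], [3, 7])
parsed = decode_per_step(tf.constant(serialized))
# feature_data comes back with 3 steps, labels with 2.
print(parsed["feature_data"].shape, parsed["labels"].shape)
```

One Feature per step adds proto overhead for every sample, so for series with millions of samples this layout may be noticeably larger and slower than packing all values into a single Feature.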
One of the errors is as follows:
Shape [?] is not fully defined for 'ParseSingleSequenceExample/ParseSingleSequenceExample' (op: 'ParseSingleSequenceExample') with input shapes: [], [0], [], [], [], [], [], [], [].
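The message is consistent with the `(None,)` shape passed to `FixedLenSequenceFeature`: as far as I know, that argument is the per-step shape and has to be fully defined. For the layout written in the question (a single `Feature` carrying all values), one option is `tf.io.VarLenFeature` inside `sequence_features`. A minimal sketch, assuming TensorFlow 2.x in eager mode and tiny in-memory data; `serialize` is a hypothetical helper and context features are omitted because the original ones are elided:

```python
import tensorflow as tf


def serialize(features, labels):
    """Same layout as the question: one Feature per list holding all values."""
    feature_lists = tf.train.FeatureLists(feature_list={
        "feature_data": tf.train.FeatureList(feature=[
            tf.train.Feature(float_list=tf.train.FloatList(value=features))]),
        "labels": tf.train.FeatureList(feature=[
            tf.train.Feature(int64_list=tf.train.Int64List(value=labels))]),
    })
    example = tf.train.SequenceExample(feature_lists=feature_lists)
    return example.SerializeToString()


def decode(serialized_proto):
    sequence_features = {
        # VarLenFeature allows a different number of values in each step.
        "feature_data": tf.io.VarLenFeature(tf.float32),
        "labels": tf.io.VarLenFeature(tf.int64),
    }
    _, sequence = tf.io.parse_single_sequence_example(
        serialized_proto, sequence_features=sequence_features)
    # Each parsed entry is a SparseTensor of shape [steps, values_per_step].
    return {name: tf.sparse.to_dense(value)
            for name, value in sequence.items()}


parsed = decode(tf.constant(serialize([0.5, -1.5, 2.0], [3, 7])))
# One step each: feature_data is [1, 3] and labels is [1, 2].
print(parsed["feature_data"].shape, parsed["labels"].shape)
```

The leading dimension of size 1 is the single step; `tf.reshape(tensor, [-1])` would flatten it to a 1-D series if that is what the model expects.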
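My main question is how to think about the structure of the dataset. I'm not sure I really understand the structure of the data that comes back. I'm having trouble iterating over this dataset and returning variable-sized tensors. Thanks in advance!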
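To make the returned structure concrete, here is a small end-to-end sketch: after `map(decode)`, every dataset element has whatever structure `decode` returns, and variable-length elements can be iterated one at a time or padded into batches. It assumes TensorFlow 2.x eager iteration (instead of `make_one_shot_iterator`), two tiny records in a temporary file, and a `decode` that returns a flat `(features, labels)` tuple; `serialize` is a hypothetical helper.

```python
import os
import tempfile

import tensorflow as tf


def serialize(features, labels):
    """Same layout as the question: one Feature per list holding all values."""
    feature_lists = tf.train.FeatureLists(feature_list={
        "feature_data": tf.train.FeatureList(feature=[
            tf.train.Feature(float_list=tf.train.FloatList(value=features))]),
        "labels": tf.train.FeatureList(feature=[
            tf.train.Feature(int64_list=tf.train.Int64List(value=labels))]),
    })
    example = tf.train.SequenceExample(feature_lists=feature_lists)
    return example.SerializeToString()


def decode(serialized_proto):
    spec = {"feature_data": tf.io.VarLenFeature(tf.float32),
            "labels": tf.io.VarLenFeature(tf.int64)}
    _, sequence = tf.io.parse_single_sequence_example(
        serialized_proto, sequence_features=spec)
    # Flatten [1, n] to [n] so each element is a pair of 1-D tensors.
    features = tf.reshape(tf.sparse.to_dense(sequence["feature_data"]), [-1])
    labels = tf.reshape(tf.sparse.to_dense(sequence["labels"]), [-1])
    return features, labels


# Two records with different lengths.
path = os.path.join(tempfile.mkdtemp(), "tf_record.tfrecords")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(serialize([0.5, 1.5, 2.5], [1, 2]))
    writer.write(serialize([4.0], [3]))

dataset = tf.data.TFRecordDataset(path).map(decode)

# Element-wise iteration: every element keeps its own length.
lengths = [(int(f.shape[0]), int(l.shape[0])) for f, l in dataset]

# Batching variable-length elements: pad each batch to its longest member.
batched = dataset.padded_batch(2, padded_shapes=([None], [None]))
batch_features, batch_labels = next(iter(batched))
print(lengths, batch_features.shape, batch_labels.shape)
```

Padding uses zeros by default, so if 0 is a valid label it may be worth keeping the true lengths alongside the padded tensors.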
【Comments】:
Tags: tensorflow tensorflow-datasets tfrecord