【Question title】: Tensorflow Dataset.from_generator fails with pyfunc exception
【Posted】: 2018-03-12 16:29:41
【Question description】:

I am trying a nightly build of TensorFlow 1.4 because I need Dataset.from_generator to stitch some variable-length datasets together. This simple code (idea from here):

import tensorflow as tf

Dataset = tf.contrib.data.Dataset
it2 = Dataset.range(5).make_one_shot_iterator()

def _dataset_generator():
    while True:
        try:
            try:
                get_next = it2.get_next()
                yield get_next
            except tf.errors.OutOfRangeError:
                continue
        except tf.errors.OutOfRangeError:
            return

# Dataset.from_generator needs TensorFlow > 1.3
das_dataset = Dataset.from_generator(_dataset_generator,
                                     output_types=(tf.float32, tf.float32))
das_dataset_it = das_dataset.make_one_shot_iterator()
with tf.Session() as sess:
    while True:
        print(sess.run(it2.get_next()))
        print(sess.run(das_dataset_it.get_next()))

fails in a rather cryptic way:

C:\Dropbox\_\PyCharmVirtual\TF-NIGHTLY\Scripts\python.exe C:/Users/MrD/.PyCharm2017.2/config/scratches/scratch_55.py
0
2017-10-01 12:51:39.773135: W C:\tf_jenkins\home\workspace\tf-nightly-windows\M\windows\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: 0-th value returned by pyfunc_0 is int32, but expects int64
     [[Node: PyFunc = PyFunc[Tin=[], Tout=[DT_INT64], token="pyfunc_0"]()]]
Traceback (most recent call last):
  File "C:\Dropbox\_\PyCharmVirtual\TF-NIGHTLY\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
    return fn(*args)
  File "C:\Dropbox\_\PyCharmVirtual\TF-NIGHTLY\lib\site-packages\tensorflow\python\client\session.py", line 1302, in _run_fn
    status, run_metadata)
  File "C:\_\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Dropbox\_\PyCharmVirtual\TF-NIGHTLY\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 467, in raise_exception_on_not_ok_status
    c_api.TF_GetCode(status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: 0-th value returned by pyfunc_0 is int32, but expects int64
     [[Node: PyFunc = PyFunc[Tin=[], Tout=[DT_INT64], token="pyfunc_0"]()]]
     [[Node: IteratorGetNext_1 = IteratorGetNext[output_shapes=[<unknown>, <unknown>], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](OneShotIterator_1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/MrD/.PyCharm2017.2/config/scratches/scratch_55.py", line 24, in <module>
    print(sess.run(das_dataset_it.get_next()))
  File "C:\Dropbox\_\PyCharmVirtual\TF-NIGHTLY\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
    run_metadata_ptr)
  File "C:\Dropbox\_\PyCharmVirtual\TF-NIGHTLY\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Dropbox\_\PyCharmVirtual\TF-NIGHTLY\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
    options, run_metadata)
  File "C:\Dropbox\_\PyCharmVirtual\TF-NIGHTLY\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 0-th value returned by pyfunc_0 is int32, but expects int64
     [[Node: PyFunc = PyFunc[Tin=[], Tout=[DT_INT64], token="pyfunc_0"]()]]
     [[Node: IteratorGetNext_1 = IteratorGetNext[output_shapes=[<unknown>, <unknown>], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](OneShotIterator_1)]]

Process finished with exit code 1

Note that the generator itself works fine when iterated directly:

with tf.Session() as sess:
    for k in _dataset_generator():
        print(sess.run(k))

prints:

0
1
2
3
4
Traceback (most recent call last):
...
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
     [[Node: IteratorGetNext_5 = IteratorGetNext[output_shapes=[[]], output_types=[DT_INT64], _device="/job:localhost/replica:0/task:0/cpu:0"](OneShotIterator)]]

as expected.

Is this a bug, a missing feature, or am I badly misunderstanding something?

【Discussion】:

    Tags: python tensorflow generator yield tensorflow-datasets


    【Solution 1】:

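    The Dataset.from_generator() method is designed to connect non-TensorFlow Python code to a tf.data input pipeline. For example, you can yield simple Python objects (such as int or str objects), lists, or NumPy arrays from the generator, and they will be converted into TensorFlow values.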
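
To make the supported case concrete, here is a minimal sketch of a generator that yields plain Python lists of varying length and is then wrapped in a dataset. The names `variable_length_rows` and `build_dataset` are hypothetical, and the sketch assumes a TensorFlow version (1.4 or newer) where `tf.data.Dataset.from_generator` accepts `output_types` and `output_shapes`.

```python
def variable_length_rows():
    """Yield plain Python lists of growing length (no tf.Tensor objects)."""
    for n in range(1, 4):
        yield list(range(n))


def build_dataset():
    """Wrap the generator in a dataset (assumes TensorFlow 1.4 or newer)."""
    # Imported lazily so the generator above can be inspected on its own.
    import tensorflow as tf

    return tf.data.Dataset.from_generator(
        variable_length_rows,
        output_types=tf.int64,                 # dtype of every yielded element
        output_shapes=tf.TensorShape([None]),  # 1-D, length may vary per element
    )
```

The generator only ever yields ordinary Python values, so the conversion to TensorFlow values is left entirely to from_generator.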

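    However, in your example code you are yielding the result of it2.get_next(), which is a tf.Tensor object. This is not supported. If you need to capture an iterator in a different dataset, you can use Dataset.map() over a dummy dataset, as follows: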

    import tensorflow as tf
    
    Dataset = tf.contrib.data.Dataset
    it2 = Dataset.range(5).make_one_shot_iterator()
    
    das_dataset = Dataset.from_tensors(0).repeat().map(lambda _: it2.get_next())
    das_dataset_it = das_dataset.make_one_shot_iterator()
    with tf.Session() as sess:
        while True:
            print(sess.run(it2.get_next()))
            print(sess.run(das_dataset_it.get_next()))
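
For the original goal of stitching variable-length data together, one option that stays within what from_generator supports is to do the stitching in plain Python and yield ordinary values instead of tensors. The following is only a sketch under that assumption: `stitched_rows`, `short_rows`, `long_rows`, and `build_stitched_dataset` are hypothetical names, and the two sample sources stand in for whatever Python iterables hold the real data.

```python
import itertools


def stitched_rows(*sources):
    """Yield every row of each source in turn, as plain Python lists."""
    for row in itertools.chain(*sources):
        yield list(row)


# Two hypothetical variable-length sources standing in for the real data.
short_rows = [[1], [2, 3]]
long_rows = [[4, 5, 6]]


def build_stitched_dataset():
    """Expose the stitched rows as one dataset (assumes TensorFlow 1.4+)."""
    # Imported lazily so the pure-Python part works without TensorFlow.
    import tensorflow as tf

    return tf.data.Dataset.from_generator(
        lambda: stitched_rows(short_rows, long_rows),
        output_types=tf.int64,
        output_shapes=tf.TensorShape([None]),  # rows differ in length
    )
```

If the pieces already exist as datasets rather than Python iterables, Dataset.concatenate() may be a closer fit than a generator.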
    

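    【Comments】:

    • You are a scholar and a gentleman! Yes, the code above fails with a rather cryptic InvalidArgumentError wrapping a TypeError (`generator` yielded an element of one type where an element of another type was expected), which now makes sense. Thanks for fixing it on Windows (the error I posted here was github.com/tensorflow/tensorflow/issues/13101).
    • However, as I commented here: github.com/tensorflow/tensorflow/issues/… it would be nice to have a way to stitch datasets together. I suppose allowing generators to return tensors would be a step in that direction? My use case is a dataset of variable-length tensors that I then want to slice.
    • @Mr_and_Mrs_D: mrry wrote tf.contrib.data ;)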