【Question Title】: im2txt: Load input images from memory (instead of reading from disk)
【Posted】: 2017-04-27 15:33:38
【Question】:

I am interested in modifying the tensorflow implementation of Show and Tell, specifically this v0.12 snapshot, so that it accepts an image as a numpy array instead of reading it from disk.

With the upstream code, loading a filename via

with tf.gfile.GFile(filename, "r") as f:
    image = f.read()

in run_inference.py yields a Python string, which then becomes an ndarray with no shape. However, I have not been able to replicate this.

I tried the following approaches:

Loading the numpy array directly

I wrote this function to load a Pillow image from a filename, convert the image to a numpy array, and feed the result to the beam_search function in run_inference.py:

def load_image(filename):
    from PIL import Image as PILImage
    from keras.preprocessing.image import img_to_array
    arr = img_to_array(PILImage.open(filename))
    return arr
...
captions = generator.beam_search(sess, image)

In this case a size mismatch shows up later, producing the following stack trace:

Traceback (most recent call last):
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 107, in <module>
    tf.app.run()
  File "/home/pmelissi/miniconda2/envs/im2txt/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 97, in main
    captions = generator.beam_search(sess, image)
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/caption_generator.py", line 142, in beam_search
    initial_state = self.model.feed_image(sess, encoded_image)
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_wrapper.py", line 41, in feed_image
    feed_dict={"image_feed:0": encoded_image})
  File "/home/pmelissi/miniconda2/envs/im2txt/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/home/pmelissi/miniconda2/envs/im2txt/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 943, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (960, 640, 3) for Tensor u'image_feed:0', which has shape '()'

Process finished with exit code 1

Can I somehow trick numpy into thinking the array has no shape?
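For what it's worth, the shape `()` the placeholder reports corresponds to a scalar, and numpy does represent a whole byte string as a 0-d array. A minimal sketch of that distinction (plain numpy, no TensorFlow involved; the byte string here is fake, not a real JPEG):

```python
import numpy as np

# Raw file contents are a single byte string.
raw = b"\xff\xd8fake-jpeg-bytes"

# Wrapping a byte string yields a 0-d ("shapeless") array: the whole
# string is one scalar element, not a grid of pixels.
scalar = np.asarray(raw)
print(scalar.shape)  # ()

# Decoded pixels, by contrast, always carry an (H, W, C) shape,
# which is exactly what a shape-() placeholder rejects.
pixels = np.zeros((960, 640, 3), dtype=np.uint8)
print(pixels.shape)  # (960, 640, 3)
```

So no trick is needed: the placeholder wants the encoded JPEG as one scalar string, not the decoded pixel array.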

Converting to a tf.string

Here I used the following function:

def encode_image(filename):
    g2 = tf.Graph()
    from keras.preprocessing.image import img_to_array
    with g2.as_default() as g:
        with g.name_scope("g2") as g2_scope:
            arr = img_to_array(PILImage.open(filename))
            image = tf.image.encode_jpeg(arr)
            return image
...
captions = generator.beam_search(sess, image)

This did not work either:

Traceback (most recent call last):
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 107, in <module>
    tf.app.run()
  File "/home/pmelissi/miniconda2/envs/im2txt/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 97, in main
    captions = generator.beam_search(sess, image)
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/caption_generator.py", line 142, in beam_search
    initial_state = self.model.feed_image(sess, encoded_image)
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_wrapper.py", line 41, in feed_image
    feed_dict={"image_feed:0": encoded_image})
  File "/home/pmelissi/miniconda2/envs/im2txt/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/home/pmelissi/miniconda2/envs/im2txt/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 924, in _run
    raise TypeError('The value of a feed cannot be a tf.Tensor object. '
TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed values include Python scalars, strings, lists, or numpy ndarrays.

The last line of this stack trace seems helpful, but there is no documentation on what kind of structure is expected:

TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed values include Python scalars, strings, lists, or numpy ndarrays.

So, what does a valid input look like? The internals of the preprocessing are not particularly clear to me.
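From reading the upstream script, the value it ends up feeding is just the unmodified file contents. A sketch of producing such a value without tf.gfile (stdlib only; the bytes are fake and written to a temporary file just so the snippet can run anywhere):

```python
import os
import tempfile

# Stand-in for a JPEG on disk; a real one would come from your pipeline.
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image, just bytes"
path = os.path.join(tempfile.mkdtemp(), "fake.jpg")
with open(path, "wb") as f:
    f.write(fake_jpeg)

# Equivalent of tf.gfile.GFile(path, "r").read() in run_inference.py:
# no decoding, no reshaping, just the raw bytes.
with open(path, "rb") as f:
    image = f.read()

# A plain byte string is an "acceptable feed value" in the TypeError's
# sense (a Python string), unlike a tf.Tensor object.
assert image == fake_jpeg
```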

Thanks for your time!

Edit: Attached a gist of the modified inference script for the big picture.

Edit 2: The path to sess.run looks like this:

1: run_inference.py

captions = generator.beam_search(sess, image)

2: caption_generator.py

def beam_search(self, sess, encoded_image):
    initial_state = self.model.feed_image(sess, encoded_image)

3: inference_wrapper.py

def feed_image(self, sess, encoded_image):
    initial_state = sess.run(fetches="lstm/initial_state:0",
                         feed_dict={"image_feed:0": encoded_image})
    return initial_state

Edit 3: I forgot to mention that I am limited to TensorFlow v0.12, so I am using this snapshot of the im2txt repo.

【Comments】:

  • Feeding it as a numpy array is correct; it looks like you have not set up the array size in the model graph properly (or it may have been set up before and your changes skip that step now). How do you set up the graph? What does the code look like when you call sess.run(...)? It seems TensorFlow simply does not know what dimensions to expect.
  • I've uploaded a gist with the updated code. Lines 93-96 are the only thing that changes the behavior. The code works when I keep lines 93 and 94 (the original code) and comment out 95 and 96, but not in any other case. The problem is that normally both np_val.shape and subfeed_t.get_shape() are (). Thanks!

Tags: python arrays numpy tensorflow


【Solution 1】:

The original code:

with tf.gfile.GFile(filename, "r") as f:
    image = f.read()

yields the image as a Python string.

Your code:

def encode_image(filename):
    g2 = tf.Graph()
    from keras.preprocessing.image import img_to_array
    with g2.as_default() as g:
        with g.name_scope("g2") as g2_scope:
            arr = img_to_array(PILImage.open(filename))
            image = tf.image.encode_jpeg(arr)
            return image

returns a tensorflow.python.framework.ops.Tensor of size ().

Presumably the function generator.beam_search(sess, image) expects a Python string, while you are passing it a tensor of size (). I think the quickest way to fix your code is to do

return image.eval()

instead of

return image

However, I still don't understand why you would load a JPEG, turn it into an array, and then re-encode it into a JPEG again.

Edit:

If you really want to turn a numpy array into a Python string of JPEG binary, you can use this:

from PIL import Image
import numpy as np
import StringIO


def encode(npdata):
    img = Image.fromarray(npdata)
    output = StringIO.StringIO()
    img.save(output, "jpeg")
    image = output.getvalue()
    output.close()
    return image

npdata = np.random.randint(0,256, (480,640,3)).astype(np.uint8)
print len(encode(npdata))
with open("/tmp/random.jpg", "wb") as fp:
    fp.write(encode(npdata)) # Just to prove it actually _is_ working

You should be able to pass this straight into the line below:

captions = generator.beam_search(sess, encode(mynumpyarray))
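(A side note, as a sketch rather than tested code: the same trick works with io.BytesIO, which exists on both Python 2 and 3, so it survives a move off Python 2; Pillow's Image.fromarray/save API is unchanged.)

```python
import io

import numpy as np
from PIL import Image


def encode(npdata):
    """Encode an (H, W, 3) uint8 array as JPEG bytes, entirely in memory."""
    buf = io.BytesIO()
    Image.fromarray(npdata).save(buf, "jpeg")
    return buf.getvalue()


npdata = np.random.randint(0, 256, (480, 640, 3)).astype(np.uint8)
jpeg_bytes = encode(npdata)
# Every JPEG stream starts with the SOI marker 0xFF 0xD8.
assert jpeg_bytes[:2] == b"\xff\xd8"
```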

Also, here is proof regarding the original code: https://github.com/tensorflow/models/blob/f653bd2340b15ce2a22669ba136b77b2751e462e/im2txt/im2txt/run_inference.py#L72

import tensorflow as tf
def puregfile(filename):
    with tf.gfile.GFile(filename, "r") as f:
        image = f.read()
    return image
print type(puregfile("/tmp/random.jpg"))

输出 "",一个 python 字符串,not 一个 tf.String。但是,我无法完整测试(或不会),因为我不想下载模型和 mscoco 等。

【Comments】:

  • "However, I still don't understand why you would load a JPEG, turn it into an array, and then re-encode it into a JPEG again." I could have used a random numpy array, but I decided to use the same image as the original code. Merging in the actual web service and data queue, however, would have been overkill. So first I tried loading the image from disk as a numpy array and feeding it to beam_search (which failed with the shape mismatch). Then I tried converting it to a tensor, which also failed, but led to the "helpful line" I mentioned after the second stack trace...