【Question Title】: im2txt: Load input images from memory (instead of reading from disk)
【Posted】: 2017-04-27 15:33:38
【Question】:

I am interested in modifying the tensorflow implementation of Show and Tell, specifically this v0.12 snapshot, so that it accepts an image as a numpy array instead of reading it from disk.

With the upstream code, loading a filename via

with tf.gfile.GFile(filename, "r") as f:
    image = f.read()

in run_inference.py yields a Python string, which then becomes an ndarray with no shape. However, I have not been able to replicate this.

I tried the following approaches:

Loading the numpy array directly

I wrote this function to load a Pillow image from a filename, convert the image to a numpy array, and feed the result to the beam_search function in run_inference.py:

def load_image(filename):
    from PIL import Image as PILImage
    from keras.preprocessing.image import img_to_array
    arr = img_to_array(PILImage.open(filename))
    return arr
...
captions = generator.beam_search(sess, image)

In this case a size mismatch shows up later, producing the following stack trace:

Traceback (most recent call last):
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 107, in <module>
    tf.app.run()
  File "/home/pmelissi/miniconda2/envs/im2txt/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 97, in main
    captions = generator.beam_search(sess, image)
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/caption_generator.py", line 142, in beam_search
    initial_state = self.model.feed_image(sess, encoded_image)
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_wrapper.py", line 41, in feed_image
    feed_dict={"image_feed:0": encoded_image})
  File "/home/pmelissi/miniconda2/envs/im2txt/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/home/pmelissi/miniconda2/envs/im2txt/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 943, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (960, 640, 3) for Tensor u'image_feed:0', which has shape '()'

Process finished with exit code 1

Can I somehow trick numpy into thinking the array has no shape?
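For what it's worth, the shape `()` the placeholder reports corresponds to a scalar, and numpy does represent a whole byte string as a 0-d array. A minimal sketch of that distinction (plain numpy, no TensorFlow involved; the byte string here is fake, not a real JPEG):

```python
import numpy as np

# Raw file contents are a single byte string.
raw = b"\xff\xd8fake-jpeg-bytes"

# Wrapping a byte string yields a 0-d ("shapeless") array: the whole
# string is one scalar element, not a grid of pixels.
scalar = np.asarray(raw)
print(scalar.shape)  # ()

# Decoded pixels, by contrast, always carry an (H, W, C) shape,
# which is exactly what a shape-() placeholder rejects.
pixels = np.zeros((960, 640, 3), dtype=np.uint8)
print(pixels.shape)  # (960, 640, 3)
```

So no trick is needed: the placeholder wants the encoded JPEG as one scalar string, not the decoded pixel array.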

Converting to a tf.string

Here I used the following function:

def encode_image(filename):
    g2 = tf.Graph()
    from keras.preprocessing.image import img_to_array
    with g2.as_default() as g:
        with g.name_scope("g2") as g2_scope:
            arr = img_to_array(PILImage.open(filename))
            image = tf.image.encode_jpeg(arr)
            return image
...
captions = generator.beam_search(sess, image)

This did not work either:

Traceback (most recent call last):
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 107, in <module>
    tf.app.run()
  File "/home/pmelissi/miniconda2/envs/im2txt/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 97, in main
    captions = generator.beam_search(sess, image)
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/caption_generator.py", line 142, in beam_search
    initial_state = self.model.feed_image(sess, encoded_image)
  File "/home/pmelissi/repos/tensorflow-models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_wrapper.py", line 41, in feed_image
    feed_dict={"image_feed:0": encoded_image})
  File "/home/pmelissi/miniconda2/envs/im2txt/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/home/pmelissi/miniconda2/envs/im2txt/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 924, in _run
    raise TypeError('The value of a feed cannot be a tf.Tensor object. '
TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed values include Python scalars, strings, lists, or numpy ndarrays.

The last line of this stack trace seems helpful, but there is no documentation on what kind of structure is expected:

TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed values include Python scalars, strings, lists, or numpy ndarrays.

So, what does a valid input look like? The internals of the preprocessing are not particularly clear to me.
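From reading the upstream script, the value it ends up feeding is just the unmodified file contents. A sketch of producing such a value without tf.gfile (stdlib only; the bytes are fake and written to a temporary file just so the snippet can run anywhere):

```python
import os
import tempfile

# Stand-in for a JPEG on disk; a real one would come from your pipeline.
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image, just bytes"
path = os.path.join(tempfile.mkdtemp(), "fake.jpg")
with open(path, "wb") as f:
    f.write(fake_jpeg)

# Equivalent of tf.gfile.GFile(path, "r").read() in run_inference.py:
# no decoding, no reshaping, just the raw bytes.
with open(path, "rb") as f:
    image = f.read()

# A plain byte string is an "acceptable feed value" in the TypeError's
# sense (a Python string), unlike a tf.Tensor object.
assert image == fake_jpeg
```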

Thanks for your time!

Edit: Attached a gist of the modified inference script for the big picture.

Edit 2: The path to sess.run looks like this:

1: run_inference.py

captions = generator.beam_search(sess, image)

2: caption_generator.py

def beam_search(self, sess, encoded_image):
    initial_state = self.model.feed_image(sess, encoded_image)

3: inference_wrapper.py

def feed_image(self, sess, encoded_image):
    initial_state = sess.run(fetches="lstm/initial_state:0",
                         feed_dict={"image_feed:0": encoded_image})
    return initial_state

Edit 3: I forgot to mention that I am limited to TensorFlow v0.12, so I am using this snapshot of the im2txt repo.

【Comments】:

  • Feeding it as a numpy array is correct; it looks like you have not set up the array size in the model graph properly (or it may have been set up before and your changes skip that step now). How do you set up the graph? What does the code look like when you call sess.run(...)? It seems TensorFlow simply does not know what dimensions to expect.
  • I've uploaded a gist with the updated code. Lines 93-96 are the only thing that changes the behavior. The code works when I keep lines 93 and 94 (the original code) and comment out 95 and 96, but not in any other case. The problem is that normally both np_val.shape and subfeed_t.get_shape() are (). Thanks!

Tags: python arrays numpy tensorflow


【Solution 1】:

The original code:

with tf.gfile.GFile(filename, "r") as f:
    image = f.read()

yields the image as a Python string.

Your code:

def encode_image(filename):
    g2 = tf.Graph()
    from keras.preprocessing.image import img_to_array
    with g2.as_default() as g:
        with g.name_scope("g2") as g2_scope:
            arr = img_to_array(PILImage.open(filename))
            image = tf.image.encode_jpeg(arr)
            return image

returns a tensorflow.python.framework.ops.Tensor of size ().

Presumably the function generator.beam_search(sess, image) expects a Python string, while you are passing it a tensor of size (). I think the quickest way to fix your code is to do

return image.eval()

instead of

return image

However, I still don't understand why you would load a JPEG, turn it into an array, and then re-encode it into a JPEG again.

Edit:

If you really want to turn a numpy array into a Python string of JPEG binary, you can use this:

from PIL import Image
import numpy as np
import StringIO


def encode(npdata):
    img = Image.fromarray(npdata)
    output = StringIO.StringIO()
    img.save(output, "jpeg")
    image = output.getvalue()
    output.close()
    return image

npdata = np.random.randint(0,256, (480,640,3)).astype(np.uint8)
print len(encode(npdata))
with open("/tmp/random.jpg", "wb") as fp:
    fp.write(encode(npdata)) # Just to prove it actually _is_ working

You should be able to pass this straight into the line below:

captions = generator.beam_search(sess, encode(mynumpyarray))
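(A side note, as a sketch rather than tested code: the same trick works with io.BytesIO, which exists on both Python 2 and 3, so it survives a move off Python 2; Pillow's Image.fromarray/save API is unchanged.)

```python
import io

import numpy as np
from PIL import Image


def encode(npdata):
    """Encode an (H, W, 3) uint8 array as JPEG bytes, entirely in memory."""
    buf = io.BytesIO()
    Image.fromarray(npdata).save(buf, "jpeg")
    return buf.getvalue()


npdata = np.random.randint(0, 256, (480, 640, 3)).astype(np.uint8)
jpeg_bytes = encode(npdata)
# Every JPEG stream starts with the SOI marker 0xFF 0xD8.
assert jpeg_bytes[:2] == b"\xff\xd8"
```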

Also, here is proof regarding the original code: https://github.com/tensorflow/models/blob/f653bd2340b15ce2a22669ba136b77b2751e462e/im2txt/im2txt/run_inference.py#L72

import tensorflow as tf
def puregfile(filename):
    with tf.gfile.GFile(filename, "r") as f:
        image = f.read()
    return image
print type(puregfile("/tmp/random.jpg"))

输出 "",一个 python 字符串,not 一个 tf.String。但是,我无法完整测试(或不会),因为我不想下载模型和 mscoco 等。

【Comments】:

  • "However, I still don't understand why you would load a JPEG, turn it into an array, and then re-encode it into a JPEG again." I could have used a random numpy array, but I decided to use the same image as the original code. Merging in the actual web service and data queue, however, would have been overkill. So first I tried loading the image from disk as a numpy array and feeding it to beam_search (which failed with the shape mismatch). Then I tried converting it to a tensor, which also failed, but led to the "helpful line" I mentioned after the second stack trace...