SSIM as a custom loss function in an autoencoder (Keras or/and TensorFlow): answers

【Question title】: SSIM as custom loss function in autoencoder (Keras or/and TensorFlow)
【Posted】: 2018-12-12 19:14:17
【Question】:

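I am currently programming an autoencoder for image compression. From a previous post, I now have final confirmation that I cannot use pure Python functions as loss functions in either Keras or TensorFlow. (And I am slowly beginning to understand why ;-)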

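I would like to run some experiments using SSIM as both a loss function and a metric. It now seems I may be in luck: TensorFlow already ships an implementation, see https://www.tensorflow.org/api_docs/python/tf/image/ssim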

tf.image.ssim(img1, img2, max_val)

In addition, bsautermeister provided an implementation here on Stack Overflow: SSIM / MS-SSIM for TensorFlow.

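My question now is: how would I use it as a loss function with the MNIST dataset? The function does not seem to accept a tensor, only two images. Also, will the gradient be computed automatically? As far as I understand, it should be if the function is implemented in TensorFlow or with the Keras backend.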

I would be very grateful for a minimal working example (MWE) showing how to use any of the SSIM implementations mentioned above as a loss function in Keras or TensorFlow.

Maybe we can use the autoencoder MWE I provided with my previous question: keras custom loss pure python (without keras backend)

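If it is not possible to glue my Keras autoencoder together with the SSIM implementations, would it be possible with an autoencoder implemented directly in TensorFlow? I have one of those as well and can provide it.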

I am working with Python 3.5 and Keras (with the TensorFlow backend), and I can use TensorFlow directly if necessary. Currently I am using the MNIST dataset (the one with the digits).

Thanks for any help!

(P.S.: It seems that several people are working on similar things. An answer to this post could also be useful for Keras - MS-SSIM as loss function)

【Question comments】:

Tags: python tensorflow keras autoencoder

【Solution 1】:

I cannot help with Keras, but in plain TensorFlow you simply swap the L2 cost (or whatever cost you use) for one based on the SSIM result, for example:

import tensorflow as tf
import numpy as np


def fake_img_batch(*shape):
    i = np.random.randn(*shape).astype(np.float32)
    i[i < 0] = -i[i < 0]
    return tf.convert_to_tensor(np.clip(i * 255, 0, 255))


fake_img_a = tf.get_variable('a', initializer=fake_img_batch(2, 224, 224, 3))
fake_img_b = tf.get_variable('b', initializer=fake_img_batch(2, 224, 224, 3))

fake_img_a = tf.nn.sigmoid(fake_img_a)
fake_img_b = tf.nn.sigmoid(fake_img_b)

# costs = tf.losses.mean_squared_error(fake_img_a, fake_img_b, reduction=tf.losses.Reduction.MEAN)
costs = tf.image.ssim(fake_img_a, fake_img_b, 1.)  # one SSIM value per image pair, shape [batch]
# SSIM is a similarity (1 means identical), so minimize 1 - mean SSIM rather than the SSIM itself
costs = 1. - tf.reduce_mean(costs)

train = tf.train.AdamOptimizer(0.01).minimize(costs)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(costs))
    for k in range(500):
        _, l = sess.run([train, costs])
        if k % 100 == 0:
            print('1 - mean SSIM', l)

Checking whether an operation has a gradient implementation is straightforward:

import tensorflow as tf
import numpy as np


def fake_img_batch(*shape):
    i = np.random.randn(*shape).astype(np.float32)
    i[i < 0] = -i[i < 0]
    return tf.convert_to_tensor(np.clip(i * 255, 0, 255))

x1 = tf.convert_to_tensor(fake_img_batch(2, 28, 28, 3))
x2 = tf.convert_to_tensor(fake_img_batch(2, 28, 28, 3))


y1 = tf.argmax(x1)  # not differentiable -> no gradients
y2 = tf.image.ssim(x1, x2, 255) # has gradients

with tf.Session() as sess:
    print(tf.gradients(y1, [x1]))  # will print [None] --> no gradient
    print(tf.gradients(y2, [x1, x2]))  # will print [<tf.Tensor 'gradients ...>, ...] --> has gradient
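
The code above stays in plain TensorFlow. As a rough sketch of how the same idea might carry over to Keras, assuming TensorFlow 2.x with `tf.keras` (not the Python 3.5 / TF 1.x setup from the question): the loss below returns `1 - SSIM` for each image pair. The name `ssim_loss`, the tiny dense autoencoder, and the random stand-in data are all hypothetical and chosen only for illustration.

```python
import numpy as np
import tensorflow as tf


def ssim_loss(y_true, y_pred):
    # Per-sample loss: 0 for identical images, larger for dissimilar ones.
    # Assumes pixel values in [0, 1], hence max_val=1.0.
    return 1.0 - tf.image.ssim(y_true, y_pred, max_val=1.0)


# Hypothetical minimal autoencoder for 28x28 grayscale images.
inputs = tf.keras.Input(shape=(28, 28, 1))
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(32, activation="relu")(x)
x = tf.keras.layers.Dense(784, activation="sigmoid")(x)
outputs = tf.keras.layers.Reshape((28, 28, 1))(x)
autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss=ssim_loss)

# Random stand-in data; real MNIST images would be reshaped to (N, 28, 28, 1).
images = np.random.RandomState(0).rand(16, 28, 28, 1).astype(np.float32)
history = autoencoder.fit(images, images, epochs=1, batch_size=8, verbose=0)
print("training loss:", history.history["loss"][0])
```

Because `tf.image.ssim` is built from differentiable TensorFlow ops, Keras can backpropagate through this loss without any extra work.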

【Discussion】:

First of all, thank you very much. This is already very helpful. You cleared up the gradient question for me.
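Unfortunately, there are still a few things I do not quite understand. Using mnist=input_data.read_data_sets("MNIST_data", one_hot=True) and print(mnist.train.num_examples,mnist.test.num_examples,mnist.validation.num_examples) I get (55000,784) (55000,10), whereas for the fake images you are using proper tensors with shapes (2, 224, 224, 3) (2, 224, 224, 3). But isn't tf.image.ssim designed only for images? What am I missing here? I cannot get the code to work with MNIST. Could you clarify how the dimensions fit together?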
You can only apply SSIM to two images, so you have to reshape the 784-element input to [28, 28, 1]. Please do not forget
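The reshape described above could look roughly like this. It is a minimal sketch assuming TensorFlow 2.x with eager execution; the random arrays `flat_a` and `flat_b` are hypothetical stand-ins for flattened MNIST batches such as `mnist.train.images`, scaled to [0, 1].

```python
import numpy as np
import tensorflow as tf

# Stand-ins for two batches of flattened 28x28 images, shape (batch, 784).
rng = np.random.RandomState(0)
flat_a = rng.rand(4, 784).astype(np.float32)
flat_b = rng.rand(4, 784).astype(np.float32)

# (batch, 784) -> (batch, height, width, channels); -1 keeps the batch size.
img_a = tf.reshape(tf.constant(flat_a), [-1, 28, 28, 1])
img_b = tf.reshape(tf.constant(flat_b), [-1, 28, 28, 1])

# max_val is the dynamic range of the pixel values (1.0 for data in [0, 1]).
ssim_ab = tf.image.ssim(img_a, img_b, max_val=1.0)    # one value per image pair
ssim_aa = tf.image.ssim(img_a, img_a, max_val=1.0)    # identical images

print(img_a.shape, ssim_ab.shape)
```

The leading batch dimension is what lets a single call handle the whole batch: each image in `img_a` is compared with the image at the same index in `img_b`.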
Thanks! I will try again. I will post the MWE as soon as I get it working.
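What does the tf.reduce_mean() line do here? Does the original costs hold a tensor of SSIM values, one per image in the batch, which the next line then averages into a single mean SSIM value for the batch?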
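To illustrate the shapes involved (a small sketch assuming TensorFlow 2.x with eager execution; the random batches are made-up stand-ins): `tf.image.ssim` returns one value per image pair in the batch, and `tf.reduce_mean` collapses those values into a single scalar that an optimizer can minimize.

```python
import numpy as np
import tensorflow as tf

# Two made-up batches of 8 grayscale images with values in [0, 1].
rng = np.random.RandomState(1)
batch_a = tf.constant(rng.rand(8, 28, 28, 1).astype(np.float32))
batch_b = tf.constant(rng.rand(8, 28, 28, 1).astype(np.float32))

per_image = tf.image.ssim(batch_a, batch_b, max_val=1.0)  # shape (8,)
mean_ssim = tf.reduce_mean(per_image)                      # shape (), a scalar

print("per-image SSIM shape:", per_image.shape)
print("mean SSIM shape:", mean_ssim.shape)
```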