Using a Normal Distribution to sample images

【Question title】: Using a Normal Distribution to sample images
【Posted】: 2020-12-13 18:15:10
【Question】:

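I am currently working on a VAE using keras and tensorflow/tensorflow-probability, with mnist as the training set. My problem is sampling the input from p(x|z). I am using a Normal distribution instead of a Bernoulli distribution because I want to train the model on celeb_a later.

The code I am using is basically the code from this example, except that I replaced the Bernoulli distribution with a Normal distribution and changed a few minor things.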

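My current model looks like this:

import tensorflow as tf
import tensorflow_probability as tfp

# Aliases used throughout the code below.
tfk = tf.keras
tfkl = tf.keras.layers
tfd = tfp.distributions
tfpl = tfp.layers

# input_shape, encoded_size and base_depth are assumed to be defined as in the referenced example.
# Standard normal prior over the latent code.
prior = tfd.Independent(tfd.Normal(loc=tf.zeros(encoded_size), scale=1), reinterpreted_batch_ndims=1)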

inputs = tfk.Input(shape=input_shape)
x = tfkl.Lambda(lambda x: tf.cast(x, tf.float32) - 0.5)(inputs)
x = tfkl.Conv2D(base_depth, 5, strides=1, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2D(base_depth, 5, strides=2, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2D(2 * base_depth, 5, strides=1, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2D(2 * base_depth, 5, strides=2, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2D(4 * encoded_size, 7, strides=1, padding='valid', activation=tf.nn.leaky_relu)(x)
x = tfkl.Flatten()(x)
x = tfkl.Dense(tfpl.IndependentNormal.params_size(encoded_size))(x)
x = tfpl.IndependentNormal(encoded_size, activity_regularizer=tfpl.KLDivergenceRegularizer(prior))(x)

encoder = tfk.Model(inputs, x, name='encoder')
encoder.summary()

inputs = tfk.Input(shape=(encoded_size,))
x = tfkl.Reshape([1, 1, encoded_size])(inputs)
x = tfkl.Conv2DTranspose(2 * base_depth, 7, strides=1, padding='valid', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(2 * base_depth, 5, strides=1, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(2 * base_depth, 5, strides=2, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(base_depth, 5, strides=1, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(base_depth, 5, strides=2, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(base_depth, 5, strides=1, padding='same', activation=tf.nn.leaky_relu)(x)
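# Two parallel convolutional heads produce the per-pixel mean and the raw scale.
mu = tfkl.Conv2D(filters=1, kernel_size=5, strides=1, padding='same', activation=None)(x)
mu = tfkl.Flatten()(mu)
sigma = tfkl.Conv2D(filters=1, kernel_size=5, strides=1, padding='same', activation=None)(x)
sigma = tf.exp(sigma)  # make the scale parameters positive
sigma = tfkl.Flatten()(sigma)
# Concatenate into [means, scales], the flat layout that IndependentNormal splits in half.
x = tf.concat((mu, sigma), axis=1)
x = tfkl.LeakyReLU()(x)  # note: applied to both the means and the scales
x = tfpl.IndependentNormal(input_shape)(x)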

decoder = tfk.Model(inputs, x)
decoder.summary()

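# Negative log-likelihood of the data under the decoder's output distribution.
negloglik = lambda x, rv_x: -rv_x.log_prob(x)

# Not shown in the original post; assumed to be built as in the referenced example.
vae = tfk.Model(inputs=encoder.inputs, outputs=decoder(encoder.outputs[0]))

vae.compile(optimizer=tf.optimizers.Adam(learning_rate=1e-4),
            loss=negloglik)

# mnist_digits is scaled to the range [0.0, 1.0]
history = vae.fit(mnist_digits, mnist_digits, epochs=100, batch_size=300)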

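With the Bernoulli distribution everything works fine: the loss decreases steadily, and images sampled from the returned distribution look like those in the tutorial. With the Normal distribution, however, the loss plateaus at around 470, and samples from the returned distribution are nothing but noise. Can someone help me improve the model? Is it simply too weak? If someone knows the solution, could they explain the reasoning behind it and how they analysed the problem?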
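
One thing that may be worth checking, offered as a sketch and not as a confirmed diagnosis: in the decoder above, the scale parameters pass through tf.exp, then LeakyReLU, and then a softplus inside the distribution layer. The last step is an assumption: it holds if tfpl.IndependentNormal applies softplus to its scale parameters, which should be verified against the installed TFP version. Under that assumption the standard deviation can never drop below softplus(0) = ln 2 ≈ 0.69, which puts a floor under the negative log-likelihood of a 28x28 image. The helper names below are illustrative and not part of the original code, and the LeakyReLU slope of 0.3 is assumed to match the Keras default.

```python
import math


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def leaky_relu(x, alpha=0.3):
    """LeakyReLU; alpha=0.3 is assumed to match the Keras layer default."""
    return x if x >= 0 else alpha * x


def effective_scale(raw):
    """Scale reaching the Normal: exp -> LeakyReLU -> softplus (softplus is an assumption)."""
    return softplus(leaky_relu(math.exp(raw)))


def gaussian_nll(x, mu, sigma):
    """Negative log-density of a univariate Normal(mu, sigma) at x."""
    return 0.5 * math.log(2 * math.pi) + math.log(sigma) + 0.5 * ((x - mu) / sigma) ** 2


# However negative the raw conv output gets, the effective scale stays above ln 2.
raw_values = (-20.0, -5.0, 0.0, 2.0)
for raw in raw_values:
    print(f"raw={raw:6.1f} -> scale={effective_scale(raw):.4f}")

# Best case: every pixel reconstructed exactly, with the smallest reachable scale.
min_scale = math.log(2.0)
num_pixels = 28 * 28
nll_floor = num_pixels * gaussian_nll(0.0, 0.0, min_scale)
print(f"NLL floor per image: {nll_floor:.1f}")
```

If the assumption holds, the reconstruction term alone cannot fall below roughly 433 per image, which would at least be consistent with a total loss that stalls near 470 once the KL term is added. A standard deviation of about 0.69 on pixel values in [0, 1] would also make samples look like noise. In that case, passing the raw convolution output to the distribution layer without the extra tf.exp and LeakyReLU would be one thing to try.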

【Discussion】:

Tags: tensorflow machine-learning keras neural-network tensorflow-probability

【Solution 1】:

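I am also looking into how to make this code work with a Normal distribution. One apparent flaw in your code is that the negloglik loss for a Normal distribution would be an MSE loss rather than -rv_x.log_prob(x). But even with an MSE loss the results are not good: the reconstructed image is the same for any input, a blend of all the digits (0-9). (Image: Inputs and their reconstructions)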
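
To make the relationship between the two losses concrete, here is a small sketch using only the Python standard library. The function names and the toy pixel values are illustrative and do not come from the original code. For a Normal with a fixed scale sigma, -log_prob(x) summed over pixels equals the sum of squared errors divided by 2*sigma^2 plus a constant, so with sigma fixed it ranks reconstructions the same way MSE does. The two losses only differ once sigma is learned, as in the question's decoder.

```python
import math


def gaussian_nll(xs, mus, sigma):
    """Sum over pixels of -log N(x | mu, sigma) with one shared, fixed sigma."""
    n = len(xs)
    squared_error = sum((x - m) ** 2 for x, m in zip(xs, mus))
    return n * (0.5 * math.log(2 * math.pi) + math.log(sigma)) + squared_error / (2 * sigma ** 2)


def mse(xs, mus):
    """Mean squared error over pixels."""
    return sum((x - m) ** 2 for x, m in zip(xs, mus)) / len(xs)


# Toy "image" and two candidate reconstructions (made-up values).
target = [0.0, 0.5, 1.0, 0.25]
recon_a = [0.1, 0.4, 0.9, 0.25]
recon_b = [0.3, 0.1, 0.6, 0.5]

sigma = 1.0
# Part of the NLL that does not depend on the reconstruction.
const = len(target) * (0.5 * math.log(2 * math.pi) + math.log(sigma))

for name, recon in (("a", recon_a), ("b", recon_b)):
    nll = gaussian_nll(target, recon, sigma)
    rescaled_mse = len(target) * mse(target, recon) / (2 * sigma ** 2) + const
    print(f"recon_{name}: nll={nll:.6f}  rescaled mse + const={rescaled_mse:.6f}")
```

With sigma fixed, the two printed columns agree for each reconstruction, so swapping -rv_x.log_prob(x) for MSE mainly changes the weighting between the reconstruction term and the KL regulariser, not which reconstruction is preferred.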

【Comments】:

Actually, it works for me. If you want to take a look: repo