Tensorflow JS GAN model never learns

【Question title】: Tensorflow JS GAN model never learns
【Posted】: 2020-03-19 18:03:52
【Question description】:

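I'm trying to port the Python DCGAN MNIST codelab example (https://www.tensorflow.org/tutorials/generative/dcgan) to Tensorflow.js. The generator model should be able to create images of handwritten digits that resemble the MNIST sample data.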

My code runs without errors, but I'm facing two main problems.

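The training process is much slower than the Python example, comparing JS in the browser against the Python example running in the Google codelab.
My generator model never really manages to generate handwritten digits.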

It learns to the point of generating grid-like images, but never seems to learn much beyond that.

I believe the models are 1:1 ports. Here are my models.

// discriminator model
let dModel = tf.sequential();
const IMAGE_WIDTH = 28;
const IMAGE_HEIGHT = 28;
const IMAGE_CHANNELS = 1;

dModel.add(
tf.layers.conv2d({inputShape: [IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS], kernelSize: [5,5], filters: 64, strides: [2,2], activation: "relu",
    kernelInitializer: "varianceScaling"
  })
);

dModel.add(tf.layers.leakyReLU())
dModel.add(tf.layers.dropout(0.3))

dModel.add(
tf.layers.conv2d({kernelSize: [5,5], filters: 128, strides: [2,2],
    activation: "relu", kernelInitializer: "varianceScaling"
  })
);

dModel.add(tf.layers.leakyReLU())
dModel.add(tf.layers.dropout(0.3))
dModel.add(tf.layers.flatten());

const NUM_OUTPUT_CLASSES = 1;
dModel.add(tf.layers.dense({units: NUM_OUTPUT_CLASSES}))

// generator model
let gModel = tf.sequential();
gModel.add(tf.layers.dense({units: 7 * 7 * 256,inputShape: [100], useBias: false}));
gModel.add(tf.layers.batchNormalization());
gModel.add(tf.layers.leakyReLU());

gModel.add(tf.layers.reshape({ targetShape: [7, 7, 256] }));

gModel.add(tf.layers.conv2dTranspose({filters: 128, kernelSize: [5, 5], strides: [1, 1], useBias: false, padding: "same"}));
gModel.add(tf.layers.batchNormalization());
gModel.add(tf.layers.leakyReLU());

gModel.add(tf.layers.conv2dTranspose({filters: 64, kernelSize: [5, 5], strides: [2, 2], useBias: false,padding: "same" }));
gModel.add(tf.layers.batchNormalization());
gModel.add(tf.layers.leakyReLU());

gModel.add(tf.layers.conv2dTranspose({filters: 1,kernelSize: [5, 5], strides: [2, 2], useBias: false,padding: "same", activation: "tanh" }));

The loss functions are where I couldn't find a JS equivalent of Gradient Tape, so I designed them a bit differently.

The Python example uses:

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss

def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)

def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
      generated_images = generator(noise, training=True)

      real_output = discriminator(images, training=True)
      fake_output = discriminator(generated_images, training=True)

      gen_loss = generator_loss(fake_output)
      disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))

Whereas I have used optimizer.minimize. I'm not sure whether this might be over-training the discriminator and causing the problem.

I did it this way, even though it means calling model.predict repeatedly inside the loss functions, because otherwise I got the error: Please make sure the operations that use variables are inside the function f passed to minimize()

function trainStep() {
  const noise = tf.randomNormal([BATCH_SIZE, 100])
  const fakeLabels = tf.ones([BATCH_SIZE], 'int32')
  const realLabels = tf.zeros([BATCH_SIZE], 'int32')

  const dLossCalc = () => {
    const fakeImages = gModel.predict(noise).add(1).div(2)
    let realImages = data.nextTrainBatch(BATCH_SIZE).xs
    realImages = realImages.reshape([BATCH_SIZE, IMAGE_WIDTH, IMAGE_HEIGHT, 1])
    realImages = realImages.sub(127.5).div(127.5)     //normalize to 1, -1

    const fakeLogits = dModel.predict(fakeImages).reshape([BATCH_SIZE])
    const realLogits = dModel.predict(realImages).reshape([BATCH_SIZE])

    const fakeLoss = tf.losses.sigmoidCrossEntropy(fakeLabels.mul(0.98), fakeLogits)
    const realLoss = tf.losses.sigmoidCrossEntropy(realLabels, realLogits)
    const totalLoss = fakeLoss.add(realLoss)
    console.log('Disc Loss ' + totalLoss.dataSync())
    return totalLoss
  }

  const gLossCalc = () => {
    const fakeImages = gModel.predict(noise).add(1).div(2)
    const logits = dModel.predict(fakeImages).reshape([BATCH_SIZE])
    const loss = tf.losses.sigmoidCrossEntropy(fakeLabels, logits) 
    console.log('Gen Loss ' + loss.dataSync())
    return loss
  }

  dOptimizer.minimize(dLossCalc)
  gOptimizer.minimize(gLossCalc)
}

At this point I've spent hours on this and would appreciate any help.

The two main things I couldn't find equivalents for are Gradient Tape / Apply Gradients and the tf.keras.losses.BinaryCrossentropy loss function. I'm using sigmoidCrossEntropy.

If anyone is willing to take a look, here is a complete codepen example: https://codepen.io/freeman-g/pen/KKpRyyX?editors=0010

As a side note, I noticed that applyGradients is not documented in the Tensorflow.js API docs and opened a related GitHub issue: https://github.com/tensorflow/tfjs/issues/2897

【Question discussion】:

Tags: tensorflow.js

【Solution 1】:

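The execution speed probably comes down to making sure your browser can access the GPU properly; you may need to install a bunch of nVidia CUDA drivers. I'm on Windows, and the browser/driver setup was fairly complicated. It's worth it: even my cheap GT 1030 GPU gives a 5-6x speedup over using my CPU.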

Binary cross-entropy is available in tf.metrics.binaryCrossentropy rather than in tf.keras.losses.BinaryCrossentropy.
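One detail worth checking when picking the JS loss: the Python example builds its loss with from_logits=True, so it is applied to the raw discriminator outputs. The plain-JS sketch below (no tfjs involved; all function names are hypothetical and exist only for illustration) shows that binary cross-entropy on sigmoid(z) and the usual numerically stable logits form give the same number. On that basis, a logits-based loss such as the tf.losses.sigmoidCrossEntropy already used in the question looks like the closer match, while a probability-based cross-entropy would presumably need a sigmoid applied first.

```javascript
// Logistic sigmoid: maps a raw logit to a probability in (0, 1).
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Binary cross-entropy computed from a probability p.
function bceFromProbability(label, p) {
  return -(label * Math.log(p) + (1 - label) * Math.log(1 - p));
}

// The same loss computed directly from the logit z, using the stable form
// max(z, 0) - z * label + log(1 + exp(-|z|)).
function bceFromLogit(label, z) {
  return Math.max(z, 0) - z * label + Math.log1p(Math.exp(-Math.abs(z)));
}

// Compare both forms on a few made-up logits and both label values.
const sampleLogits = [-2.0, 0.0, 3.5];
const maxDifference = Math.max(
  ...sampleLogits.flatMap((z) =>
    [0, 1].map((label) =>
      Math.abs(bceFromProbability(label, sigmoid(z)) - bceFromLogit(label, z))
    )
  )
);
console.log('largest difference between the two forms:', maxDifference);
```

The logits form avoids taking the log of a probability that has rounded to 0 or 1, which is the usual reason for keeping the sigmoid out of the model's last layer.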

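Because of the Python syntax that with tf.GradientTape() relies on, it's unlikely that there is an identical syntax in JS. From what I've read, with tf.GradientTape() basically creates an anonymous function that the GradientTape object calls many times in order to actually determine the gradients.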

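The tfjs equivalent of GradientTape appears to be tf.grads (useful: tfjs-core/dist/ops/conv2d_transpose_test.js has a bunch of tests, including Python and JS versions of the same task). You pass in an anonymous function, and it gets called when you want to determine the gradients.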

Here is code that ports the Python tf.GradientTape() documentation example to JS; it gets the same result. Note: I'm not a Python developer, so I may have misunderstood some of the Python.

  // python example from https://www.tensorflow.org/api_docs/python/tf/GradientTape:
  //     x = tf.constant(3.0)
  //     with tf.GradientTape() as g:
  //         g.watch(x)
  //         y = x * x
  //     dy_dx = g.gradient(y, x)   <-- gets slope of y per x
  //     print(dy_dx)               <-- prints "tf.Tensor(6.0, shape=(), dtype=float32)"


  // tfjs version:
  import * as tf from '@tensorflow/tfjs';

  // myGradFunction is the equivalent of the inside of the `with` section - it squares the incoming Tensor
  // args is the array of Tensors I pass in when I call getGrad later.
  const myGradFunction = (...args: tf.Tensor[]): tf.Tensor => {
    const x = args[0];
    return x.mul(x);
  };

  // this is basically the same as "with tf.GradientTape()" - it builds a manager and
  // feeds it the anonymous function it will eventually use to determine gradient.
  // the "as any" is because tfjs's typescript doesn't appear to be perfect or I'm dumb.
  const getGrad = tf.grads(myGradFunction as any);

  const xs = [
    tf.tensor(3, [1]), // tf.constant(3.0) more or less
  ];
  // this runs myGradFunction and returns one gradient Tensor per input in xs
  const gradients = getGrad(xs);

  // spit out the result
  gradients.forEach((grad) => grad.print());

I'm working on a GAN example in JS myself; if I manage it, I'll update this post with a GitHub link.

【Discussion】: