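How to compute the gradient of output wrt input in Tensorflow 2.0: answers

【Question title】: How to compute gradient of output wrt input in Tensorflow 2.0
【Posted】: 2020-03-27 10:30:35
【Question description】:

I have a trained Tensorflow 2.0 model (from tf.keras.Sequential()) that takes an input layer with 26 columns (X) and produces an output layer with 1 column (Y).

In TF 1.x I was able to calculate the gradient of the output with respect to the input with the following: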

model = load_model('mymodel.h5')
sess = K.get_session()
grad_func = tf.gradients(model.output, model.input)
gradients = sess.run(grad_func, feed_dict={model.input: X})[0]

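In TF2, when I try to run tf.gradients(), I get the error:

RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.

In the question In TensorFlow 2.0 with eager-execution, how to compute the gradients of a network output wrt a specific layer? we see an answer on how to calculate gradients with respect to an intermediate layer, but I don't see how to apply this to gradients with respect to the inputs. The Tensorflow help for tf.GradientTape has examples of calculating gradients for simple functions, but not for neural networks.

How can tf.GradientTape be used to calculate the gradient of the output with respect to the inputs?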

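【Question discussion】:

Does this answer your question? In TensorFlow 2.0 with eager-execution, how to compute the gradients of a network output wrt a specific layer?
@LongNguyen - No, it does not. I already linked to that answer in my question and explained why it does not answer it.
This should answer your question, although it uses a very simple function. The comments should guide you on how to adapt it to a neural network. If you still have trouble, let me know so that I can edit the answer to fit your question. stackoverflow.com/questions/35226428/…
@thushv89 - Thanks for the link. I have gone through that example but have not managed to adapt it to my problem. I tried: with tf.GradientTape() as tape: preds = model(model.input) dy_dx = tape.gradient(preds, tf.convert_to_tensor(X)) but this gives the error: "tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 2 and 0 for 'Equal_1' (op: 'Equal') with input shapes: [2], [0]."
I will take a look and post an answer here.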

Tags: python-3.x tensorflow neural-network tensorflow2.0

【Solution 1】:

I hope this is what you are looking for. This gives the gradients of the output w.r.t. the input.

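import numpy as np
import tensorflow as tf

# Whatever input you like goes in as the initial_value
x = tf.Variable(np.random.normal(size=(25, 120)), dtype=tf.float32)
y_true = np.random.choice([0, 1], size=(25, 10))  # labels; not needed for the gradient itself

print(model.output)
print(model.predict(x.numpy()))
with tf.GradientTape() as tape:
  # Call the model directly: model.predict() returns a NumPy array,
  # which the tape cannot differentiate through
  pred = model(x)

# The method is tape.gradient (singular)
grads = tape.gradient(pred, x)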

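【Discussion】:

Thanks for your post. Is this for TF2? When I try this (changing x and y_true to match my data), tf.gradients() causes the error "RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead." I suspect we need to use tf.GradientTape rather than tf.gradients().
Edited to use tape.gradient.
Do you know how to compute the grad for each example in an (x,y)-batch without using a for loop? Using a for loop is too slow.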
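@Song, can you give an example? If I am not mistaken, whatever your x is, grads will have the same shape as x (which means there is a gradient for each sample in the batch). How are you using the for loop?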
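@thushv89 Yes, even when x contains many examples, GradientTape returns a single gradient, and sometimes I need the gradient for each example in the mini-batch separately, so that I can clip the gradients per example. GradientTape always seems to return only one grad, so there seems to be no good option other than a for loop. The same goes for pytorch.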
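
For gradients with respect to the input, a loop is usually not needed. The sketch below is only an illustration under assumptions: the two-layer model is a hypothetical stand-in with 120 inputs and 10 outputs, and it has no layers that mix samples across the batch. tape.gradient sums over all outputs, while tape.batch_jacobian keeps one gradient per sample and per output. It does not cover per-example gradients with respect to the weights, which may be what the comment above is asking about.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in model: 120 inputs, 10 outputs.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="tanh"),
    tf.keras.layers.Dense(10),
])

x = tf.constant(np.random.normal(size=(25, 120)).astype(np.float32))

# persistent=True because the tape is queried twice below.
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)  # x is a plain tensor, so it must be watched
    pred = model(x)

# Gradient of the sum over all 10 outputs, one row per sample.
summed = tape.gradient(pred, x)

# One gradient per sample and per output column, without a Python loop.
per_output = tape.batch_jacobian(pred, x)
del tape  # release the resources held by the persistent tape

print(summed.shape)      # (25, 120)
print(per_output.shape)  # (25, 10, 120)
```

Summing per_output over its output axis gives back summed, which is a quick way to check that both calls describe the same derivative.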

【Solution 2】:

This should work in TF2:

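import numpy as np
import tensorflow as tf

# model is the trained Keras model; the input is wrapped in a tf.Variable
inp = tf.Variable(np.random.normal(size=(25, 120)), dtype=tf.float32)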

with tf.GradientTape() as tape:
    preds = model(inp)

grads = tape.gradient(preds, inp)

Basically you do it the same way as in TF1, but using GradientTape.
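
One way to convince yourself that this returns the right thing is to run it on a model whose gradient is known in advance. The following is a minimal sketch, not the asker's model: the single linear layer and its fixed weights w are assumptions chosen so that the gradient of the output with respect to each input column is simply the matching weight.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the trained model: 26 inputs, 1 linear output.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, use_bias=False)])
model.build(input_shape=(None, 26))

# Fix the weights so the expected gradient is known: y = x @ w.
w = np.arange(26, dtype=np.float32).reshape(26, 1)
model.set_weights([w])

X = np.random.normal(size=(120, 26)).astype(np.float32)
inp = tf.Variable(X)

with tf.GradientTape() as tape:
    preds = model(inp)

# Same shape as inp; row i holds d(preds[i]) / d(inp[i, :]).
grads = tape.gradient(preds, inp)

print(grads.shape)  # (120, 26)
print(np.allclose(grads.numpy(), np.tile(w.T, (120, 1))))
```

For this linear model every row of grads equals the weight vector, independent of X. For a real network the rows differ from sample to sample.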

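【Discussion】:

Thanks! This works (I only had to change the input shape to "size=(120,26)", since I have 26 input columns). The key was to use tf.Variable() to convert the data (X) from numpy into a tf variable (inp). I had tried tf.convert_to_tensor(), but that did not work.
Do you know 1) why tf.Variable() is needed rather than feeding in a numpy array directly? 2) why we call model(inp) rather than model.predict(inp)? (What is the difference between model(X) and model.predict(X)?)
I don't know about the variable part, but model(x) and model.predict(x) are not the same: predict works with numpy arrays, while model(x) performs TensorFlow's symbolic computation, which can be differentiated.
Do you know how to compute the grad for each example in an (x,y)-batch without using a for loop? Using a for loop is too slow.
@Song Please ask your own question and give the details.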
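
On the first question in the comments above: GradientTape automatically tracks trainable tf.Variable objects, whereas a plain tensor is only tracked if tape.watch() is called on it. The sketch below illustrates the difference. The one-layer model and the random X are hypothetical placeholders, not the asker's data.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in model and data: 26 input columns, 1 output.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
X = np.random.normal(size=(120, 26)).astype(np.float32)
x_const = tf.convert_to_tensor(X)

# 1) Plain tensor that is not watched: the tape has no record of it.
with tf.GradientTape() as tape:
    preds = model(x_const)
grads_unwatched = tape.gradient(preds, x_const)

# 2) Same tensor, explicitly watched before it is used.
with tf.GradientTape() as tape:
    tape.watch(x_const)
    preds = model(x_const)
grads_watched = tape.gradient(preds, x_const)

print(grads_unwatched)      # None
print(grads_watched.shape)  # (120, 26)
```

So tf.convert_to_tensor() can be used as well, as long as the tensor is watched and the same tensor object is the one passed to the model inside the tape.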

【Solution 3】:

In the case above, we should use tape.watch()

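for (x, y) in test_dataset:
    with tf.GradientTape() as tape:
        # x is a plain tensor, so the tape has to be told to track it
        tape.watch(x)
        pred = model(x)

    # Compute the gradient inside the loop so that every batch is handled
    grads = tape.gradient(pred, x)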

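But grads will then only contain the gradients with respect to the input.

The approach below is better: use the model to make the prediction, compute the loss from it, and then use the loss to compute the grads of all trainable variables.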

with tf.GradientTape() as tape:
    predictions = model(x, training=True)
    loss = loss_function(y, predictions)
grads = tape.gradient(loss, model.trainable_variables)
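
For completeness, here is a hedged, self-contained version of that snippet. The model, the MeanSquaredError loss, the SGD optimizer and the random data are all assumptions added for illustration. The last line shows the usual next step of applying the gradients to the weights.

```python
import numpy as np
import tensorflow as tf

# Hypothetical model, loss and optimizer; swap in your own.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
loss_function = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

# Random placeholder batch: 32 samples, 26 input columns, 1 target column.
x = tf.constant(np.random.normal(size=(32, 26)).astype(np.float32))
y = tf.constant(np.random.normal(size=(32, 1)).astype(np.float32))

with tf.GradientTape() as tape:
    predictions = model(x, training=True)
    loss = loss_function(y, predictions)

# One gradient tensor per trainable variable (kernels and biases).
grads = tape.gradient(loss, model.trainable_variables)

# Apply one optimization step using those gradients.
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

Here the gradients are taken with respect to the weights, not the input, so this addresses training rather than the input sensitivity asked about in the question.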

【Discussion】: