Title: Calculate gradient of validation error w.r.t. inputs using Keras/TensorFlow or autograd
Posted: 2020-10-06 16:27:35
Question:

I need to compute the gradient of the validation error w.r.t. an input x. I am trying to see how much the validation error changes when I perturb a single training sample.

  • The validation error (E) depends explicitly on the model weights (W).
  • The model weights depend explicitly on the inputs (x and y).
  • Therefore, the validation error depends implicitly on the inputs.

I am trying to compute the gradient of E w.r.t. x directly. An alternative would be to compute the gradient of E w.r.t. W (easy to compute) and the gradient of W w.r.t. x (which I currently cannot compute); together these would give the gradient of E w.r.t. x.
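Written out, the chain rule being described here (with W(x, y) denoting the weights produced by training) is:

```latex
\frac{\partial E}{\partial x}
  \;=\;
\frac{\partial E}{\partial W}\,\frac{\partial W}{\partial x}
```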

I have attached a toy example below. Thanks in advance!

import numpy as np
import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
import tensorflow as tf
from autograd import grad

train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()

# Normalize the images.
train_images = (train_images / 255) - 0.5
test_images = (test_images / 255) - 0.5

# Flatten the images.
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))

# Build the model.
model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])

# Compile the model.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)

# Train the model.
model.fit(
    train_images,
    to_categorical(train_labels),
    epochs=5,
    batch_size=32,
)
model.save_weights('model.h5')
# Load the model's saved weights.
# model.load_weights('model.h5')

calculate_mse = tf.keras.losses.MeanSquaredError()

test_x = test_images[:5]
test_y = to_categorical(test_labels)[:5]

train_x = train_images[:1]
train_y = to_categorical(train_labels)[:1]

train_y = tf.convert_to_tensor(train_y, np.float32)
train_x = tf.convert_to_tensor(train_x, np.float64)

# approach 1
with tf.GradientTape() as tape:
    tape.watch(train_x)
    model.fit(train_x, train_y, epochs=1, verbose=0)
    valid_y_hat = model(test_x, training=False)
    mse = calculate_mse(test_y, valid_y_hat)
de_dx = tape.gradient(mse, train_x)
print(de_dx)


# approach 2 - does not run
def calculate_validation_mse(x):
    model.fit(x, train_y, epochs=1, verbose=0)
    valid_y_hat = model(test_x, training=False)
    mse = calculate_mse(test_y, valid_y_hat)
    return mse


train_x = train_images[:1]
train_y = to_categorical(train_labels)[:1]

validation_gradient = grad(calculate_validation_mse)
de_dx = validation_gradient(train_x)
print(de_dx)


Comments:

    Tags: python-3.x keras tensorflow2.0 autograd


    Solution 1:

    You can do it as follows; the derivation is given below.

    Notes:

    • I reduced the feature size from 784 to 256 because I was running out of memory in Colab (marked in the code). Some memory profiling would be needed to find out why.
    • Only the gradient w.r.t. the first layer's weights is computed; this extends easily to the other layers.

    Disclaimer: to the best of my knowledge this derivation is correct. Please do some research and verify that this is the case. For larger inputs and layer sizes you will run into memory problems.
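    The code below simulates a single SGD step and then differentiates the validation loss through it. With learning rate η, the derivation it implements is:

```latex
\theta' = \theta - \eta\,\frac{\partial E_{tr}}{\partial \theta},
\qquad
\frac{\partial E_{val}}{\partial x}
  = \frac{\partial E_{val}}{\partial \theta'}\cdot\frac{\partial \theta'}{\partial x}
  = -\eta\;\frac{\partial E_{val}}{\partial \theta'}\cdot
    \frac{\partial}{\partial x}\!\left(\frac{\partial E_{tr}}{\partial \theta}\right)
```

    In the code, `grads_2` is ∂E_val/∂θ, `grads_1` is −η · ∂(∂E_tr/∂θ)/∂x, and the final `matmul` contracts away the shared θ dimension.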

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.utils import to_categorical
    import tensorflow as tf
    
    f = 256
    
    model = Sequential([
        Dense(64, activation='relu', input_shape=(f,)),
        Dense(64, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    
    # Compile the model.
    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy'],
    )
    
    w = model.weights[0]
    
    # Inputs and labels
    x_tr = tf.Variable(np.random.normal(size=(1,f)), shape=(1, f), dtype='float32')
    y_tr = np.random.choice([0,1,2,3,4,5,6,7,8,9], size=(1,1))
    y_tr_onehot = tf.keras.utils.to_categorical(y_tr, num_classes=10).astype('float32')
    x_v = tf.Variable(np.random.normal(size=(1,f)), shape=(1, f), dtype='float32')
    y_v = np.random.choice([0,1,2,3,4,5,6,7,8,9], size=(1,1))
    y_v_onehot = tf.keras.utils.to_categorical(y_v, num_classes=10).astype('float32')
    
    # In the context of GradientTape
    
    with tf.GradientTape() as tape1:
    
      with tf.GradientTape() as tape2:
        y_tr_pred = model(x_tr)   
        tr_loss = tf.keras.losses.MeanSquaredError()(y_tr_onehot, y_tr_pred)
    
      tmp_g = tape2.gradient(tr_loss, w)
      print(tmp_g.shape)
    
    # d(dE_tr/d(theta))/dx
    # Warning this step consumes lot of memory for large layers
    lr = 0.001
    grads_1 = -lr * tape1.jacobian(tmp_g, x_tr)
    
    with tf.GradientTape() as tape3:
      y_v_pred = model(x_v)   
      v_loss = tf.keras.losses.MeanSquaredError()(y_v_onehot, y_v_pred)
    
    # dE_val/d(theta)
    grads_2 = tape3.gradient(v_loss, w)[tf.newaxis, :]
    
    # Just crunching the dimension to get the final desired shape of (1,256)
    grad = tf.matmul(tf.reshape(grads_2,[1, -1]), tf.reshape(tf.transpose(grads_1,[2,1,0,3]),[1, -1, 256]))
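    As the disclaimer above suggests, the derivation is worth checking. A minimal, self-contained sketch of such a check (my own addition, not part of the original answer): a hypothetical linear model p = w·x with squared-error loss, where the analytic dE_val/dx from the chain rule above is compared against a central finite difference of the whole train-then-validate pipeline.

```python
import numpy as np

# Hypothetical toy setup (not the answer's MNIST model): linear model p = w . x,
# squared-error loss, one gradient step of size lr.
rng = np.random.default_rng(0)
d, lr = 4, 0.01
w0 = rng.normal(size=d)
x_tr, y_tr = rng.normal(size=d), 1.0
x_v, y_v = rng.normal(size=d), -1.0

def val_loss_after_step(x):
    """E_val as a function of the training input x (the full pipeline)."""
    r = w0 @ x - y_tr                 # training residual
    w1 = w0 - lr * 2 * r * x          # one SGD step on E_tr = r**2
    return (w1 @ x_v - y_v) ** 2      # validation loss E_val

# Analytic gradient via the chain rule: dE_val/dx = (dw'/dx)^T dE_val/dw'
r = w0 @ x_tr - y_tr
J = 2 * np.outer(x_tr, w0) + 2 * r * np.eye(d)   # J[i, j] = d(dE_tr/dw_i)/dx_j
w1 = w0 - lr * 2 * r * x_tr
dEval_dw1 = 2 * (w1 @ x_v - y_v) * x_v
analytic = -lr * J.T @ dEval_dw1

# Central finite-difference estimate of the same quantity
eps = 1e-6
numeric = np.array([
    (val_loss_after_step(x_tr + eps * e) - val_loss_after_step(x_tr - eps * e)) / (2 * eps)
    for e in np.eye(d)
])
print(np.allclose(analytic, numeric, atol=1e-6))  # True if the derivation holds
```

    The same check could be run against the TensorFlow code above by replacing the closed-form step with the nested-tape computation, at the cost of one finite-difference pass per input dimension.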
    

    Discussion:
