Title: Calculate gradient of validation error w.r.t. inputs using Keras/TensorFlow or autograd
Posted: 2020-10-06 16:27:35
Question:

I need to compute the gradient of the validation error w.r.t. an input x. I am trying to see how much the validation error changes when I perturb a single training sample.

  • The validation error (E) depends explicitly on the model weights (W).
  • The model weights depend explicitly on the inputs (x and y).
  • Therefore, the validation error depends implicitly on the inputs.

I am trying to compute the gradient of E w.r.t. x directly. An alternative would be to compute the gradient of E w.r.t. W (easy to compute) and the gradient of W w.r.t. x (which I currently cannot compute); together these would give the gradient of E w.r.t. x.
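Written out, the chain rule being described here (with W(x, y) denoting the weights produced by training) is:

```latex
\frac{\partial E}{\partial x}
  \;=\;
\frac{\partial E}{\partial W}\,\frac{\partial W}{\partial x}
```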

I have attached a toy example below. Thanks in advance!

import numpy as np
import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
import tensorflow as tf
from autograd import grad

train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()

# Normalize the images.
train_images = (train_images / 255) - 0.5
test_images = (test_images / 255) - 0.5

# Flatten the images.
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))

# Build the model.
model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])

# Compile the model.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)

# Train the model.
model.fit(
    train_images,
    to_categorical(train_labels),
    epochs=5,
    batch_size=32,
)
model.save_weights('model.h5')
# Load the model's saved weights.
# model.load_weights('model.h5')

calculate_mse = tf.keras.losses.MeanSquaredError()

test_x = test_images[:5]
test_y = to_categorical(test_labels)[:5]

train_x = train_images[:1]
train_y = to_categorical(train_labels)[:1]

train_y = tf.convert_to_tensor(train_y, np.float32)
train_x = tf.convert_to_tensor(train_x, np.float64)

# approach 1
with tf.GradientTape() as tape:
    tape.watch(train_x)
    model.fit(train_x, train_y, epochs=1, verbose=0)
    valid_y_hat = model(test_x, training=False)
    mse = calculate_mse(test_y, valid_y_hat)
de_dx = tape.gradient(mse, train_x)
print(de_dx)


# approach 2 - does not run
def calculate_validation_mse(x):
    model.fit(x, train_y, epochs=1, verbose=0)
    valid_y_hat = model(test_x, training=False)
    mse = calculate_mse(test_y, valid_y_hat)
    return mse


train_x = train_images[:1]
train_y = to_categorical(train_labels)[:1]

validation_gradient = grad(calculate_validation_mse)
de_dx = validation_gradient(train_x)
print(de_dx)


Comments:

    Tags: python-3.x keras tensorflow2.0 autograd


    Solution 1:

    You can do it as follows; the derivation is given below.

    Notes:

    • I reduced the feature size from 784 to 256 because I was running out of memory in Colab (marked in the code). Some memory profiling would be needed to find out why.
    • Only the gradient w.r.t. the first layer's weights is computed; this extends easily to the other layers.

    Disclaimer: to the best of my knowledge this derivation is correct. Please do some research and verify that this is the case. For larger inputs and layer sizes you will run into memory problems.
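    The code below simulates a single SGD step and then differentiates the validation loss through it. With learning rate η, the derivation it implements is:

```latex
\theta' = \theta - \eta\,\frac{\partial E_{tr}}{\partial \theta},
\qquad
\frac{\partial E_{val}}{\partial x}
  = \frac{\partial E_{val}}{\partial \theta'}\cdot\frac{\partial \theta'}{\partial x}
  = -\eta\;\frac{\partial E_{val}}{\partial \theta'}\cdot
    \frac{\partial}{\partial x}\!\left(\frac{\partial E_{tr}}{\partial \theta}\right)
```

    In the code, `grads_2` is ∂E_val/∂θ, `grads_1` is −η · ∂(∂E_tr/∂θ)/∂x, and the final `matmul` contracts away the shared θ dimension.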

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.utils import to_categorical
    import tensorflow as tf
    
    f = 256
    
    model = Sequential([
        Dense(64, activation='relu', input_shape=(f,)),
        Dense(64, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    
    # Compile the model.
    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy'],
    )
    
    w = model.weights[0]
    
    # Inputs and labels
    x_tr = tf.Variable(np.random.normal(size=(1,f)), shape=(1, f), dtype='float32')
    y_tr = np.random.choice([0,1,2,3,4,5,6,7,8,9], size=(1,1))
    y_tr_onehot = tf.keras.utils.to_categorical(y_tr, num_classes=10).astype('float32')
    x_v = tf.Variable(np.random.normal(size=(1,f)), shape=(1, f), dtype='float32')
    y_v = np.random.choice([0,1,2,3,4,5,6,7,8,9], size=(1,1))
    y_v_onehot = tf.keras.utils.to_categorical(y_v, num_classes=10).astype('float32')
    
    # In the context of GradientTape
    
    with tf.GradientTape() as tape1:
    
      with tf.GradientTape() as tape2:
        y_tr_pred = model(x_tr)   
        tr_loss = tf.keras.losses.MeanSquaredError()(y_tr_onehot, y_tr_pred)
    
      tmp_g = tape2.gradient(tr_loss, w)
      print(tmp_g.shape)
    
    # d(dE_tr/d(theta))/dx
    # Warning this step consumes lot of memory for large layers
    lr = 0.001
    grads_1 = -lr * tape1.jacobian(tmp_g, x_tr)
    
    with tf.GradientTape() as tape3:
      y_v_pred = model(x_v)   
      v_loss = tf.keras.losses.MeanSquaredError()(y_v_onehot, y_v_pred)
    
    # dE_val/d(theta)
    grads_2 = tape3.gradient(v_loss, w)[tf.newaxis, :]
    
    # Just crunching the dimension to get the final desired shape of (1,256)
    grad = tf.matmul(tf.reshape(grads_2,[1, -1]), tf.reshape(tf.transpose(grads_1,[2,1,0,3]),[1, -1, 256]))
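    As the disclaimer above suggests, the derivation is worth checking. A minimal, self-contained sketch of such a check (my own addition, not part of the original answer): a hypothetical linear model p = w·x with squared-error loss, where the analytic dE_val/dx from the chain rule above is compared against a central finite difference of the whole train-then-validate pipeline.

```python
import numpy as np

# Hypothetical toy setup (not the answer's MNIST model): linear model p = w . x,
# squared-error loss, one gradient step of size lr.
rng = np.random.default_rng(0)
d, lr = 4, 0.01
w0 = rng.normal(size=d)
x_tr, y_tr = rng.normal(size=d), 1.0
x_v, y_v = rng.normal(size=d), -1.0

def val_loss_after_step(x):
    """E_val as a function of the training input x (the full pipeline)."""
    r = w0 @ x - y_tr                 # training residual
    w1 = w0 - lr * 2 * r * x          # one SGD step on E_tr = r**2
    return (w1 @ x_v - y_v) ** 2      # validation loss E_val

# Analytic gradient via the chain rule: dE_val/dx = (dw'/dx)^T dE_val/dw'
r = w0 @ x_tr - y_tr
J = 2 * np.outer(x_tr, w0) + 2 * r * np.eye(d)   # J[i, j] = d(dE_tr/dw_i)/dx_j
w1 = w0 - lr * 2 * r * x_tr
dEval_dw1 = 2 * (w1 @ x_v - y_v) * x_v
analytic = -lr * J.T @ dEval_dw1

# Central finite-difference estimate of the same quantity
eps = 1e-6
numeric = np.array([
    (val_loss_after_step(x_tr + eps * e) - val_loss_after_step(x_tr - eps * e)) / (2 * eps)
    for e in np.eye(d)
])
print(np.allclose(analytic, numeric, atol=1e-6))  # True if the derivation holds
```

    The same check could be run against the TensorFlow code above by replacing the closed-form step with the nested-tape computation, at the cost of one finite-difference pass per input dimension.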
    

    Discussion:
