【Question title】: Extract the value of a Tensorflow/Keras GradientTape gradient of a variable
【Posted】: 2022-01-04 21:00:07
【Question】:

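In short: I have a custom loss layer in Tensorflow/Keras 2+ that implements a loss function involving two variables, which are also optimized during training. It works, as shown below. I want to track the gradients of the loss with respect to these two variables. Judging from the tf.print() output, using GradientTape.gradient() seems to work, but I have no idea how to keep the actual values.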

In detail

Suppose this is my custom loss layer (yes, the loss function is silly; everything is over-simplified for reproducibility):

import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Layer
from tensorflow.keras.callbacks import EarlyStopping, Callback
import tensorflow.keras.backend as K
from tensorflow.keras import Model

class MyLoss(Layer):
    def __init__(self, var1, var2):
        super(MyLoss, self).__init__()
        self.var1 = K.variable(var1) # or tf.Variable(var1) etc.
        self.var2 = K.variable(var2)
    
    def get_vars(self):
        return self.var1, self.var2
    
    def get_gradients(self):
        return self.grads

    def custom_loss(self, y_true, y_pred):
        loss = self.var1 * K.mean(K.square(y_true-y_pred)) + self.var2 ** 2
        return loss

    def compute_gradients(self, y_true, y_pred):
        with tf.GradientTape() as g:
            loss = self.custom_loss(y_true, y_pred)
        return loss, g.gradient(loss, [self.var1, self.var2])
    
    def call(self, y_true, y_pred):
        loss, grads = self.compute_gradients(y_true, y_pred)
        self.grads = grads
        # tf.print(grads)
        self.add_loss(loss)
        return y_pred

Suppose this is my data and Model (yes, y goes into the model as an additional input; this works and is not relevant here):

n_col = 10
n_row = 1000
X = np.random.normal(size=(n_row, n_col))
beta = np.arange(10)
y = X @ beta

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

inputs = Input(shape=(X_train.shape[1],))
y_input = Input(shape=(1,))
hidden1 = Dense(10)(inputs)
output = Dense(1)(hidden1)
my_loss = MyLoss(0.5, 0.5)(y_input, output) # here can also initialize those var1, var2
model = Model(inputs=[inputs, y_input], outputs=my_loss)

model.compile(optimizer= 'adam')

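Now the model and the loss work, as can be seen from the profiles of the variables, e.g. by recording the variables after each epoch (their values also make sense if you check the silly loss):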

var1_list = []
var2_list = []
for i in range(100):
    if i % 10 == 0:
        print('step %d' % i)
    model.fit([X_train, y_train], None,
              batch_size=32, epochs=1, validation_split=0.1, verbose=0)
    var1, var2 = model.layers[-1].get_vars()
    var1_list.append(var1.numpy())
    var2_list.append(var2.numpy())

plt.plot(var1_list, label='var1')
plt.plot(var2_list, 'r', label='var2')
plt.legend()
plt.show()

But when I want to observe/keep the gradients, I get a list of (empty?) tensors:

grads = model.layers[-1].get_gradients()
grads

ListWrapper([<tf.Tensor 'gradient_tape/model/my_loss/mul/Mul:0' shape=() dtype=float32>, <tf.Tensor 'gradient_tape/model/my_loss/pow/mul_1:0' shape=() dtype=float32>])

Of course, calling numpy() on these is futile:

grads[0].numpy()

AttributeError: 'Tensor' object has no attribute 'numpy'

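But clearly there is something there, because when I print the gradients during training with tf.print(grads) (by uncommenting tf.print(grads) in the call() function above), the gradient values are printed and they make sense too: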

[226.651245, 1]
[293.38916, 0.998]
[263.979889, 0.996000171]
[240.448029, 0.994000435]
[337.309021, 0.992001]
[286.644775, 0.990001857]
[194.823975, 0.988003075]
[173.756546, 0.98600477]
[267.330505, 0.984007]
[139.302826, 0.982009768]
[310.315216, 0.980013192]
[263.746216, 0.97801733]
[267.713, 0.976022303]
[291.754578, 0.974028111]
[376.523895, 0.972034812]
[474.974884, 0.970042467]
[375.520294, 0.968051136]
etc. etc.
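
A quick check of why these numbers are plausible, written out for the loss defined in custom_loss() above. Here v_1 and v_2 stand for var1 and var2, and n is the batch size; the symbols are introduced only for this illustration:

```latex
\begin{aligned}
L &= v_1 \cdot \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + v_2^2 \\
\frac{\partial L}{\partial v_1} &= \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
  \quad \text{(the batch mean squared error)} \\
\frac{\partial L}{\partial v_2} &= 2 v_2 = 2 \cdot 0.5 = 1
  \quad \text{(the second entry of the first printed line)}
\end{aligned}
```

The later values of the second entry (0.998, 0.996, ...) would be consistent with var2 shrinking by roughly 0.001 per step, which is what one would expect if the 'adam' optimizer runs with a learning rate of 0.001. That learning rate is an assumption here, not something stated in the post.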

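Note that there is no need to add g.watch([self.var1, self.var2]), though adding it does not change the problem.

How do I track these gradients (e.g. the way I track var1 and var2)? What does tf.print() "see" that I cannot?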
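
As a small aside on why g.watch() is not needed: a GradientTape watches trainable tf.Variable objects automatically, while plain tensors have to be watched explicitly. The sketch below is standalone and does not use the MyLoss layer; the names x_var and x_const are made up for the illustration.

```python
import tensorflow as tf

x_var = tf.Variable(3.0)    # trainable variable: watched automatically
x_const = tf.constant(3.0)  # plain tensor: ignored unless watched explicitly

# Without g.watch(): only the variable gets a gradient.
with tf.GradientTape() as g:
    y = x_var ** 2 + x_const ** 2
grad_var, grad_const = g.gradient(y, [x_var, x_const])

# With g.watch(): the constant is tracked as well.
with tf.GradientTape() as g:
    g.watch(x_const)
    y = x_var ** 2 + x_const ** 2
watched_var, watched_const = g.gradient(y, [x_var, x_const])

print(grad_var.numpy(), grad_const)                # gradient for the constant is None
print(watched_var.numpy(), watched_const.numpy())  # both gradients are available
```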

【Comments】:

    Tags: python tensorflow keras deep-learning gradienttape


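    【Solution 1】:

    Following a related answer, it seems that model.fit() does not run eagerly here: by default Keras compiles the training step into a graph, so the gradients stored in self.grads are symbolic tensors without a .numpy() method. The solution is to add run_eagerly=True to the model.compile() line above: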

    model.compile(optimizer= 'adam', run_eagerly=True)
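
To illustrate the difference between the two modes, here is a minimal standalone sketch. It does not use Keras or the MyLoss layer; it only reuses the silly loss on a made-up error vector, and names such as silly_grads, graph_step and trace_info are hypothetical. Inside a tf.function the code is traced into a graph and is not executed eagerly, which is assumed to be the situation model.fit() is in by default.

```python
import tensorflow as tf

var1 = tf.Variable(0.5)
var2 = tf.Variable(0.5)
trace_info = {}  # filled in while graph_step is being traced


def silly_grads(err):
    """Gradients of var1 * mean(err^2) + var2^2 w.r.t. var1 and var2."""
    with tf.GradientTape() as g:
        loss = var1 * tf.reduce_mean(tf.square(err)) + var2 ** 2
    return g.gradient(loss, [var1, var2])


@tf.function
def graph_step(err):
    grads = silly_grads(err)
    # This Python code runs once, at trace time, on symbolic tensors.
    trace_info["eager_inside"] = tf.executing_eagerly()
    trace_info["type_inside"] = type(grads[0]).__name__
    return grads


err = tf.constant([1.0, 2.0, 3.0])

eager_grads = silly_grads(err)  # eager: concrete values right away
graph_grads = graph_step(err)   # graph: values exist only once the function returns

print(trace_info)
print([g.numpy() for g in eager_grads])
print([g.numpy() for g in graph_grads])
```

The tensors returned from graph_step are concrete again, but a tensor stashed on a Python attribute during tracing (like self.grads in the question) stays symbolic.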
    

    Then I can call .numpy() on my grads tensors without any problem, e.g.:

    grad1_list = []
    grad2_list = []
    for i in range(100):
        if i % 10 == 0:
            print('step %d' % i)
        model.fit([X_train, y_train], None,
                  batch_size=32, epochs=1, validation_split=0.1, verbose=0)
        grad1, grad2 = model.layers[-1].get_gradients()
        grad1_list.append(grad1.numpy())
        grad2_list.append(grad2.numpy())
    
    plt.plot(grad1_list, label='grad1')
    plt.plot(grad2_list, 'r', label='grad2')
    plt.legend()
    plt.show()
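
Since run_eagerly=True generally slows training down, one possible alternative is to copy the gradients into a non-trainable tf.Variable inside the graph and read that variable afterwards. The sketch below shows the idea with a plain tf.function only; it has not been tested inside the MyLoss layer above, and grad_store and step are hypothetical names.

```python
import tensorflow as tf

var1 = tf.Variable(0.5)
var2 = tf.Variable(0.5)
# Non-trainable buffer holding the most recent gradients (hypothetical name).
grad_store = tf.Variable([0.0, 0.0], trainable=False)


@tf.function
def step(err):
    with tf.GradientTape() as g:
        loss = var1 * tf.reduce_mean(tf.square(err)) + var2 ** 2
    grads = g.gradient(loss, [var1, var2])
    # assign() is a graph op, so it runs on every call, not only at trace time.
    grad_store.assign(tf.stack(grads))
    return loss


step(tf.constant([1.0, 2.0, 3.0]))
print(grad_store.numpy())  # concrete values, readable outside the graph
```

The same pattern would, under that assumption, replace self.grads = grads with an assign() to a variable created in __init__, after which get_gradients() could return the variable.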
    

    【Discussion】:
