【Posted at】: 2022-01-04 21:00:07
【Question description】:
In short: I have a custom loss layer in Tensorflow/Keras 2+ that implements a loss function involving two variables, which are also minimized over. It works, as shown below. I would like to track the gradients of the loss with respect to those two variables. Judging by the tf.print() output, using GradientTape.gradient() seems to work, but I can't figure out how to hold on to the actual values.
Details:
Suppose this is my custom loss layer (yes, the loss function is silly; everything is oversimplified for reproducibility):
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Layer
from tensorflow.keras.callbacks import EarlyStopping, Callback
import tensorflow.keras.backend as K
from tensorflow.keras import Model
class MyLoss(Layer):
    def __init__(self, var1, var2):
        super(MyLoss, self).__init__()
        self.var1 = K.variable(var1)  # or tf.Variable(var1) etc.
        self.var2 = K.variable(var2)

    def get_vars(self):
        return self.var1, self.var2

    def get_gradients(self):
        return self.grads

    def custom_loss(self, y_true, y_pred):
        loss = self.var1 * K.mean(K.square(y_true - y_pred)) + self.var2 ** 2
        return loss

    def compute_gradients(self, y_true, y_pred):
        with tf.GradientTape() as g:
            loss = self.custom_loss(y_true, y_pred)
        return loss, g.gradient(loss, [self.var1, self.var2])

    def call(self, y_true, y_pred):
        loss, grads = self.compute_gradients(y_true, y_pred)
        self.grads = grads
        # tf.print(grads)
        self.add_loss(loss)
        return y_pred
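One direction I've been experimenting with (the MyLossTracked name and the grad_var1/grad_var2 variables are my own invention, so treat this as a sketch rather than a known-good fix): copy each gradient into a non-trainable tf.Variable via assign(), since variable assignment works inside a compiled train step, whereas a plain Python attribute only ends up holding the symbolic tensor. I've written it with plain tf ops (tf.Variable, tf.reduce_mean), which should behave the same as K.variable/K.mean:

```python
import tensorflow as tf
from tensorflow.keras.layers import Layer

class MyLossTracked(Layer):
    """Sketch of a MyLoss variant that copies each gradient into a
    non-trainable tf.Variable, so the values can be read with .numpy()
    after training. grad_var1/grad_var2 are my own additions."""

    def __init__(self, var1, var2):
        super().__init__()
        self.var1 = tf.Variable(float(var1))
        self.var2 = tf.Variable(float(var2))
        # Holders for the latest gradient values
        self.grad_var1 = tf.Variable(0.0, trainable=False)
        self.grad_var2 = tf.Variable(0.0, trainable=False)

    def custom_loss(self, y_true, y_pred):
        return self.var1 * tf.reduce_mean(tf.square(y_true - y_pred)) + self.var2 ** 2

    def call(self, y_true, y_pred):
        with tf.GradientTape() as g:
            loss = self.custom_loss(y_true, y_pred)
        g1, g2 = g.gradient(loss, [self.var1, self.var2])
        # assign() executes inside a compiled train step, unlike
        # stashing the symbolic tensors on a Python attribute
        self.grad_var1.assign(g1)
        self.grad_var2.assign(g2)
        self.add_loss(loss)
        return y_pred
```

Calling the layer eagerly with dummy tensors then lets me read layer.grad_var1.numpy() and layer.grad_var2.numpy() directly.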
Suppose this is my data and Model (yes, y enters the model as an additional input; this works and isn't relevant here):
n_col = 10
n_row = 1000
X = np.random.normal(size=(n_row, n_col))
beta = np.arange(10)
y = X @ beta
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
inputs = Input(shape=(X_train.shape[1],))
y_input = Input(shape=(1,))
hidden1 = Dense(10)(inputs)
output = Dense(1)(hidden1)
my_loss = MyLoss(0.5, 0.5)(y_input, output)  # var1 and var2 get their initial values here
model = Model(inputs=[inputs, y_input], outputs=my_loss)
model.compile(optimizer= 'adam')
Now the model and loss work, as can be seen from the variables' trajectories, e.g. by keeping the variables after each epoch (if you inspect the silly loss, their values make sense too):
var1_list = []
var2_list = []
for i in range(100):
    if i % 10 == 0:
        print('step %d' % i)
    model.fit([X_train, y_train], None,
              batch_size=32, epochs=1, validation_split=0.1, verbose=0)
    var1, var2 = model.layers[-1].get_vars()
    var1_list.append(var1.numpy())
    var2_list.append(var2.numpy())

plt.plot(var1_list, label='var1')
plt.plot(var2_list, 'r', label='var2')
plt.legend()
plt.show()
But when I want to inspect/keep the gradients, I get a list of (empty?) tensors:
grads = model.layers[-1].get_gradients()
grads
ListWrapper([<tf.Tensor 'gradient_tape/model/my_loss/mul/Mul:0' shape=() dtype=float32>, <tf.Tensor 'gradient_tape/model/my_loss/pow/mul_1:0' shape=() dtype=float32>])
Naturally, calling numpy() on these is pointless:
grads[0].numpy()
AttributeError: 'Tensor' object has no attribute 'numpy'
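For what it's worth, I can reproduce this effect outside Keras (this standalone sketch is mine, not part of the model above): a gradient captured while tracing a tf.function is a symbolic graph tensor with no numpy() at all, whereas the same computation in eager mode yields an EagerTensor. Keras compiles train_step into a graph by default, which would explain what I'm seeing; presumably compiling with run_eagerly=True would sidestep it, at a performance cost.

```python
import tensorflow as tf

v = tf.Variable(1.0)
captured = []

@tf.function
def traced_step():
    with tf.GradientTape() as g:
        loss = v ** 2
    # This append runs at trace time, so it captures a symbolic graph tensor
    captured.append(g.gradient(loss, v))
    return loss

traced_step()
print(hasattr(captured[0], "numpy"))  # symbolic Tensor: no numpy() available

with tf.GradientTape() as g:
    loss = v ** 2
eager_grad = g.gradient(loss, v)      # EagerTensor
print(eager_grad.numpy())             # → 2.0
```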
But clearly something is there, because when I print the gradients during training with tf.print(grads) (uncommenting the tf.print(grads) line in the call() function above), the gradient values do get printed, and they make sense too:
[226.651245, 1] [293.38916, 0.998] [263.979889, 0.996000171] [240.448029, 0.994000435] [337.309021, 0.992001] [286.644775, 0.990001857] [194.823975, 0.988003075] [173.756546, 0.98600477] [267.330505, 0.984007] [139.302826, 0.982009768] [310.315216, 0.980013192] [263.746216, 0.97801733] [267.713, 0.976022303] [291.754578, 0.974028111] [376.523895, 0.972034812] [474.974884, 0.970042467] [375.520294, 0.968051136] etc. etc.
Note that there is no need to add g.watch([self.var1, self.var2]), though adding it doesn't change the problem.
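(A standalone illustration of why watch() isn't needed here, using only documented tf.GradientTape behavior: trainable variables accessed inside the tape's context are watched automatically; watch() is only required for plain tensors or variables created with trainable=False.)

```python
import tensorflow as tf

v = tf.Variable(3.0)  # trainable by default, so the tape watches it automatically
with tf.GradientTape() as g:
    y = v ** 2
print(g.gradient(y, v).numpy())  # → 6.0

c = tf.constant(3.0)  # a plain tensor is NOT watched unless you ask
with tf.GradientTape() as g:
    g.watch(c)
    y = c ** 2
print(g.gradient(y, c).numpy())  # → 6.0
```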
How do I keep track of these gradients (the way I track var1 and var2)? What does tf.print() "see" that I can't?
【Question discussion】:
Tags: python tensorflow keras deep-learning gradienttape