Calculating gradient norm wrt weights with keras (answers)

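【Question title】: Calculating gradient norm wrt weights with keras
【Posted】: 2017-08-15 13:53:38
【Question】:

I am trying to compute the norm of the gradient with respect to the weights of a neural network using keras (as a diagnostic tool). Eventually I want to create a callback for this, but on the way there I have been working on a function that computes the gradient and returns actual values as a numpy array/scalar (not just a TensorFlow tensor). The code is as follows: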

import numpy as np
import keras.backend as K
from keras.layers import Dense
from keras.models import Sequential


def get_gradient_norm_func(model):
    grads = K.gradients(model.total_loss, model.trainable_weights)
    summed_squares = [K.sum(K.square(g)) for g in grads]
    norm = K.sqrt(sum(summed_squares))
    func = K.function([model.input], [norm])
    return func


def main():
    x = np.random.random((128,)).reshape((-1, 1))
    y = 2 * x
    model = Sequential(layers=[Dense(2, input_shape=(1,)),
                               Dense(1)])
    model.compile(loss='mse', optimizer='RMSprop')
    get_gradient = get_gradient_norm_func(model)
    history = model.fit(x, y, epochs=1)
    print(get_gradient([x]))

if __name__ == '__main__':
    main()

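The code fails when calling get_gradient(). The traceback is long and contains a lot of information about shapes, but very little about what the correct shapes would be. How can I fix this?

Ideally I would like a backend-agnostic solution, but a tensorflow-based solution is also an option.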

2017-08-15 15:39:14.914388: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1,-1] has negative dimensions
2017-08-15 15:39:14.914414: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1,-1] has negative dimensions
         [[Node: dense_2_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
2017-08-15 15:39:14.915026: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1,-1] has negative dimensions
2017-08-15 15:39:14.915038: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1,-1] has negative dimensions
         [[Node: dense_2_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
2017-08-15 15:39:14.915310: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1] has negative dimensions
2017-08-15 15:39:14.915321: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1] has negative dimensions
         [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Traceback (most recent call last):
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape [-1] has negative dimensions
         [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "gradientlog.py", line 45, in <module>
    main()
  File "gradientlog.py", line 42, in main
    print(get_gradient([x]))
  File "/home/josteb/sandbox/keras/keras/backend/tensorflow_backend.py", line 2251, in __call__
    **self.session_kwargs)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape [-1] has negative dimensions
         [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'dense_2_sample_weights', defined at:
  File "gradientlog.py", line 45, in <module>
    main()
  File "gradientlog.py", line 39, in main
    model.compile(loss='mse', optimizer='RMSprop')
  File "/home/josteb/sandbox/keras/keras/models.py", line 783, in compile
    **kwargs)
  File "/home/josteb/sandbox/keras/keras/engine/training.py", line 799, in compile
    name=name + '_sample_weights'))
  File "/home/josteb/sandbox/keras/keras/backend/tensorflow_backend.py", line 435, in placeholder
    x = tf.placeholder(dtype, shape=shape, name=name)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1530, in placeholder
    return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1954, in _placeholder
    name=name)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Shape [-1] has negative dimensions
         [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

【Comments】:

Tags: python tensorflow neural-network keras

【Answer 1】:

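Several placeholders are involved in the gradient computation in Keras:

Input x
Target y
Sample weights: even if you don't provide them in model.fit(), Keras still creates a placeholder for sample weights and feeds np.ones((y.shape[0],), dtype=K.floatx()) into the graph during training.
Learning phase: this placeholder is connected to the gradient tensor only if some layer uses it (e.g. Dropout).

So, in the example you provided, you need to feed x, y and sample_weights into the graph in order to compute the gradients. Your K.function() declares only model.input as an input, which leaves the dense_2_target and dense_2_sample_weights placeholders named in the traceback unfed. That is the root cause of the error.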
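As a rough illustration of the feed described above, the values passed to a gradient function would follow the order inputs, targets, sample weights and, only when a layer such as Dropout needs it, the learning phase. The helper name build_feed_values and its parameters are hypothetical and not part of Keras; the sketch only assembles NumPy arrays, assumes a single-input, single-output model, and assumes float32 as the default float type.

```python
import numpy as np


def build_feed_values(x, y, sample_weight=None, learning_phase=None,
                      dtype=np.float32):
    """Return arrays ordered as inputs, targets, sample weights[, learning phase].

    Hypothetical helper for illustration; it does not call Keras.
    """
    if sample_weight is None:
        # Default described in the answer: a weight of 1.0 for every sample.
        sample_weight = np.ones((y.shape[0],), dtype=dtype)
    values = [x, y, sample_weight]
    if learning_phase is not None:
        # Only needed when the function's input list also contains the
        # learning-phase placeholder (e.g. models with Dropout).
        values.append(learning_phase)
    return values


x = np.random.random((128, 1))
y = 2 * x
values = build_feed_values(x, y)
print(len(values), values[2].shape)
```

A gradient function built with inputs, targets and sample weights as its input list could then be called with build_feed_values(x, y).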

Inside Model._make_train_function() there are the following lines, which show how to construct the inputs that K.function() needs in this case:

inputs = self._feed_inputs + self._feed_targets + self._feed_sample_weights
if self.uses_learning_phase and not isinstance(K.learning_phase(), int):
    inputs += [K.learning_phase()]

with K.name_scope('training'):
    ...
    self.train_function = K.function(inputs,
                                     [self.total_loss] + self.metrics_tensors,
                                     updates=updates,
                                     name='train_function',
                                     **self._function_kwargs)

By mimicking this function, you should be able to get the norm:

def get_gradient_norm_func(model):
    grads = K.gradients(model.total_loss, model.trainable_weights)
    # Global L2 norm: square root of the summed squares over all weight gradients
    summed_squares = [K.sum(K.square(g)) for g in grads]
    norm = K.sqrt(sum(summed_squares))
    # Feed inputs, targets and sample weights, as Model._make_train_function() does
    inputs = model.model._feed_inputs + model.model._feed_targets + model.model._feed_sample_weights
    func = K.function(inputs, [norm])
    return func

def main():
    x = np.random.random((128,)).reshape((-1, 1))
    y = 2 * x
    model = Sequential(layers=[Dense(2, input_shape=(1,)),
                               Dense(1)])
    model.compile(loss='mse', optimizer='rmsprop')
    get_gradient = get_gradient_norm_func(model)
    history = model.fit(x, y, epochs=1)
    print(get_gradient([x, y, np.ones(len(y))]))

Output:

Epoch 1/1
128/128 [==============================] - 0s - loss: 2.0073     
[4.4091368]

Note that because you are using Sequential rather than Model, model.model._feed_* is needed instead of model._feed_*.
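For reference, the quantity that get_gradient_norm_func returns is the L2 norm of all gradients taken together, that is, the square root of the sum of squared entries over every weight tensor. A minimal NumPy sketch of the same arithmetic follows; it uses made-up gradient arrays rather than values from a real model, and global_l2_norm is a hypothetical name.

```python
import numpy as np


def global_l2_norm(grads):
    """L2 norm over a list of gradient arrays, treated as one long vector."""
    summed_squares = [np.sum(np.square(g)) for g in grads]
    return float(np.sqrt(sum(summed_squares)))


# Made-up gradients for a kernel and a bias (not taken from a real model).
grads = [np.array([[3.0, 0.0]]), np.array([4.0])]
norm = global_l2_norm(grads)

# Equivalent formulation: flatten everything and take a single vector norm.
flat = np.concatenate([g.ravel() for g in grads])
assert np.isclose(norm, np.linalg.norm(flat))
print(norm)  # 5.0
```

Summing per-tensor squared sums first avoids having to reshape or concatenate tensors of different shapes, which is presumably why the Keras version is written that way.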

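【Comments】:

Thanks for the very clear answer. With the pointer to _make_train_function I was also able to figure out how to hook an arbitrary keras tensor into keras' metrics system, so that the tensor's value is recorded at every iteration. This can be done after compiling the model by appending the tensor to model.metrics_tensors and its name to model.metrics_names (both are lists).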

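【Answer 2】:

Extending josteinb's comment, I am sharing the version I have used.

It is basically the same as the previous answer, but this version integrates the norm computation into the usual training routine.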

import keras.backend as K
from keras.models import Model

# Get a "l2 norm of gradients" tensor
def get_gradient_norm(model):
    with K.name_scope('gradient_norm'):
        grads = K.gradients(model.total_loss, model.trainable_weights)
        norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
    return norm

# Build a model
model = Model(...)

# Compile the model
model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["categorical_accuracy"],
)

# Append the "l2 norm of gradients" tensor as a metric
model.metrics_names.append("gradient_norm")
model.metrics_tensors.append(get_gradient_norm(model))

# You can compute the norm within the usual training routine
loss, acc, gradient_norm = model.train_on_batch(batch, label)
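Since the appended metric simply extends the list that train_on_batch returns, it may be less error-prone to pair the returned values with model.metrics_names instead of relying on positional unpacking. The sketch below shows that pairing with plain lists standing in for the Keras objects; the numbers are placeholders rather than real training output, the metric names are assumed, and name_outputs is a hypothetical helper.

```python
def name_outputs(metrics_names, outputs):
    """Pair each value returned by train_on_batch with its metric name."""
    if len(metrics_names) != len(outputs):
        raise ValueError("expected one output per metric name")
    return dict(zip(metrics_names, outputs))


# Stand-ins for model.metrics_names and the result of model.train_on_batch(...).
metrics_names = ["loss", "categorical_accuracy", "gradient_norm"]
outputs = [0.5, 0.75, 2.0]

logs = name_outputs(metrics_names, outputs)
print(logs["gradient_norm"])  # 2.0
```

With a real model the call would look like name_outputs(model.metrics_names, model.train_on_batch(batch, label)).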

【Comments】: