【Question title】: Calculating gradient norm wrt weights with keras
【Posted】: 2017-08-15 13:53:38
【Question】:

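I am trying to compute the norm of the gradient with respect to the weights of a neural network using Keras (as a diagnostic tool). Eventually I want to create a callback for this, but on the way there I have been working on a function that computes the gradient and returns actual values as a numpy array or scalar (not just a TensorFlow tensor). The code is as follows: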

import numpy as np
import keras.backend as K
from keras.layers import Dense
from keras.models import Sequential


def get_gradient_norm_func(model):
    grads = K.gradients(model.total_loss, model.trainable_weights)
    summed_squares = [K.sum(K.square(g)) for g in grads]
    norm = K.sqrt(sum(summed_squares))
    func = K.function([model.input], [norm])
    return func


def main():
    x = np.random.random((128,)).reshape((-1, 1))
    y = 2 * x
    model = Sequential(layers=[Dense(2, input_shape=(1,)),
                               Dense(1)])
    model.compile(loss='mse', optimizer='RMSprop')
    get_gradient = get_gradient_norm_func(model)
    history = model.fit(x, y, epochs=1)
    print(get_gradient([x]))

if __name__ == '__main__':
    main()

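The code fails when get_gradient() is called. The traceback is long and says a lot about shapes, but very little about what the correct shapes should be. How can I fix this?

Ideally I would like a backend-agnostic solution, but a TensorFlow-based solution is also an option.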

2017-08-15 15:39:14.914388: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1,-1] has negative dimensions
2017-08-15 15:39:14.914414: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1,-1] has negative dimensions
         [[Node: dense_2_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
2017-08-15 15:39:14.915026: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1,-1] has negative dimensions
2017-08-15 15:39:14.915038: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1,-1] has negative dimensions
         [[Node: dense_2_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
2017-08-15 15:39:14.915310: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1] has negative dimensions
2017-08-15 15:39:14.915321: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1] has negative dimensions
         [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Traceback (most recent call last):
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape [-1] has negative dimensions
         [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "gradientlog.py", line 45, in <module>
    main()
  File "gradientlog.py", line 42, in main
    print(get_gradient([x]))
  File "/home/josteb/sandbox/keras/keras/backend/tensorflow_backend.py", line 2251, in __call__
    **self.session_kwargs)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape [-1] has negative dimensions
         [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'dense_2_sample_weights', defined at:
  File "gradientlog.py", line 45, in <module>
    main()
  File "gradientlog.py", line 39, in main
    model.compile(loss='mse', optimizer='RMSprop')
  File "/home/josteb/sandbox/keras/keras/models.py", line 783, in compile
    **kwargs)
  File "/home/josteb/sandbox/keras/keras/engine/training.py", line 799, in compile
    name=name + '_sample_weights'))
  File "/home/josteb/sandbox/keras/keras/backend/tensorflow_backend.py", line 435, in placeholder
    x = tf.placeholder(dtype, shape=shape, name=name)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1530, in placeholder
    return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1954, in _placeholder
    name=name)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Shape [-1] has negative dimensions
         [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

【Comments】:

    Tags: python tensorflow neural-network keras


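    【Answer 1】:

    Several placeholders in Keras are involved in computing the gradients:

    1. The input x
    2. The target y
    3. Sample weights: even if you do not pass them to model.fit(), Keras still creates a placeholder for sample weights and feeds np.ones((y.shape[0],), dtype=K.floatx()) into the graph during training.
    4. Learning phase: this placeholder is connected to the gradient tensor only if some layer uses it (e.g. Dropout).

    So in the example you provided, you need to feed x, y, and sample_weights into the graph to compute the gradients. Your call feeds only x, which leaves the dense_2_target and dense_2_sample_weights placeholders named in the traceback without values. That is the root cause of the error.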
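The feed list described above can be sketched in plain NumPy. The helper `build_feed` is hypothetical (it is not a Keras API); it only illustrates that three arrays are needed, in the order inputs, targets, sample weights, and that an all-ones weight vector is the neutral default. The dtype is hard-coded to float32 on the assumption that K.floatx() is left at its default.

```python
import numpy as np


def build_feed(x, y, sample_weight=None):
    """Return the arrays in the order [inputs, targets, sample_weights].

    Hypothetical helper for illustration only; it is not part of Keras.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if sample_weight is None:
        # One weight per sample, all equal to 1, so every sample counts equally.
        sample_weight = np.ones((y.shape[0],), dtype="float32")
    sample_weight = np.asarray(sample_weight)
    if sample_weight.shape != (y.shape[0],):
        raise ValueError("sample_weight must have shape (n_samples,)")
    return [x, y, sample_weight]


x = np.random.random((128, 1))
y = 2 * x
feed = build_feed(x, y)
print([a.shape for a in feed])  # [(128, 1), (128, 1), (128,)]
```

A list built this way is what a K.function() created over inputs, targets, and sample weights would expect to receive.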

    Inside Model._make_train_function(), the following lines show how the inputs that K.function() needs are constructed in this case:

    inputs = self._feed_inputs + self._feed_targets + self._feed_sample_weights
    if self.uses_learning_phase and not isinstance(K.learning_phase(), int):
        inputs += [K.learning_phase()]
    
    with K.name_scope('training'):
        ...
        self.train_function = K.function(inputs,
                                         [self.total_loss] + self.metrics_tensors,
                                         updates=updates,
                                         name='train_function',
                                         **self._function_kwargs)
    

    By mimicking this function, you should be able to get the norm:

    def get_gradient_norm_func(model):
        grads = K.gradients(model.total_loss, model.trainable_weights)
        summed_squares = [K.sum(K.square(g)) for g in grads]
        norm = K.sqrt(sum(summed_squares))
        inputs = model.model._feed_inputs + model.model._feed_targets + model.model._feed_sample_weights
        func = K.function(inputs, [norm])
        return func
    
    def main():
        x = np.random.random((128,)).reshape((-1, 1))
        y = 2 * x
        model = Sequential(layers=[Dense(2, input_shape=(1,)),
                                   Dense(1)])
        model.compile(loss='mse', optimizer='rmsprop')
        get_gradient = get_gradient_norm_func(model)
        history = model.fit(x, y, epochs=1)
        print(get_gradient([x, y, np.ones(len(y))]))
    

    Output:

    Epoch 1/1
    128/128 [==============================] - 0s - loss: 2.0073     
    [4.4091368]
    

    Note that because you are using Sequential rather than Model, you need model.model._feed_* instead of model._feed_*.
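To sanity-check what the norm tensor measures, the same quantity can be re-derived by hand for this small model. The sketch below is an independent NumPy re-implementation, not Keras code. It assumes a Dense(2) -> Dense(1) network without activations, a plain mean-squared-error loss (all-ones sample weights, no regularization), and its own random weights, so it will not reproduce the number printed above. The function names are made up for illustration.

```python
import numpy as np


def loss_and_grads(params, x, y):
    """MSE loss and analytic gradients for Dense(2) -> Dense(1), no activations."""
    w1, b1, w2, b2 = params
    h = x @ w1 + b1            # hidden layer, shape (n, 2)
    pred = h @ w2 + b2         # output, shape (n, 1)
    err = pred - y
    loss = np.mean(err ** 2)
    d_pred = 2.0 * err / err.size      # dLoss/dpred
    d_h = d_pred @ w2.T                # back-propagate through the second layer
    # Same order as params: w1, b1, w2, b2
    grads = [x.T @ d_h, d_h.sum(axis=0), h.T @ d_pred, d_pred.sum(axis=0)]
    return loss, grads


def global_norm(grads):
    """Square root of the sum of squared entries over all gradient arrays."""
    return np.sqrt(sum(np.sum(np.square(g)) for g in grads))


rng = np.random.RandomState(0)
x = rng.random_sample((128, 1))
y = 2 * x
params = [rng.randn(1, 2), np.zeros(2), rng.randn(2, 1), np.zeros(1)]

loss, grads = loss_and_grads(params, x, y)
print("gradient norm:", global_norm(grads))

# Finite-difference check of one entry (w1[0, 0]) to validate the analytic gradient.
eps = 1e-6
bumped = [p.copy() for p in params]
bumped[0][0, 0] += eps
numeric = (loss_and_grads(bumped, x, y)[0] - loss) / eps
print("analytic:", grads[0][0, 0], "numeric:", numeric)
```

`global_norm` mirrors the K.sqrt(sum(K.sum(K.square(g)) ...)) expression in get_gradient_norm_func: one scalar over all weight arrays, not one norm per layer.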

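    【Comments】:

    • Thank you for the very clear answer. With the pointer to _make_train_function, I was also able to figure out how to plug arbitrary Keras tensors into Keras's metrics system, which ensures that the value of the tensor is logged at every iteration. This can be done after compiling the model by appending the tensor to model.metrics_tensors and a name for it to model.metrics_names (both are lists).
    【Answer 2】:

    Expanding on josteinb's comment, here is the version I have used.

    It is essentially the same as the previous answer, but this version integrates the norm computation into the usual training routine.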

    import keras.backend as K
    from keras.models import Model
    
    # Get a "l2 norm of gradients" tensor
    def get_gradient_norm(model):
        with K.name_scope('gradient_norm'):
            grads = K.gradients(model.total_loss, model.trainable_weights)
            norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
        return norm
    
    # Build a model
    model = Model(...)
    
    # Compile the model
    model.compile(
        loss="categorical_crossentropy",
        optimizer="adam",
        metrics=["categorical_accuracy"],
    )
    
    # Append the "l2 norm of gradients" tensor as a metric
    model.metrics_names.append("gradient_norm")
    model.metrics_tensors.append(get_gradient_norm(model))
    
    # You can compute the norm within the usual training routine
    loss, acc, gradient_norm = model.train_on_batch(batch, label)
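
Because the extra tensor is appended to both lists, the values returned by train_on_batch line up with model.metrics_names by position. A minimal sketch of pairing them by name follows; `name_results` is a hypothetical helper, and the names and numbers are stand-ins rather than output from a real model.

```python
def name_results(metrics_names, values):
    """Pair each value returned by train_on_batch with its metric name."""
    if len(metrics_names) != len(values):
        raise ValueError("expected one value per metric name")
    return dict(zip(metrics_names, values))


# Stand-in values; with a real model these would come from
# model.metrics_names and model.train_on_batch(batch, label).
metrics_names = ["loss", "categorical_accuracy", "gradient_norm"]
values = [2.0, 0.25, 4.5]

results = name_results(metrics_names, values)
print(results["gradient_norm"])  # 4.5
```

Looking values up by name avoids relying on the unpacking order if more metrics are appended later.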
    

