For a K.gradients() layer to work like that, you have to enclose it in a Lambda() layer, because otherwise a full Keras layer is not created and you can't chain or train it. So this code works (tested):
import keras
from keras.models import *
from keras.layers import *
from keras import backend as K
import tensorflow as tf
def grad( y, x ):
    return Lambda( lambda z: K.gradients( z[ 0 ], z[ 1 ] ), output_shape = [1] )( [ y, x ] )

def network( i, d ):
    m = Add()( [ i, d ] )
    a = Lambda( lambda x: K.log( x ) )( m )
    return a
fixed_input = Input(tensor=tf.constant( [ 1.0 ] ) )
double = Input(tensor=tf.constant( [ 2.0 ] ) )
a = network( fixed_input, double )
b = grad( a, fixed_input )
c = grad( b, fixed_input )
d = grad( c, fixed_input )
e = grad( d, fixed_input )
model = Model( inputs = [ fixed_input, double ], outputs = [ a, b, c, d, e ] )
print( model.predict( x=None, steps = 1 ) )
def network models f( x ) = log( x + 2 ) at x = 1. def grad is where the gradient calculation is done. This code outputs:
[array([1.0986123], dtype=float32), array([0.33333334], dtype=float32), array([-0.11111112], dtype=float32), array([0.07407408], dtype=float32), array([-0.07407409], dtype=float32)]
These are the correct values of log( 3 ), ⅓, -1/3², 2/3³, -6/3⁴.
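As a quick sanity check (not part of the tested listing above; it assumes SymPy is installed), the analytic derivatives of f( x ) = log( x + 2 ) at x = 1 can be reproduced symbolically:

import sympy

x = sympy.symbols( 'x' )
f = sympy.log( x + 2 )

# f and its first four derivatives, evaluated at x = 1:
# log(3), 1/3, -1/9, 2/27, -2/27
for n in range( 5 ):
    print( sympy.diff( f, x, n ).subs( x, 1 ) )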
Reference TensorFlow code
For reference, the same code in plain TensorFlow (used for testing):
import tensorflow as tf
a = tf.constant( 1.0 )
a2 = tf.constant( 2.0 )
b = tf.log( a + a2 )
c = tf.gradients( b, a )
d = tf.gradients( c, a )
e = tf.gradients( d, a )
f = tf.gradients( e, a )
with tf.Session() as sess:
    print( sess.run( [ b, c, d, e, f ] ) )
It outputs the same values:
[1.0986123, [0.33333334], [-0.11111112], [0.07407408], [-0.07407409]]
Hessians

tf.hessians() does return the second derivative, and it is a shorthand for chaining two tf.gradients() calls. The Keras backend has no hessians though, so you do have to chain two K.gradients().
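To illustrate that chaining, here is a minimal self-contained sketch that builds only the second derivative; it reuses the same Lambda-wrapped technique as the tested listing above, with hypothetical names x, two and f:

from keras.models import Model
from keras.layers import Input, Add, Lambda
from keras import backend as K
import tensorflow as tf

# Lambda-wrapped K.gradients(), as in the first listing.
def grad( y, x ):
    return Lambda( lambda z: K.gradients( z[ 0 ], z[ 1 ] ), output_shape = [1] )( [ y, x ] )

x = Input( tensor = tf.constant( [ 1.0 ] ) )
two = Input( tensor = tf.constant( [ 2.0 ] ) )
f = Lambda( lambda v: K.log( v ) )( Add()( [ x, two ] ) )

# The "Hessian": two chained K.gradients() calls.
hessian = grad( grad( f, x ), x )

model = Model( inputs = [ x, two ], outputs = [ hessian ] )
print( model.predict( x=None, steps = 1 ) )  # ≈ [-0.11111112], as in the listing above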
Numerical approximation

In case none of the above works for you for some reason, you might want to consider numerically approximating the second derivative by taking differences over a small ε distance. This basically triples the network for each input, so besides its lack of accuracy, this solution introduces serious efficiency considerations. Anyway, the code (tested):
import keras
from keras.models import *
from keras.layers import *
from keras import backend as K
import tensorflow as tf
def network( i, d ):
    m = Add()( [ i, d ] )
    a = Lambda( lambda x: K.log( x ) )( m )
    return a
fixed_input = Input(tensor=tf.constant( [ 1.0 ], dtype = tf.float64 ) )
double = Input(tensor=tf.constant( [ 2.0 ], dtype = tf.float64 ) )
epsilon = Input( tensor = tf.constant( [ 1e-7 ], dtype = tf.float64 ) )
eps_reciproc = Input( tensor = tf.constant( [ 1e+7 ], dtype = tf.float64 ) )
a0 = network( Subtract()( [ fixed_input, epsilon ] ), double )
a1 = network( fixed_input, double )
a2 = network( Add()( [ fixed_input, epsilon ] ), double )
d0 = Subtract()( [ a1, a0 ] )
d1 = Subtract()( [ a2, a1 ] )
dv0 = Multiply()( [ d0, eps_reciproc ] )
dv1 = Multiply()( [ d1, eps_reciproc ] )
dd0 = Multiply()( [ Subtract()( [ dv1, dv0 ] ), eps_reciproc ] )
model = Model( inputs = [ fixed_input, double, epsilon, eps_reciproc ], outputs = [ a0, dv0, dd0 ] )
print( model.predict( x=None, steps = 1 ) )
Outputs:
[array([1.09861226]), array([0.33333334]), array([-0.1110223])]
(This only gets the second derivative.)
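For clarity, the same finite-difference stencil in plain NumPy (a sketch, assuming only numpy; the Keras graph above computes exactly these quantities):

import numpy as np

def f( x ):
    return np.log( x + 2.0 )

x, eps = 1.0, 1e-7

# Two forward differences around x, matching dv0 and dv1 above ...
dv0 = ( f( x ) - f( x - eps ) ) / eps
dv1 = ( f( x + eps ) - f( x ) ) / eps

# ... whose difference is the second-difference approximation:
# f''( x ) ≈ ( f( x + eps ) - 2 * f( x ) + f( x - eps ) ) / eps ** 2
dd0 = ( dv1 - dv0 ) / eps

print( [ f( x - eps ), dv0, dd0 ] )  # ≈ [1.0986..., 0.3333..., -0.111] (noisy)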