Slice of a variable returns gradient None

【Question title】: Slice of a variable returns gradient None
【Posted】: 2017-01-27 08:20:18
【Question description】:

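I have been playing around with the tf.gradients() function and ran into behavior I did not expect: it seems unable to compute the gradient with respect to a slice of a variable. I put together an example that hopefully shows what I mean: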

import tensorflow as tf

a = tf.Variable([1.0])
b = tf.Variable([1.0])
c = tf.concat(0, [a, b])
print(c)  # >Tensor("concat:0", shape=(2,), dtype=float32)

grad_full = tf.gradients(c,  c)
grad_slice1 = tf.gradients(c,  a)
grad_slice2 = tf.gradients(c,  c[:, ])  # --> Here the gradient is None
grad_slice3 = tf.gradients(c,  c[0, ])  # --> Here the gradient is None

print(grad_full)  # >[<tf.Tensor 'gradients/Fill:0' shape=(2,) dtype=float32>]
print(grad_slice1)  # >[<tf.Tensor 'gradients_1/concat_grad/Slice:0' shape=(1,) dtype=float32>]
print(grad_slice2)  # >[None]
print(grad_slice3)  # >[None]

sess = tf.Session()
sess.run(tf.initialize_all_variables())

grad_full_v, grad_slice_v = sess.run([grad_full[0], grad_slice1[0]])
print(grad_full_v)  # >[ 1.  1.]
print(grad_slice_v)  # >[ 1.]

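My questions are:

1) Am I using the tf.gradients() function the way it is intended?

2) If so, is there a reason for this behavior? As I see it, slicing should not necessarily break backpropagation.

3) Does this mean I need to avoid slicing anywhere in my network (or at least on every path from a variable to the loss)? That would mean, for example, that I cannot split the output of a fully connected layer into several meaningful parts (such as estimating multiple scalars with one fc layer and then splitting the joint estimate into the parts I want to use).

I am using TensorFlow 0.11 RC0, built from source, on Ubuntu 16 with Python 3.5.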

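【Comments】:

I ran into the same problem. Still no answer?
In any case, if it helps, I believe there is still no "direct" solution (github.com/tensorflow/tensorflow/issues/834). For now there are at least two workarounds: (1) split the variable into slices and apply stop_gradient to the remaining part (stackoverflow.com/questions/49048622/…), or (2) define the individual variables separately in a list (as in my own question). Taking the gradient with respect to a slice of a variable does not work and returns None, as you observed.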
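
The two workarounds mentioned in this comment could look roughly like the sketch below. It is only an illustration, not code from the linked threads: it assumes a TensorFlow release that ships `tensorflow.compat.v1` (much newer than the 0.11 build in the question), and the names `parts`, `v`, `v_mixed`, `loss`, and `loss_mixed` are made up for the example.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use graph mode so tf.gradients() is available

# Workaround (2): keep every piece as its own variable and concatenate them.
# Gradients are then requested with respect to the pieces, never to a slice.
parts = [tf.Variable([1.0, 2.0]), tf.Variable([3.0])]  # hypothetical pieces
w = tf.concat(parts, axis=0)
loss = tf.reduce_sum(w * w)
grads_per_part = tf.gradients(loss, parts)  # one gradient tensor per piece

# Workaround (1): keep a single variable, but route the part that should not
# receive gradients through tf.stop_gradient before it is used.
v = tf.Variable([1.0, 2.0, 3.0])
v_mixed = tf.concat([v[:2], tf.stop_gradient(v[2:])], axis=0)
loss_mixed = tf.reduce_sum(v_mixed * v_mixed)
grad_v = tf.gradients(loss_mixed, v)[0]  # taken w.r.t. the whole variable

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g_parts, g_v = sess.run([grads_per_part, grad_v])

print(g_parts)  # gradient 2*w, split per piece
print(g_v)      # the last entry is zero because of stop_gradient
```

In both variants the second argument of tf.gradients() is a tensor or variable that exists upstream of the loss, which is what avoids the None result.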

Tags: python tensorflow deep-learning

【Solution 1】:

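d = c[:, ] creates a new tensor that is distinct from a, b, and c. In the dependency graph, d depends on c, not the other way round, so there is no path from c back to d and the gradient is None. tf.gradients(y, x) only works when y depends on x.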
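
To illustrate the direction of the dependency, here is a small sketch. It assumes a TensorFlow release that provides `tensorflow.compat.v1` rather than the 0.11 API used in the question, and `d`, `loss`, and `no_path` are names chosen only for this example. Slicing in the forward pass does not block backpropagation; the gradient is None only when the slice is created after the tensor being differentiated.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode, as in the question

a = tf.Variable([1.0])
b = tf.Variable([1.0])
c = tf.concat([a, b], axis=0)  # newer argument order: values first, then axis

d = c[0:1]                     # a new tensor that sits downstream of c
loss = tf.reduce_sum(3.0 * d)  # hypothetical loss built from the slice

# loss depends on d, d on c, and c on a, so these gradients are defined.
grad_wrt_c = tf.gradients(loss, c)[0]
grad_wrt_a = tf.gradients(loss, a)[0]

# c does not depend on d, so there is nothing to backpropagate through.
no_path = tf.gradients(c, d)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    gc, ga = sess.run([grad_wrt_c, grad_wrt_a])

print(no_path)  # None
print(gc)       # nonzero only at the sliced position
print(ga)
```

So, regarding question 3, splitting the output of a layer into parts and building the loss from those parts is fine, as long as gradients are requested with respect to tensors or variables that the loss was computed from.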

【Discussion】: