Back-propagating gradients through a sparse tensor? Answers

[Question title]: Back-propagating gradients through a sparse tensor?
[Posted]: 2017-02-03 20:34:18
[Question]:

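I have a normal feed-forward network that produces a vector v. The elements of v are then used as the non-zero entries of a sparse matrix M (assume the coordinates are predefined). The sparse matrix is then multiplied by a dense vector, and a loss is defined on the resulting scalar. I want to back-propagate the loss w.r.t. the weights of the network, which entails going through the sparse matrix.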

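This seems like a perfectly reasonable use case for a sparse matrix, but it appears that this functionality is not supported. Indeed, even calling tf.gradients(M, [v]) produces an error:

AttributeError: 'SparseTensor' object has no attribute 'value_index'

Am I doing something wrong, or am I correct in presuming that this functionality doesn't (yet?) exist? If the latter, is there a workaround for this particular use case, short of rewriting all of the sparse tensor operations with gradients defined?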
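
For reference, the gradient being asked for can be written out by hand. This is only a sketch under assumed notation that does not appear in the question: entry k of v sits at the predefined coordinate (i_k, j_k) of M, x is the dense vector, y = Mx, and L is a scalar loss computed from y.

```latex
% Assumed notation: v_k is stored at coordinate (i_k, j_k) of M,
% x is the dense vector, y = M x, and L is a scalar loss of y.
y_i = \sum_{k \,:\, i_k = i} v_k \, x_{j_k},
\qquad
\frac{\partial L}{\partial v_k}
  = \frac{\partial L}{\partial y_{i_k}} \, x_{j_k}.
```

Under these assumptions the gradient only involves the stored values and the fixed coordinates, so in principle it never requires a dense copy of M.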


Tags: python tensorflow sparse-matrix backpropagation

[Answer 1]:

A slight variation on this does work: take the gradient directly with respect to the values of the SparseTensor:

import tensorflow as tf

# Wrap the Variable in tf.identity; without it the gradient is not defined.
sparse_values = tf.identity(tf.Variable(tf.constant([1., 2., 3.])))
# Fixed coordinates of the non-zero entries: the diagonal of a 3x3 matrix.
sparse_indices = tf.constant([[0, 0], [1, 1], [2, 2]], dtype=tf.int64)
sparse_matrix = tf.SparseTensor(sparse_indices, sparse_values, [3, 3])
multiplied = tf.sparse_tensor_dense_matmul(sparse_matrix, tf.eye(3))
loss = tf.reduce_sum(multiplied)
# Differentiate with respect to the values tensor, not the SparseTensor itself.
gradients = tf.gradients(loss, [sparse_values])
with tf.Session() as session:
    tf.global_variables_initializer().run()
    print(session.run(gradients))

This prints (on TensorFlow 0.12.1):

[array([ 1.,  1.,  1.], dtype=float32)]

Why the tf.identity op is necessary for the gradient to be defined, I haven't quite figured out (possibly something to do with ref dtypes).
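
As a sanity check on the printed result, the same gradient can be reproduced without TensorFlow. The following is a minimal pure-Python sketch, not TensorFlow's implementation: all function names (`sparse_dense_matmul`, `loss_fn`, `grad_wrt_values`, `numeric_grad`) are made up for illustration, and the sparse matrix is assumed to be stored as coordinate/value pairs. It computes the gradient of reduce_sum(M @ D) with respect to the stored values analytically and compares it against finite differences.

```python
# Pure-Python sketch (no TensorFlow); every name below is illustrative.

def sparse_dense_matmul(indices, values, shape, dense):
    """Multiply a coordinate-format sparse matrix by a dense matrix."""
    rows, _ = shape
    n_out = len(dense[0])
    out = [[0.0] * n_out for _ in range(rows)]
    for (i, j), v in zip(indices, values):
        for c in range(n_out):
            out[i][c] += v * dense[j][c]
    return out

def loss_fn(indices, values, shape, dense):
    """loss = sum of all entries of (M @ dense), mirroring the answer's graph."""
    product = sparse_dense_matmul(indices, values, shape, dense)
    return sum(sum(row) for row in product)

def grad_wrt_values(indices, dense):
    """Analytic gradient: d loss / d v_k is the sum of row j_k of dense."""
    return [sum(dense[j]) for (_, j) in indices]

def numeric_grad(indices, values, shape, dense, eps=1e-6):
    """Central finite differences, used only to check the analytic result."""
    grads = []
    for k in range(len(values)):
        up = list(values)
        up[k] += eps
        down = list(values)
        down[k] -= eps
        diff = loss_fn(indices, up, shape, dense) - loss_fn(indices, down, shape, dense)
        grads.append(diff / (2 * eps))
    return grads

# Same data as the answer: values on the diagonal, multiplied by the identity.
indices = [(0, 0), (1, 1), (2, 2)]
values = [1.0, 2.0, 3.0]
shape = (3, 3)
identity = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]

analytic = grad_wrt_values(indices, identity)
numeric = numeric_grad(indices, values, shape, identity)
print(analytic)  # [1.0, 1.0, 1.0]
```

Each stored value multiplies one row of the identity matrix, whose entries sum to 1, which is consistent with the gradient of ones printed above.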


[Answer 2]:

I'm fumbling in the dark here, working from the code and documentation rather than from experience.

The constructor of the Tensor class is:

def __init__(self, op, value_index, dtype):
    #  value_index: An `int`. Index of the operation's endpoint that produces this tensor.

value_index is used to generate the Tensor's name.

The SparseTensor one is:

def __init__(self, indices, values, dense_shape):

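There is no reference to value_index in its defining file, tensorflow/tensorflow/python/framework/sparse_tensor.py.

Its arguments are Tensors, each presumably with its own value_index.

Elsewhere, SparseTensor appears to be an alternative to IndexedSlices, which also contains Tensors.

The inputs to tf.gradients are all documented as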

A `Tensor` or list of tensors

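The gradients definition file has an _IndexedSlicesToTensor method, but no equivalent for SparseTensor. So for IndexedSlices there seems to be some kind of automatic conversion to dense (with a warning if the result is too large), but not for SparseTensors. I don't know whether this is a case of incomplete development or an incompatibility that makes it impossible.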
