Manipulating a Tensor by indices and values in TensorFlow: answers

【Question title】: manipulate Tensor by indices and values in TensorFlow
【Posted】: 2018-04-11 18:11:04
【Question description】:

Requirement

Given a tensor such as:

SparseTensorValue(indices=array([[0, 0], [1, 0], [1, 1], [1, 2]]),
                  values=array([2, 0, 2, 5]),
                  dense_shape=array([2, 3]))

The shape is 2x3:

| 2 na na |
| 0  2  5 |
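For reference, the mapping from indices / values / dense_shape to the 2x3 layout above can be reproduced in plain Python. This is a minimal sketch outside TensorFlow; to_dense_rows is a hypothetical helper, and None stands in for the "na" (unset) cells.

```python
def to_dense_rows(indices, values, dense_shape, fill=None):
    """Scatter sparse (row, col) -> value entries into a list-of-lists grid."""
    n_rows, n_cols = dense_shape
    grid = [[fill] * n_cols for _ in range(n_rows)]
    for (row, col), value in zip(indices, values):
        grid[row][col] = value
    return grid


indices = [[0, 0], [1, 0], [1, 1], [1, 2]]
values = [2, 0, 2, 5]
dense_shape = [2, 3]

for row in to_dense_rows(indices, values, dense_shape):
    print(row)
# [2, None, None]
# [0, 2, 5]
```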

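I need a new tensor that uses each value as a column index within its row, as shown below:

Note that there are 6 possible values in total (the set [0, 1, 2, 3, 4, 5]), so the shape is 2x6: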

| 0 0 1 0 0 0 |
| 1 0 1 0 0 1 |

The corresponding sparse tensor value is:

SparseTensorValue(indices=array([[0, 2], [1, 0], [1, 2], [1, 5]]),
                  values=array([1, 1, 1, 1]),
                  dense_shape=array([2, 6]))
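The transformation being asked for is: keep each entry's row index, use its value as the new column index, and put a 1 there. A plain-Python sketch of that logic (outside TensorFlow; multi_hot_indices and to_multi_hot are hypothetical names, and num_values=6 is taken from the example above):

```python
def multi_hot_indices(indices, values):
    """Pair each entry's row index with its value: (row, col) + value -> (row, value)."""
    return [[row, value] for (row, _col), value in zip(indices, values)]


def to_multi_hot(indices, values, n_rows, num_values):
    """Dense multi-hot matrix with a 1 at every (row, value) position."""
    dense = [[0] * num_values for _ in range(n_rows)]
    for row, col in multi_hot_indices(indices, values):
        dense[row][col] = 1
    return dense


indices = [[0, 0], [1, 0], [1, 1], [1, 2]]
values = [2, 0, 2, 5]

print(multi_hot_indices(indices, values))  # [[0, 2], [1, 0], [1, 2], [1, 5]]
for row in to_multi_hot(indices, values, n_rows=2, num_values=6):
    print(row)
# [0, 0, 1, 0, 0, 0]
# [1, 0, 1, 0, 0, 1]
```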

How can this be done the TensorFlow way? Neither of the two approaches below works:

import tensorflow as tf

tags = tf.SparseTensor(indices=[[0, 0], [1, 0], [1, 1], [1, 2]],
                       values=[2, 0, 2, 5],
                       dense_shape=[2, 3])

print(type(tags.indices))

# approach 1:  the TensorFlow way to implement the python logic
new_indices = [[tags.indices[i], tags.values[i]]
               for i in range(tags.values.shape[0])]  # syntax incorrect

# approach 2:
indice_idx = tf.map_fn(lambda x : x[0], tags.indices)
value_idx = tf.map_fn(lambda x : x[1], tags.indices)
value_arr = tf.gather(tags.values, value_idx)

with tf.Session() as s1:
    print(indice_idx.eval())
    print(tags.values.eval())
    print('value_arr', value_arr.eval())


"""
[0 0 1 2]   <-- value_idx, which is the index of tags.values

want to combine
[0 1 1 1]   <-- indice_idx
[2 2 0 2]   <-- value_arr, which is the value of tags.values
==>
[[0,2], [1,2], [1,0], [1,2]]
"""
new_indices = tf.concat(indice_idx, value_arr)  # syntax incorrect

with tf.Session() as s:
    s.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(s.run(value_arr))
    print(s.run(tags.values))
    print(s.run(new_indices))
    print(s.run(tags.indices[3, 1]))

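【Question discussion】:

What if a value repeats within a given row? Also, sparse matrices usually assume that unfilled elements are zero; for a typical row of a sparse matrix, for example, the value "0" would be repeated many times.
It is guaranteed that values do not repeat; the input data itself is a sparse matrix.
Your question is still unclear. Are you looking for how to construct the intermediate tensor? Or are you looking for a k x 2 tensor with a separate row for each (row index, value) pair in the original?
Just to construct a new Tensor as described above. I'm looking for a proper way to loop over the tensor's indices and values.
Inside the computation graph, or outside it as a standalone SparseTensorValue? For the outside case, you can iterate over the .indices and .values attributes, right?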
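For the outside-the-graph case raised in the last comment: in TensorFlow 1.x, tf.SparseTensorValue is a namedtuple with indices, values and dense_shape fields (and is what session.run returns for a SparseTensor), so its fields can be iterated directly. The sketch below uses a stand-in namedtuple with the same fields, an assumption made so it runs without TensorFlow, and plain lists instead of NumPy arrays. It builds the k x 2 list of (row index, value) pairs discussed above.

```python
from collections import namedtuple

# Stand-in with the same fields as tf.SparseTensorValue (an assumption so that
# this runs without TensorFlow); in TF 1.x you would get one from sess.run(tags).
SparseTensorValue = namedtuple("SparseTensorValue", ["indices", "values", "dense_shape"])

tags_value = SparseTensorValue(indices=[[0, 0], [1, 0], [1, 1], [1, 2]],
                               values=[2, 0, 2, 5],
                               dense_shape=[2, 3])

# One (row index, value) pair per stored entry: a k x 2 result.
# int() also converts NumPy integer scalars when real arrays are used.
pairs = [[int(idx[0]), int(val)]
         for idx, val in zip(tags_value.indices, tags_value.values)]
print(pairs)  # [[0, 2], [1, 0], [1, 2], [1, 5]]

# The comment thread guarantees no value repeats within a row; a set of
# (row, value) tuples makes that assumption easy to check.
assert len({tuple(p) for p in pairs}) == len(pairs)
```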

Tags: python tensorflow machine-learning indices tensor

【Solution 1】:

Answer

In approach 2, replace tf.concat with tf.stack: new_indices = tf.stack([indice_idx, value_arr], axis=1)
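Why tf.stack rather than tf.concat: tf.concat(values, axis) takes a list of tensors plus an axis and joins them along an existing dimension (so the question's tf.concat(indice_idx, value_arr) passes value_arr as the axis argument), and with axis=0 two length-k vectors become one length-2k vector. tf.stack(values, axis=1) instead adds a new dimension and yields a k x 2 matrix of pairs; both inputs must share a dtype, which is why the answer casts to int64. The plain-Python helpers below are hypothetical and only emulate those result shapes for 1-D inputs; they are not TensorFlow functions.

```python
def concat_1d(a, b):
    """Emulates tf.concat([a, b], axis=0) for 1-D inputs: one longer vector."""
    return list(a) + list(b)


def stack_axis1(a, b):
    """Emulates tf.stack([a, b], axis=1) for 1-D inputs: a k x 2 matrix of pairs."""
    if len(a) != len(b):
        raise ValueError("stacking needs inputs of the same shape")
    return [[x, y] for x, y in zip(a, b)]


rows = [0, 1, 1, 1]   # column 0 of tags.indices
vals = [2, 0, 2, 5]   # tags.values

print(concat_1d(rows, vals))    # [0, 1, 1, 1, 2, 0, 2, 5]
print(stack_axis1(rows, vals))  # [[0, 2], [1, 0], [1, 2], [1, 5]]
```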

The full code is:

import tensorflow as tf  # TensorFlow 1.x API (uses tf.Session)

tags = tf.SparseTensor(indices=[[0, 0], [1, 0], [1, 1], [1, 2]],
                       values=[2, 0, 2, 5],
                       dense_shape=[2, 3])

print(type(tags.indices))

# # approach 1:  any TensorFlow way to implement the Python logic below?
# new_indices = [[tags.indices[i], tags.values[i]]
#                for i in range(tags.values.shape[0])]  # syntax incorrect

# approach 2:
# Row index of every stored entry (column 0 of tags.indices).
indice_idx = tags.indices[:, 0]
# The stored values themselves become the new column indices. Gathering them by
# column index (tf.gather(tags.values, tags.indices[:, 1])) would give [2 2 0 2],
# which is not the target. Cast to int64 to match the dtype of tags.indices,
# because tf.stack requires both inputs to have the same dtype.
value_arr = tf.cast(tags.values, tf.int64)

with tf.Session() as s1:
    print(indice_idx.eval())
    print(tags.values.eval())
    print('value_arr', value_arr.eval())


"""
tf.stack(..., axis=1) pairs the two vectors element-wise:
[0 1 1 1]   <-- indice_idx, the row of each stored entry
[2 0 2 5]   <-- value_arr, the stored values (the new column indices)
==>
[[0,2], [1,0], [1,2], [1,5]]
"""
new_indices = tf.stack([indice_idx, value_arr], axis=1)

# Multi-hot result: a 1 at every (row, value) position; 6 = number of possible values.
multi_hot = tf.SparseTensor(indices=new_indices,
                            values=tf.ones_like(value_arr),
                            dense_shape=[2, 6])

with tf.Session() as s:
    print(s.run(value_arr))
    print(s.run(tags.values))
    print(s.run(new_indices))
    print(s.run(tf.sparse_tensor_to_dense(multi_hot)))

This solves the problem.
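A plain-Python check of where the new column indices must come from: looking tags.values up at each entry's column index, which is what tf.gather(tags.values, value_idx) does in the question's approach 2, gives [2, 2, 0, 2], whereas the target indices need the stored values [2, 0, 2, 5] as they are. The list comprehensions below only mimic the TensorFlow ops and are not TensorFlow code.

```python
indices = [[0, 0], [1, 0], [1, 1], [1, 2]]
values = [2, 0, 2, 5]
target = [[0, 2], [1, 0], [1, 2], [1, 5]]

rows = [idx[0] for idx in indices]  # [0, 1, 1, 1]
cols = [idx[1] for idx in indices]  # [0, 0, 1, 2]

# Mimics tf.gather(values, cols): look values up by column position.
gathered = [values[c] for c in cols]
print(gathered)  # [2, 2, 0, 2]

by_gather = [[r, v] for r, v in zip(rows, gathered)]
by_values = [[r, v] for r, v in zip(rows, values)]

print(by_gather == target)  # False
print(by_values == target)  # True
```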

A separate, related question

P.S.: This does not work when the data is read from a file; see:

create multi-hot SparseTensor by categorical feature array column from CSV in TensorFlow
