Creating a distance matrix in TensorFlow: answers

【Question title】: Creating a distance matrix in TensorFlow
【Posted】: 2024-01-10 13:54:01
【Question description】:

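X is a Tensor("stack:0", shape=(10, 2), dtype=int32) representing a matrix of coordinates, for example:

[[2, 1], [5, 5], [4, 1], [0, 0], [6, 1], [2, 4], [6, 3], [5, 2], [5, 0], [2, 2]]

From X I want to create a Euclidean distance matrix that holds the distance between every pair of coordinates, so that I get a result tensor of shape=(10, 10), for example: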

[[0.000  2.000  5.000  4.123  1.414  1.414  6.082  2.000  4.123  4.000]
 [2.000  0.000  4.123  4.123  1.414  3.162  6.708  2.828  2.236  4.472]
 [5.000  4.123  0.000  2.000  3.605  5.000  4.472  3.605  3.162  3.000]
 [4.123  4.123  2.000  0.000  3.000  3.605  2.828  2.236  4.242  1.000]
 [1.414  1.414  3.605  3.000  0.000  2.000  5.385  1.414  3.000  3.162]
 [1.414  3.162  5.000  3.605  2.000  0.000  5.000  1.414  5.000  3.162]
 [6.082  6.708  4.472  2.828  5.385  5.000  0.000  4.123  7.071  2.236]
 [2.000  2.828  3.605  2.236  1.414  1.414  4.123  0.000  4.123  2.000]
 [4.123  2.236  3.162  4.242  3.000  5.000  7.071  4.123  0.000  5.000]
 [4.000  4.472  3.000  1.000  3.162  3.162  2.236  2.000  5.000  0.000]]

I tried using tf.norm (https://www.tensorflow.org/api_docs/python/tf/norm), but I could not get it to work. Any help would be appreciated.

【Discussion】:

Tags: python tensorflow

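【Solution 1】:

Here is a direct computation of the Euclidean distance matrix in TF:

import tensorflow as tf

t0 = [[2, 1], [5, 5], [4, 1], [0, 0], [6, 1], [2, 4], [6, 3], [5, 2], [5, 0], [2, 2]]
t = tf.convert_to_tensor(t0, dtype=tf.float32)

Create two helper tensors, each with one extra dimension. Subtracting one from the other broadcasts to all pairwise differences:

t1 = tf.reshape(t, (1, 10, 2))  # points along axis 1
t2 = tf.reshape(t, (10, 1, 2))  # points along axis 0

# t1 - t2 has shape (10, 10, 2); the norm over axis 2 gives the (10, 10) distances
result = tf.norm(t1 - t2, ord='euclidean', axis=2)

Result: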

<tf.Tensor: id=157, shape=(10, 10), dtype=float32, numpy=
array([[0.       , 5.       , 2.       , 2.236068 , 4.       , 3.       ,
        4.472136 , 3.1622777, 3.1622777, 1.       ],
       [5.       , 0.       , 4.1231055, 7.071068 , 4.1231055, 3.1622777,
        2.236068 , 3.       , 5.       , 4.2426405],
       [2.       , 4.1231055, 0.       , 4.1231055, 2.       , 3.6055512,
        2.828427 , 1.4142135, 1.4142135, 2.236068 ],
       [2.236068 , 7.071068 , 4.1231055, 0.       , 6.0827627, 4.472136 ,
        6.708204 , 5.3851647, 5.       , 2.828427 ],
       [4.       , 4.1231055, 2.       , 6.0827627, 0.       , 5.       ,
        2.       , 1.4142135, 1.4142135, 4.1231055],
       [3.       , 3.1622777, 3.6055512, 4.472136 , 5.       , 0.       ,
        4.1231055, 3.6055512, 5.       , 2.       ],
       [4.472136 , 2.236068 , 2.828427 , 6.708204 , 2.       , 4.1231055,
        0.       , 1.4142135, 3.1622777, 4.1231055],
       [3.1622777, 3.       , 1.4142135, 5.3851647, 1.4142135, 3.6055512,
        1.4142135, 0.       , 2.       , 3.       ],
       [3.1622777, 5.       , 1.4142135, 5.       , 1.4142135, 5.       ,
        3.1622777, 2.       , 0.       , 3.6055512],
       [1.       , 4.2426405, 2.236068 , 2.828427 , 4.1231055, 2.       ,
        4.1231055, 3.       , 3.6055512, 0.       ]], dtype=float32)>
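The reshape calls above hardcode the number of points (10) and the coordinate dimension (2). As a minimal sketch of the same broadcasting idea for an arbitrary (n, d) input, assuming TensorFlow 2.x with eager execution; the function name `pairwise_distances` and the shortened `coords` list are illustrative and not part of the original answer:

```python
import tensorflow as tf


def pairwise_distances(points):
    """Euclidean distance matrix for an (n, d) input, via broadcasting."""
    # Cast so that integer inputs (such as the int32 tensor in the question) work.
    points = tf.cast(tf.convert_to_tensor(points), tf.float32)
    # (1, n, d) - (n, 1, d) broadcasts to (n, n, d): every pairwise difference.
    diff = tf.expand_dims(points, 0) - tf.expand_dims(points, 1)
    # Reduce over the coordinate axis to get the (n, n) distances.
    return tf.norm(diff, ord='euclidean', axis=-1)


# First four points from the question, used only as sample input.
coords = [[2, 1], [5, 5], [4, 1], [0, 0]]
dist = pairwise_distances(coords)
print(dist.shape)
```

The intermediate `diff` still has n * n * d elements, so this variant has the same memory profile as the reshape version.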

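【Discussion】:

How can I change the result shape from (10, 10) to (?, 10, 10, 1)? I want to feed the result into a Flatten layer whose input shape is (?, 10, 10, 1).
result = tf.reshape(sometensor, (-1, 10, 10, 1)), but you may want to remove that Flatten layer.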
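To see what the reshape in the reply above produces for a single (10, 10) matrix, here is a small sketch, assuming TensorFlow 2.x; `result` is a placeholder tensor of zeros standing in for the real distance matrix:

```python
import tensorflow as tf

# Placeholder with the same shape as the distance matrix.
result = tf.zeros((10, 10))

# -1 lets TensorFlow infer the batch dimension; with 100 elements it becomes 1.
batched = tf.reshape(result, (-1, 10, 10, 1))
print(batched.shape)  # (1, 10, 10, 1)
```

The leading dimension is inferred from the element count, so a single matrix yields a batch of size 1.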

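【Solution 2】:

While the accepted answer works for small inputs, it will cause trouble if you have large batches, many dimensions, or an input matrix that is large for any other reason.

In the general case there are two input matrices X and Y (in the OP's case X == Y', i.e. Y is the transpose of X). The goal is the distance matrix D:

D[i,j] = tf.norm(X[i, ...]-Y[..., j])

If X.shape == (a, b) and Y.shape == (b, c), then computing

tf.norm(tf.reshape(X, (a,b,1)) - tf.reshape(Y, (1,b,c)), axis=1)

requires an intermediate tensor of size (a, b, c), which may not fit in memory.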
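To get a feel for the difference in scale, here is a rough back-of-the-envelope sketch in plain Python. The sizes a, b and c are hypothetical, the tensors are assumed to be dense float32, and `float32_bytes` is an illustrative helper:

```python
def float32_bytes(*shape):
    """Bytes needed for a dense float32 tensor of the given shape."""
    total = 4  # bytes per float32 element
    for dim in shape:
        total *= dim
    return total


# Hypothetical sizes: 10,000 points on each side, 512 features.
a, b, c = 10_000, 512, 10_000

intermediate_gb = float32_bytes(a, b, c) / 1e9  # the (a, b, c) difference tensor
output_gb = float32_bytes(a, c) / 1e9           # the (a, c) distance matrix

print(f"{intermediate_gb:.1f} GB vs {output_gb:.1f} GB")  # 204.8 GB vs 0.4 GB
```

The intermediate is b times larger than the final distance matrix, which is why high-dimensional inputs are the problematic case.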

In that case, a bit of algebra shows that

D == tf.sqrt(
    tf.maximum(0., # catch O(1e-9) round off error resulting in nans
        tf.reduce_sum(tf.pow(X, 2), axis=-1, keepdims=True) + # (a, 1)
        tf.reduce_sum(tf.pow(Y, 2), axis=-2, keepdims=True) - # (1, c)
        2*tf.matmul(X, Y) # (a, c)
    )
)

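This only creates a few matrices of size (a, c). The price is a small loss of precision: round-off can make the squared distance slightly negative, so it is clamped at zero before taking the square root.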
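The algebra behind the expression above is the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y, applied to every row x of X and every column y of Y. As a runnable sketch, assuming TensorFlow 2.x with eager execution, the same computation can be wrapped in a function and cross-checked against the broadcasting version on a small input. The function name `pairwise_distances_matmul` and the four sample points are illustrative:

```python
import tensorflow as tf


def pairwise_distances_matmul(X, Y):
    """Distances between rows of X (a, b) and columns of Y (b, c); result is (a, c)."""
    x_sq = tf.reduce_sum(tf.square(X), axis=-1, keepdims=True)  # (a, 1)
    y_sq = tf.reduce_sum(tf.square(Y), axis=-2, keepdims=True)  # (1, c)
    sq_dist = x_sq + y_sq - 2.0 * tf.matmul(X, Y)               # (a, c)
    # Round-off can push tiny squared distances below zero; clamp before sqrt.
    return tf.sqrt(tf.maximum(sq_dist, 0.0))


# First four points from the question; Y is the transpose of X, as in the OP's case.
points = tf.constant([[2., 1.], [5., 5.], [4., 1.], [0., 0.]])
points_t = tf.transpose(points)
D = pairwise_distances_matmul(points, points_t)

# Reference: the broadcasting approach, which builds an (a, b, c) intermediate.
reference = tf.norm(points[:, :, tf.newaxis] - points_t[tf.newaxis, :, :], axis=1)
max_error = float(tf.reduce_max(tf.abs(D - reference)))
print(max_error)
```

On small integer-valued inputs like these the two results should agree closely; with large or nearly identical vectors the matmul form is the one more exposed to cancellation error, which is what the clamp guards against.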

【Discussion】: