Extract max-sum submatrices: answers

【Question title】: Extract max-sum submatrices
【Posted】: 2018-10-16 14:07:49
【Question description】:

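I have a 2D NxN matrix whose elements are real numbers. I need to identify the top n DxD submatrices, meaning the ones with the largest sums, and return the top-left index of each. I need to do this in TensorFlow.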

For example, I have the following 4x4 matrix:

[1 1 4 4]
[1 1 4 4]
[3 3 2 2]
[3 3 2 2]

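I need to find the 2 submatrices with the largest sums and return their top-left indices. In the case above, the 2 submatrices with the largest and second-largest sums are: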

[[4 4]    [[3 3]
 [4 4]] &  [3 3]]

I need to return [[0,2],[2,0]], the top-left indices of these two submatrices. Thanks.

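【Question comments】:

I assume you need to consider all possible submatrices, not just non-overlapping ones, right? That is, in your example, [[1, 4], [1, 4]] and [[1, 4], [3, 2]] are valid submatrices, correct?
Yes, all submatrices of size 2x2. They may overlap. As with the sliding-window operation in a convolution, I need to "look at" every submatrix and find the top n sums.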

Tags: python python-3.x tensorflow conv-neural-network

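【Solution 1】:

You can get it with the snippet below. The idea is to build a tensor holding the row and column indices of every element of every submatrix, then sum each submatrix and pick the largest sums.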

import tensorflow as tf

# Input data
input = tf.placeholder(tf.int32, [None, None])
# Submatrix dimension
dims = tf.placeholder(tf.int32, [2])
# Number of top submatrices to find
k = tf.placeholder(tf.int32, [])
# Sizes
input_shape = tf.shape(input)
rows, cols = input_shape[0], input_shape[1]
d_rows, d_cols = dims[0], dims[1]
subm_rows, subm_cols = rows - d_rows + 1, cols - d_cols + 1
# Index grids
ii, jj = tf.meshgrid(tf.range(subm_rows), tf.range(subm_cols), indexing='ij')
d_ii, d_jj = tf.meshgrid(tf.range(d_rows), tf.range(d_cols), indexing='ij')
# Add indices
subm_ii = ii[:, :, tf.newaxis, tf.newaxis] + d_ii
subm_jj = jj[:, :, tf.newaxis, tf.newaxis] + d_jj
# Make submatrices tensor
subm = tf.gather_nd(input, tf.stack([subm_ii, subm_jj], axis=-1))
# Add submatrices
subm_sum = tf.reduce_sum(subm, axis=(2, 3))
# Use TopK to find top submatrices
_, top_idx = tf.nn.top_k(tf.reshape(subm_sum, [-1]), tf.minimum(k, tf.size(subm_sum)))
# Get row and column
top_row = top_idx // subm_cols
top_col = top_idx % subm_cols
result = tf.stack([top_row, top_col], axis=-1)

# Test
with tf.Session() as sess:
    mat = [
        [1, 1, 4, 4],
        [1, 1, 4, 4],
        [3, 3, 2, 2],
        [3, 3, 2, 2],
    ]
    print(sess.run(result, feed_dict={input: mat, dims: [2, 2], k: 2}))

Output:

[[0 2]
 [1 2]]

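Note that the output in this case is [0, 2] and [1, 2], not [2, 0]. That is because the submatrix starting at [1, 2] has the same sum as the one at [2, 0] (both sum to 12), and it comes first when the matrix is traversed row by row. If you pass k: 3 in the test, you will also get [2, 0] in the result.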
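
To check the sums and the tie-breaking by hand without TensorFlow, here is a minimal pure-Python sketch. It is not the snippet above: the function name top_k_submatrices is made up for illustration, it sums each window directly with loops, and it assumes that Python's stable sort is a fair stand-in for top_k returning the lower flat index first on ties.

```python
def top_k_submatrices(mat, d_rows, d_cols, k):
    """Return top-left (row, col) indices of the k windows with the largest sums.

    Hypothetical helper for illustration only; not part of TensorFlow.
    """
    rows, cols = len(mat), len(mat[0])
    sums = []
    # Visit every window position in row-major order.
    for i in range(rows - d_rows + 1):
        for j in range(cols - d_cols + 1):
            total = sum(
                mat[i + di][j + dj]
                for di in range(d_rows)
                for dj in range(d_cols)
            )
            sums.append((total, (i, j)))
    # sorted() is stable, so windows with equal sums keep their row-major order.
    ranked = sorted(sums, key=lambda item: -item[0])
    # Slicing caps k at the number of windows.
    return [idx for _, idx in ranked[:k]]


mat = [
    [1, 1, 4, 4],
    [1, 1, 4, 4],
    [3, 3, 2, 2],
    [3, 3, 2, 2],
]
print(top_k_submatrices(mat, 2, 2, 2))  # [(0, 2), (1, 2)]
print(top_k_submatrices(mat, 2, 2, 3))  # [(0, 2), (1, 2), (2, 0)]
```

The window at (0, 2) sums to 16, while (1, 2) and (2, 0) both sum to 12, so which of the two tied windows appears with k = 2 depends only on traversal order.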

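【Comments】:

I wonder if there is a way to use this code in a trainable network? In other words, whether I can pass "result" to a loss function and train. When I do that now I get an error, and I think it is because top_k is a non-differentiable function.
@Ahsan Yes, I think that is the reason. However, it is not clear to me how you would actually differentiate this. The values in result are indices, so what should their derivative be based on? Would you apply tf.diff to input (that is, treat input as samples of a continuous function)?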
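
To make the point about indices concrete, here is a small pure-Python sketch (the helper top_index is hypothetical and only handles the single best window). It nudges one input entry and shows that the returned index does not move, so a finite-difference slope of the index with respect to that entry is zero. This only illustrates the behaviour on the example matrix; it says nothing about how TensorFlow handles gradients internally.

```python
def top_index(mat, d):
    """Top-left index of the d x d window with the largest sum (first on ties).

    Hypothetical helper for illustration only.
    """
    n_rows, n_cols = len(mat), len(mat[0])
    best, best_idx = None, None
    for i in range(n_rows - d + 1):
        for j in range(n_cols - d + 1):
            total = sum(mat[i + di][j + dj] for di in range(d) for dj in range(d))
            if best is None or total > best:
                best, best_idx = total, (i, j)
    return best_idx


mat = [
    [1.0, 1.0, 4.0, 4.0],
    [1.0, 1.0, 4.0, 4.0],
    [3.0, 3.0, 2.0, 2.0],
    [3.0, 3.0, 2.0, 2.0],
]
base = top_index(mat, 2)

# Nudge one entry by a small amount; the winning index stays the same.
eps = 1e-3
nudged = [row[:] for row in mat]
nudged[0][2] += eps
moved = top_index(nudged, 2)
print(base, moved)  # (0, 2) (0, 2)

# Finite-difference slope of each index coordinate with respect to that entry.
slope = [(a - b) / eps for a, b in zip(moved, base)]
print(slope)  # [0.0, 0.0]
```

The index only changes when a perturbation is large enough to reorder the window sums, and at that point it jumps rather than varying smoothly.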