Largest subarray with sum equal to 0 (answers)

【Question title】: Largest subarray with sum equal to 0
【Posted】: 2015-01-31 15:11:36
【Question】:

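This is a classic interview question. Given an array that contains both positive and negative elements but no 0, find the largest subarray whose sum equals 0. I tried to solve it, and this is what I came up with.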

def sub_array_sum(array,k=0):
    start_index = -1
    hash_sum = {}
    current_sum = 0
    keys = set()
    best_index_hash = {}
    for i in array:
        start_index += 1
        current_sum += i
        if current_sum in hash_sum:
            hash_sum[current_sum].append(start_index)
            keys.add(current_sum)
        else:
            if current_sum == 0:
                best_index_hash[start_index] = [(0,start_index)]
            else:
                hash_sum[current_sum] = [start_index]
    if keys:
        for k_1 in keys:
            best_start = hash_sum.get(k_1)[0]
            best_end_list = hash_sum.get(k_1)[1:]
            for best_end in best_end_list:
                if abs(best_start-best_end) in best_index_hash:
                    best_index_hash[abs(best_start-best_end)].append((best_start+1,best_end))
                else:
                    best_index_hash[abs(best_start-best_end)] = [(best_start+1,best_end)]

    if best_index_hash:
        (bs,be) = best_index_hash[max(best_index_hash.keys(),key=int)].pop()
        return array[bs:be+1]
    else:
        print("No sub array with sum equal to 0")


def Main():
    a = [6,-2,8,5,4,-9,8,-2,1,2]
    b = [-8,8]
    c = [-7,8,-1]
    d = [2200,300,-6,6,5,-9]
    e = [-9,9,-6,-3]
    print(sub_array_sum(a))
    print(sub_array_sum(b))
    print(sub_array_sum(c))
    print(sub_array_sum(d))
    print(sub_array_sum(e))

if __name__ == '__main__':
    Main()

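I'm not sure whether this handles all the edge cases; it would be great if someone could comment on it. I would also like to extend it to a sum equal to any K, not just 0. How should I do that? Any pointers on optimizing it further would also be helpful.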
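One way to probe the edge cases is to compare against a deliberately naive reference. The sketch below (the function name is hypothetical) tries every window from longest to shortest, so the first hit is a longest one; it is cubic and meant only for testing. Inputs where a zero-sum prefix competes with a shorter zero-sum subarray further right are worth including: for [1, 2, -3, 3] the longest answer is [1, 2, -3], but the code above returns [-3, 3], because a zero-sum prefix ending at start_index is filed under the key start_index (one less than its length) and then loses the tie to an interior subarray of that length.

```python
def longest_subarray_with_sum_bruteforce(array, k=0):
    """Return the longest contiguous slice of `array` summing to k, or None.

    Hypothetical reference helper: checks window lengths from longest to
    shortest, so the first match is a longest one. O(n^3); for testing only.
    """
    n = len(array)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            window = array[start:start + length]
            if sum(window) == k:
                return window
    return None


print(longest_subarray_with_sum_bruteforce([1, 2, -3, 3]))  # prints [1, 2, -3]
```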

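【Discussion】:

This question might be better received on codereview.stackexchange.com than on Stack Overflow.
Isn't this the subset sum problem? That is NP-complete, so I wouldn't expect anyone to find an efficient algorithm for it.
@Bakuriu No, subarray != subsequence. This question is about a subarray (contiguous elements), which makes it easier.
It would help if you could also give us a natural-language overview of the algorithm you are implementing: why does it work, what is the loop invariant, and what are its time and space complexities?
Chris, the OP seems to be using the fact that, for an array a and inclusive ranges, sum(i+1, j) = sum(0, j) - sum(0, i). But the code is indeed weighed down by redundant data structures.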
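A quick check of that prefix-sum identity on the first test array from the question, using inclusive ranges (so the subarray between prefixes i and j starts at i + 1). Equal prefix sums at positions i and j mark a zero-sum subarray array[i+1 .. j]; here the prefix sum 12 appears at positions 2 and 5.

```python
from itertools import accumulate

array = [6, -2, 8, 5, 4, -9, 8, -2, 1, 2]
# prefix[j] = sum(array[0..j]), inclusive of j
prefix = list(accumulate(array))

# For 0 <= i < j: sum(array[i+1..j]) == prefix[j] - prefix[i]
i, j = 2, 5
assert sum(array[i + 1:j + 1]) == prefix[j] - prefix[i]

print(prefix)              # [6, 4, 12, 17, 21, 12, 20, 18, 19, 21]
print(array[i + 1:j + 1])  # [5, 4, -9], which sums to 0
```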

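Tags: python algorithm arrays

【Solution 1】:

You have already given a nice linear-time solution (better than the other two answers at this point, which are quadratic time), based on the idea that whenever sum(i .. j) = 0, it must be that sum(0 .. i-1) = sum(0 .. j), and vice versa. Essentially, you compute the prefix sums sum(0 .. i) for all i and build a hash table hash_sum, where hash_sum[x] is the list of all positions i with sum(0 .. i) = x. Then you walk through this hash table one sum at a time, looking for any sum that is produced by more than one prefix. Among all such repeated sums, you pick the one produced by the pair of prefixes that are farthest apart, since that gives the longest subarray.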

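Since you've already spotted the key insight that makes this algorithm linear-time, I'm a little puzzled why you build so much unnecessary stuff in best_index_hash in the second loop. For a given sum x, the farthest-apart pair of prefixes with that sum will always be the smallest and largest entries in hash_sum[x], which are necessarily its first and last entries (since that is the order in which they were appended), so there is no need to iterate over the elements in between. In fact, you don't even need the second loop at all: you can keep a running maximum during the first loop, treating start_index as the rightmost endpoint.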
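A sketch of the first suggestion, assuming it is enough to remember only the first and last index at which each prefix sum occurs (the function name is hypothetical, and this is still the two-phase structure, not the single-loop version). Seeding both tables with 0 → -1 treats the empty prefix like any other, so zero-sum prefixes need no special case and are compared by their true length.

```python
def sub_array_sum_first_last(array):
    """Longest zero-sum subarray, or None (hypothetical helper)."""
    first, last = {0: -1}, {0: -1}  # the empty prefix "ends" at index -1
    current_sum = 0
    for index, value in enumerate(array):
        current_sum += value
        first.setdefault(current_sum, index)  # keep the earliest index
        last[current_sum] = index             # overwrite with the latest index
    # For each prefix sum, the widest pair is (first occurrence, last occurrence).
    best_sum = max(first, key=lambda s: last[s] - first[s])
    start, end = first[best_sum] + 1, last[best_sum]
    # end < start means no prefix sum repeated, i.e. no zero-sum subarray
    return array[start:end + 1] if end >= start else None
```

Both dictionaries hold one entry per distinct prefix sum, so the space no longer grows with the number of repeated sums.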

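To handle an arbitrary target sum k: instead of finding the leftmost occurrence of the prefix sum current_sum, we need to find the leftmost occurrence of the prefix sum current_sum - k. But that is just first_with_sum[current_sum - k].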
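The same reasoning in prefix-sum notation, writing P_j = a_0 + ... + a_j and P_{-1} = 0 for the empty prefix. For a fixed right end j, the longest subarray ending at j with sum k starts at i = m + 1, where m is the smallest index in {-1, 0, ..., j-1} with P_m = P_j - k; that smallest index is what the dictionary first_with_sum holds. With k = 0 this reduces to the condition sum(0 .. i-1) = sum(0 .. j) above.

```latex
\[
\sum_{t=i}^{j} a_t = P_j - P_{i-1}, \qquad 0 \le i \le j
\]
\[
\sum_{t=i}^{j} a_t = k \iff P_{i-1} = P_j - k
\]
```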

The following code is untested, but should work:

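def sub_array_sum(array, k=0):
    first_with_sum = {}
    first_with_sum[0] = -1  # the empty prefix (sum 0) "ends" just before index 0
    best_start = -1
    best_len = 0
    current_sum = 0
    # start_index is the index of the current element, i.e. the right end of the range
    for start_index, value in enumerate(array):
        current_sum += value
        # sum(array[m+1 .. start_index]) == k  <=>  the prefix sum at m is current_sum - k
        if current_sum - k in first_with_sum:
            length = start_index - first_with_sum[current_sum - k]
            if length > best_len:
                best_start = first_with_sum[current_sum - k] + 1
                best_len = length
        # Record only the leftmost occurrence of each prefix sum. This must also happen
        # when current_sum - k was found, or later lookups can miss it when k != 0.
        if current_sum not in first_with_sum:
            first_with_sum[current_sum] = start_index

    if best_len > 0:
        return array[best_start:best_start + best_len]  # slice end is exclusive
    else:
        print("No subarray found")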

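Setting first_with_sum[0] = -1 at the start means we don't have to treat ranges that begin at index 0 as a special case. Note that this algorithm doesn't improve on the asymptotic time or space complexity of the original, but it is simpler to implement and uses less space on any input that contains a zero-sum subarray.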

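【Comments】:

【Solution 2】:

Here's my own answer, just for fun.

The number of contiguous subsequences is quadratic and summing one takes linear time, so the simplest solution is cubic.

This approach is just an exhaustive search over the subsequences, but with a small trick that avoids the linear summation factor (each node's sum is computed from a neighbouring node's sum plus one element), so it is only quadratic.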
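The same trick can be written without the node graph: fix a start index and extend the window one element at a time, updating a running total instead of calling sum() on each slice. This is a sketch with a hypothetical function name, not the answer's own code; it returns None rather than [] when nothing is found.

```python
def longest_k_sum_quadratic(ints, k=0):
    """Longest contiguous run of `ints` summing to k, or None; O(n^2) time."""
    best = None
    for start in range(len(ints)):
        running = 0
        for end in range(start, len(ints)):
            running += ints[end]  # sum(ints[start..end]) in O(1) per step
            if running == k and (best is None or end - start + 1 > len(best)):
                best = ints[start:end + 1]
    return best
```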

from collections import namedtuple
from itertools import chain


class Element(namedtuple('Element', ('index', 'value'))):
    """
    An element in the input sequence. ``index`` is the position
    of the element, and ``value`` is the element itself.
    """
    pass


class Node(namedtuple('Node', ('a', 'b', 'sum'))):
    """
    A node in the search graph, which looks like this:

         0      1       2      3
          \    /  \    /  \   /
           0-1     1-2     2-3
              \   /   \   /
               0-2     1-3
                  \   /
                   0-3

    ``a`` is the start Element, ``b`` is the end Element, and
    ``sum`` is the sum of elements ``a`` through ``b``.
    """

    @classmethod
    def from_element(cls, e):
        """Construct a Node from a single Element."""
        return Node(a=e, b=e, sum=e.value)

    def __add__(self, other):
        """The combining operation depicted by the graph above."""
        assert self.a.index == other.a.index - 1
        assert self.b.index == other.b.index - 1
        return Node(a=self.a, b=other.b, sum=(self.sum + other.b.value))

    def __len__(self):
        """The number of elements represented by this node."""
        return self.b.index - self.a.index + 1


def get_longest_k_sum_subsequence(ints, k):
    """The longest subsequence of ``ints`` that sums to ``k``."""
    n = get_longest_node(n for n in generate_nodes(ints) if n.sum == k)
    if n:
        return ints[n.a.index:(n.b.index + 1)]
    if k == 0:
        return []


def get_longest_zero_sum_subsequence(ints):
    """The longest subsequence of ``ints`` that sums to zero."""
    return get_longest_k_sum_subsequence(ints, k=0)


def generate_nodes(ints):
    """Generates all Nodes in the graph."""
    nodes = [Node.from_element(Element(i, v)) for i, v in enumerate(ints)]
    while len(nodes) > 0:
        for n in nodes:
            yield n
        nodes = [x + y for x, y in zip(nodes, nodes[1:])]


def get_longest_node(nodes):
    """The longest Node in ``nodes``, or None if there are no Nodes."""
    return max(chain([()], nodes), key=len) or None


if __name__ == '__main__':

    def f(*ints):
        return get_longest_zero_sum_subsequence(list(ints))

    assert f() == []
    assert f(1) == []
    assert f(0) == [0]
    assert f(0, 0) == [0, 0]
    assert f(-1, 1) == [-1, 1]
    assert f(-1, 2, 1) == []
    assert f(1, -1, 1, -1) == [1, -1, 1, -1]
    assert f(1, -1, 8) == [1, -1]
    assert f(0, 1, -1, 8) == [0, 1, -1]
    assert f(5, 6, -2, 1, 1, 7, -2, 2, 8) == [-2, 1, 1]
    assert f(5, 6, -2, 2, 7, -2, 1, 1, 8) == [-2, 1, 1]

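【Comments】:

The algorithm proposed in the question is linear time.

【Solution 3】:

I agree with sundar nataraj that this should be posted on the Code Review site.

I looked at your code anyway, for fun. Although I can follow your approach, I don't understand why you need Counter.

best_index_hash[start_index] = [(0,start_index)]: here best_index_hash is a Counter. Why assign a list to it?
for key_1, value_1 in best_index_hash.most_common(1): you are trying to get the largest subsequence, and you use most_common to get the answer. That is not semantically intuitive.

I'd be happy to post a solution, but I'll wait for you to edit the code snippet and improve it.

Addendum

For fun, I tried the puzzle myself; my attempt is below. I make no guarantee of correctness or completeness.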

from collections import defaultdict

def max_sub_array_sum(a, s):
    if a:
        # span[total] = (i, j) of the longest run a[i..j] seen so far with that total.
        # The default (0, -1) has "length" -1, so single elements are recorded too.
        span = defaultdict(lambda: (0, -1))
        for i in range(len(a)):
            current_total = 0
            for j in range(i, len(a)):
                current_total += a[j]
                x, y = span[current_total]
                if j - i > y - x:
                    span[current_total] = i, j

        if s in span:
            i, j = span[s]
            print("sum=%d,span_length=%d,indices=(%d,%d),sequence=%s" %
                  (s, j - i + 1, i, j, str(a[i:j + 1])))
            return
    print("Could not find a subsequence of sum %d in sequence %s" %
          (s, str(a)))

max_sub_array_sum(list(range(-6, -1)), 0)
max_sub_array_sum(None, 0)
max_sub_array_sum([], 0)
max_sub_array_sum(list(range(6)), 15)
max_sub_array_sum(list(range(6)), 14)
max_sub_array_sum(list(range(6)), 13)
max_sub_array_sum(list(range(6)), 0)

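【Comments】:

Yes. I later realized I don't need the Counter, and I've changed it. I need a list because there may be several subarrays whose sum is zero, and I need to keep them so I can pick the largest one.
OK. if best_index_hash should be at the top level of the method, and if keys isn't needed. Can you explain what you intend to do here: best_index_hash[max(best_index_hash.keys(),key=int)]?

【Solution 4】:

Here is the solution taken from LeetCode. Note that it counts the subarrays whose sum is k rather than returning the largest one: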

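def sub_array_sum(nums, k=0):
    # Returns how many contiguous subarrays of nums sum to k.
    count, running_sum = 0, 0
    prefix_counts = {0: 1}  # the empty prefix has sum 0 and occurs once
    for value in nums:
        running_sum += value
        # each earlier prefix with sum running_sum - k is followed by a k-sum subarray ending here
        count += prefix_counts.get(running_sum - k, 0)
        prefix_counts[running_sum] = prefix_counts.get(running_sum, 0) + 1
    return count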

【Comments】: