将一个集合划分为两个子集，使总和的差最小并返回两个子集答案

【问题标题】：Partition a set into two subsets so that the difference of the sum is minimum and return the two subsets将一个集合划分为两个子集，使总和的差最小并返回两个子集
【发布时间】：2021-11-11 00:46:52
【问题描述】：

有很多与此相关的问题，例如here.

但是，所有答案都集中在寻找最小绝对和上。我正在尝试使用各种答案中列出的一些方法，但我不太关心返回总和，而更关心返回两个子集（或者如果有多个解决方案，第一个找到最好的子集）。

有没有办法做到这一点而不是 NP-hard？

我的用例是我有一组不同高度的元素，我需要将这些元素排列成两列（两个数组），以最小化元素堆叠在一起时的总最大高度.我有一个解决方案，在大多数情况下都能给出不错的结果，但它并不总是准确的。我希望不必计算两组的每一个组合来获得理想的解决方案。

*我知道下面有关于从数组中删除项目并计算总和的优化。

// Input is always ordered by size
let inputWorking = [{
    item: 'a',
    size: 5
  },
  {
    item: 'b',
    size: 6
  },
  {
    item: 'c',
    size: 7
  },
  {
    item: 'd',
    size: 8
  },
];

let inputNotWorking = [{
    item: 'a',
    size: 14
  },
  {
    item: 'b',
    size: 14
  },
  {
    item: 'c',
    size: 20
  },
  {
    item: 'd',
    size: 20
  },
  {
    item: 'e',
    size: 21
  },
];


console.log('Output is correct', pushSides(inputWorking));

// Best fit would be [a,b,c], [d,e] with a max height of 48
console.log('Output is incorrect', pushSides(inputNotWorking));


function pushSides(items, ls = [], rs = []) {
  if (items.length === 0) {
    const leftSize = this.sumSizes(ls);
    const rightSize = this.sumSizes(rs);
    return {
      left: ls,
      right: rs,
      maxHeight: leftSize > rightSize ? leftSize : rightSize
    };
  }

  const lastItem = items[items.length - 1];
  const result = this.pushToIdealSide(lastItem, ls, rs);
  // Remove item we used
  items = items.filter(t => t !== lastItem);
  return pushSides(items, result.ls, result.rs);
}

function pushToIdealSide(nextItem, ls = [], rs = []) {
  if (this.sumSizes(rs) + nextItem.size > this.sumSizes(ls) + nextItem.size) {
    ls.push(nextItem);
  } else {
    rs.push(nextItem);
  }

  return {
    ls,
    rs
  };
}

function sumSizes(itemSizeArray) {
  return itemSizeArray.map(c => c.size)
    .reduce((prev, curr) => prev + curr, 0);
}

【问题讨论】：

想到的一种方法是找到绝对差最小的 2 个和的值，然后遍历数组并尝试取数字，使它们加起来等于 1这2个值。其余元素是您的第二个分区。
困难的部分是`尝试将数字加起来等于这两个值之一的总和`你最终会遇到同样的问题，不得不尝试很多组合。例如，如果您的一侧的总和是 11，而您有 4、4、2、11... 这将是一种蛮力方法，也许可以进行一些优化，但我最好创建所有可能的组合数组...不是吗？
据我了解，我们的分区不需要连续子阵列。所以我们可以排序并尝试使用 2 指针方法找到其中一个数字？我不确定我们是否可以贪婪地做到这一点。我们确定数组的一个子集加起来就是其中一个数字这一事实可能会有所帮助。
您可以使用动态规划来解决子集和问题，前提是所有高度的总和是一个相当小的整数。
这对geeksforgeeks.org/…有帮助吗？

标签： algorithm set partitioning

【解决方案1】：

这是partition problem 的函数版本，因此它是 FNP 完全的（因此至少与 NP 完全一样难）。您还可以将问题表述为 Subset-Sum 的函数版本。幸运的是，它具有伪多项式解，并且在实践中通常可以快速求解。

考虑问题的等价形式会更有用：给定整数集S，其和为T，找到S 的子集，其和至多为T/2，并尽可能接近T/2 尽可能。这个子集只是你的问题中总和较小的子集，所以另一个子集是S 的其余部分。

给定一种算法（例如您在另一篇文章中链接的算法），该算法仅找到最佳和最多一半（或等效地，最小绝对差），通常有一种直接的修改方法它来获取实际的子集。在生成子集和列表时，还要存储生成该子集和的元素的索引。然后，最后，我们可以从最终找到的总和中回溯以恢复元素。

举个简单的例子，如果我们的数组是 [1, 5, 8]，我们可能会通过一次添加一个元素来生成所有子集和：

Sums-Dict: Hashmap from subset-sums to last added element for that sum.
Initial Sums: {0: null}

Add element 1:
Sums-Dict = {0: null, 1: 1}

Add element 5:
Sums-Dict = {0: null, 1: 1, 5: 5, 6: 5}

Add element 8:
Sums-Dict = {0: null, 1: 1, 5: 5, 6: 5, 8: 8, 9: 8, 13: 8, 14: 8}

然后回溯，我们使用类似于背包问题中的回溯的过程来输出解：

Find a closest sum to (1+5+8)/2, for example '6'.

Backtracking:
Used-elements = [], Current Sum = 6.
Sums-Dict[6] = 5: add 5 to used-elements, subtract 5 from current sum

Used-elements = [5], Current Sum = 1.
Sums-Dict[1] = 1: add 1 to used-elements, subtract 1 from current sum


Used-elements = [5, 1], Current Sum = 0.
Sums-Dict[0] = null: stop. We have found the subset that summed to '6'.

可以对经典的动态规划解决方案进行简单修改，以存储每个总和的额外信息；如果数字很小，这是你最好的选择。我已经包含了一个基于the Schroeppel-Shamir 论文的核心算法和符号的子集和的 Python 实现。这是对子集和的天真包含/排除解决方案的中间相遇版本。它比蛮力方法更复杂，但运行时间为O(2^(n/2) * n/4) 并占用O(2^(n/4)) 空间，因此对于更大的输入来说是一个实用的解决方案。

from typing import Dict, List, Tuple
import collections
import math
import heapq


class SubsetSumSolver:
    """Solves the subset-sum and partition optimization problems.
    Useful when values or goal sum are too large for dynamic programming"""

    def __init__(self, nums: List[int]):

        # Strip all zeros. Not necessary, but a useful optimization for speed
        self.orig_zeros = nums.count(0)
        self.nums = sorted(x for x in nums if x != 0)

        self.n = len(self.nums)

    def all_subset_sums(self, left_bound: int, right_bound: int) -> Dict[int, int]:
        """Return a subset-sum dictionary, mapping subset-sums of
        nums[left_bound:right_bound] to any index of an element in that subset-sum."""

        all_sums = {0: self.n + 1}
        for i in range(left_bound, right_bound):
            # Want old sums to remain/take priority
            new_sums = {self.nums[i] + elem: i for elem in all_sums}
            new_sums.update(all_sums)
            all_sums = new_sums

        return all_sums

    def recover_sum_members(self, sum_dict: Dict[int, int], found_sum: int) -> List[int]:
        """Given a subset-sum dictionary and a sum, return a set of elements
        from nums that formed that sum."""

        answer = []
        curr_sum = found_sum
        while curr_sum != 0:
            next_elem_index = sum_dict[curr_sum]
            next_elem = self.nums[next_elem_index]
            answer.append(next_elem)
            curr_sum -= next_elem

            assert len(answer) <= self.n

        return answer

    def min_absolute_difference(self, goal: float) -> List[int]:
        """Implement Schroeppel and Shamir alg. for subset sum
        Runs in O(2^(n/2) * n/4) time, takes O(2^(n/4)) space
        Returns a subset of self.nums whose sum is as close to goal as possible.
        """

        if self.n < 8:
            # Direct solution when n < 8; not worth splitting into 4 groups.
            all_sums_dict = self.all_subset_sums(0, self.n)
            best_diff_seen, best_sum = min(((abs(x - goal), x) for x in all_sums_dict),
                                           key=lambda pair: pair[0])
            return self.recover_sum_members(all_sums_dict, best_sum)

        # Split nums into 4 equal-length parts (or as close as possible)
        half = self.n // 2
        bounds = [0, half // 2, half, half + (self.n - half) // 2, self.n]
        # first_arr = nums[bounds[0]:bounds[1]]
        # sec_arr = nums[bounds[1]:bounds[2]]
        # third_arr = nums[bounds[2]:bounds[3]]
        # fourth_arr = nums[bounds[3]:bounds[4]]

        first_table_dict = self.all_subset_sums(bounds[0], bounds[1])
        first_table = list(first_table_dict.keys())

        sec_table_dict = self.all_subset_sums(bounds[1], bounds[2])
        sec_table = sorted(sec_table_dict.keys())

        third_table_dict = self.all_subset_sums(bounds[2], bounds[3])
        third_table = list(third_table_dict.keys())

        fourth_table_dict = self.all_subset_sums(bounds[3], bounds[4])
        fourth_table = sorted(fourth_table_dict.keys(), reverse=True)

        # min_heap stores pairs of problems from T1 and T2, and
        # makes the pair with smallest sum in front of the heap
        # Format: (sum, T1-index, T2-index) triplets
        min_heap = [(x + sec_table[0], i, 0) for i, x in enumerate(first_table)]

        # max_heap stores pairs of problems from T3 and T4, and
        # makes the pair with largest sum in front of the heap
        # Format: (-sum, T3-index, T4-index) triplets
        max_heap = [(-(x + fourth_table[0]), i, 0) for i, x in enumerate(third_table)]

        heapq.heapify(min_heap)
        heapq.heapify(max_heap)

        best_diff_seen = math.inf
        best_diff_indices = []

        while len(min_heap) != 0 and len(max_heap) != 0:
            smallest, p1, p2 = min_heap[0]
            largest, p3, p4 = max_heap[0]
            largest = -largest

            ans_here = smallest + largest
            if abs(goal - ans_here) < best_diff_seen:
                best_diff_seen = abs(goal - ans_here)
                best_diff_indices = (p1, p2, p3, p4)

            if ans_here <= goal:  # Want sum to increase, so increase p2
                heapq.heappop(min_heap)
                if p2 + 1 != len(sec_table):
                    heapq.heappush(min_heap,
                                   (first_table[p1] + sec_table[p2 + 1], p1, p2 + 1))
            else:  # Want sum to decrease. so increase p4
                heapq.heappop(max_heap)
                if p4 + 1 != len(fourth_table):
                    heapq.heappush(max_heap,
                                   (-(third_table[p3] + fourth_table[p4 + 1]), p3, p4 + 1))

        sum_ans = []
        p1, p2, p3, p4 = best_diff_indices

        sum_ans.extend(self.recover_sum_members(first_table_dict, first_table[p1]))
        sum_ans.extend(self.recover_sum_members(sec_table_dict, sec_table[p2]))
        sum_ans.extend(self.recover_sum_members(third_table_dict, third_table[p3]))
        sum_ans.extend(self.recover_sum_members(fourth_table_dict, fourth_table[p4]))

        return sum_ans

    def solve_partition(self) -> Tuple[List[int], List[int]]:
        """Return a partition of nums into (smaller_sum_set, larger_sum_set)
        Finds a partition whose sum-difference is minimum.
        """
        total_sum = sum(self.nums)
        frequency_counts = collections.Counter(self.nums)
        first_subset = self.min_absolute_difference(goal=total_sum / 2.0)
        if self.orig_zeros != 0:
            first_subset.extend([0] * self.orig_zeros)

        remaining_subset = frequency_counts - collections.Counter(first_subset)
        remaining_subset = list(remaining_subset.elements())
        if sum(first_subset) <= sum(remaining_subset):
            return (first_subset, remaining_subset)
        else:
            return (remaining_subset, first_subset)

您可以在任何整数数组（实际上最多 n=100 个元素）上这样调用它：

Solver = SubsetSumSolver([1, 5, 5, 6, 7, 10, 20])
print(Solver.solve_partition())

>>> ([10, 6, 5, 5, 1], [7, 20])

【讨论】：