快速 python 算法从具有等于给定比率的子集和的数字列表中查找所有可能的分区答案

【问题标题】：Fast python algorithm to find all possible partitions from a list of numbers that has subset sums equal to given ratios快速 python 算法从具有等于给定比率的子集和的数字列表中查找所有可能的分区
【发布时间】：2023-11-05 16:59:01
【问题描述】：

假设我有一个从 0 到 9 的 20 个随机整数的列表。我想将列表划分为 N 子集，以便子集和的比率等于给定值，并且我想找到所有可能的分区。我编写了以下代码并使其适用于 N = 2 案例。

import random
import itertools

#lst = [random.randrange(10) for _ in range(20)]
lst = [2, 0, 1, 7, 2, 4, 9, 7, 6, 0, 5, 4, 7, 4, 5, 0, 4, 5, 2, 3]

def partition_sum_with_ratio(numbers, ratios):
    target1 = round(int(sum(numbers) * ratios[0] / (ratios[0] + ratios[1])))
    target2 = sum(numbers) - target1
    p1 = [seq for i in range(len(numbers), 0, -1) for seq in
          itertools.combinations(numbers, i) if sum(seq) == target1
          and sum([s for s in numbers if s not in seq]) == target2]

    p2 = [tuple(n for n in numbers if n not in seq) for seq in p1]

    return list(zip(p1, p2))

partitions = partition_sum_with_ratios(lst, ratios=[4, 3])
print(partitions[0])

输出：

((2, 0, 1, 2, 4, 6, 0, 5, 4, 4, 5, 0, 4, 5, 2), (7, 9, 7, 7, 3))

如果你计算每个子集的总和，你会发现比率是 44 : 33 = 4 : 3，这正是输入值。但是，我希望该函数适用于任意数量的子集。例如，我期望

partition_sum_with_ratio(lst, ratios=[4, 3, 3])

返回类似的东西

((2, 0, 1, 2, 4, 6, 0, 5, 4, 4, 3), (5, 0, 4, 5, 2, 7), (9, 7, 7))

我已经考虑这个问题一个月了，我发现这非常困难。我的结论是，这个问题只能通过递归来解决。我想知道是否有任何相对快速的算法。有什么建议吗？

【问题讨论】：

比例是多少？能否详细说明一下定义？
ratio = [4, 3] 表示2个子集之和的比例为4：3
如何在 3 个维度中工作？
你能重复元素吗？

标签： python algorithm recursion subset combinations

【解决方案1】：

是的，需要递归。基本逻辑是将二分法分成一个部分和其余部分，然后以所有可能的方式递归地拆分其余部分。我按照你的说法假设一切都是可区分的，这创造了很多可能性，可能太多而无法列举。尽管如此：

import itertools


def totals_from_ratios(sum_numbers, ratios):
    sum_ratios = sum(ratios)
    totals = [(sum_numbers * ratio) // sum_ratios for ratio in ratios]
    residues = [(sum_numbers * ratio) % sum_ratios for ratio in ratios]
    for i in sorted(
        range(len(ratios)), key=lambda i: residues[i] * ratios[i], reverse=True
    )[: sum_numbers - sum(totals)]:
        totals[i] += 1
    return totals


def bipartitions(numbers, total):
    n = len(numbers)
    for k in range(n + 1):
        for combo in itertools.combinations(range(n), k):
            if sum(numbers[i] for i in combo) == total:
                set_combo = set(combo)
                yield sorted(numbers[i] for i in combo), sorted(
                    numbers[i] for i in range(n) if i not in set_combo
                )


def partitions_into_totals(numbers, totals):
    assert totals
    if len(totals) == 1:
        yield [numbers]
    else:
        for first, remaining_numbers in bipartitions(numbers, totals[0]):
            for rest in partitions_into_totals(remaining_numbers, totals[1:]):
                yield [first] + rest


def partitions_into_ratios(numbers, ratios):
    totals = totals_from_ratios(sum(numbers), ratios)
    yield from partitions_into_totals(numbers, totals)


lst = [2, 0, 1, 7, 2, 4, 9, 7, 6, 0, 5, 4, 7, 4, 5, 0, 4, 5, 2, 3]
for part in partitions_into_ratios(lst, [4, 3, 3]):
    print(part)

【讨论】：

谢谢。当列表很小时，您的代码可以完美运行。问题是当我在大列表（len(lst) > 50）上使用代码时，它需要很长时间。我知道在这种情况下枚举是不可能的。无论如何要随机抽样一定数量的分区，子集总和等于一个比率？有重复的分区是可以的，无论如何这不太可能。
@ShaunHan 可能值得再问一个问题，因为这段代码不能很好地适应随机抽样。我的第一直觉是打乱数字，形成一个具有近似比率的贪婪分区，然后进行局部搜索移动以使比率更接近真实。
是的，贪心算法就可以了！我在这里开了另一个问题：*.com/questions/67939449/…如果你有贪心算法的想法，你可以在那里回答。