Most efficient way to combine a list of lists into groups of items whose length is <= X but as close as possible to X

【Question title】: Most efficient way to combine list of lists to get groups of items <= but as close as possible to length X
【Posted】: 2019-03-09 04:30:25
【Question description】:

Suppose I have a list of lists, for example:

 [ [ a1, a2, a3, a4, a5], [b1, b2, b3, b4, b5, b6, b7, b8], [c1, c2, c3, c4, c5, c6], [d1, d2, d3, d4] ]

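What is the simplest way to compare the lengths of all the sublists and combine them into lists whose length is as close as possible to, but less than or equal to, x?

So for the example above, with x=12: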

[ [a1, a2, a3, a4, a5, c1, c2, c3, c4, c5, c6], [b1, b2, b3, b4, b5, b6, b7, b8, d1, d2, d3, d4] ]

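The order of the individual groups (e.g. a, b, c...) in the output does not matter, but the individual groups must not be split up.

I know I could, for example, take the length of the first group and then, in order, the length of each subsequent group; if their sum equals x, pop those lists and append their items to the returned list. If no such combination remains, check whether the group lengths sum to x-1 and, if so, pop and append, then continue with sums of x-2 and so on until the input list is empty.

That would be fine for small inputs like the example given, but what about when the len of the input list gets very large? Is there a more efficient approach or algorithm?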

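【Question discussion】:

I would start by building a dictionary keyed by list length, something like {4:[3], 5:[0], 6:[2], 8:[1]...}, where each length maps to the list of indices of the sublists with that length in the original list of lists. Then I would assign the elements closest to the maximum first, and repeat step by step to try to "fill up" the groups.
This seems similar to the knapsack problem: en.wikipedia.org/wiki/Knapsack_problem
See "binning files into approximately equal sized directories".
This is known as the bin packing problem. The greedy solution works for most applications.
How do you choose whether to make lists of length (12, 12, 1) vs (12, 6, 7) vs (12, 10, 3) vs ...?

Tags: python algorithm list sorting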
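
The length-keyed dictionary suggested in the first comment can be built in a single pass. The following is only a minimal sketch; the helper name `index_by_length` is hypothetical and not from the thread.

```python
from collections import defaultdict


def index_by_length(groups):
    """Map each group length to the indices of the groups with that length."""
    by_length = defaultdict(list)
    for i, group in enumerate(groups):
        by_length[len(group)].append(i)
    return dict(by_length)


groups = [
    ["a1", "a2", "a3", "a4", "a5"],
    ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"],
    ["c1", "c2", "c3", "c4", "c5", "c6"],
    ["d1", "d2", "d3", "d4"],
]
lengths = index_by_length(groups)
print(lengths)
```

Storing indices rather than the groups themselves keeps the dictionary small and lets a later packing step refer back to the original input.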

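【Solution 1】:

This looks like a bin packing problem.

My greedy algorithm with swapping and pre-sorting should work, but you would need to adapt it to use sublist lengths in place of picture heights, and calculate the initial bin count from your x and the total length of the items.

【Discussion】:

I think the Wiki is exactly what I was looking for! Modified first fit decreasing should do what I need, although I notice they reference a paper with a mathematically optimal solution: Benkő A., Dósa G., Tuza Z. (2010) Bin Packing/Covering with Delivery, Solved with the Evolution of Algorithms, Proceedings 2010 IEEE 5th International Conference on Bio-Inspired Computing: Theories and Applications, BIC-TA 2010, art. no. 5645312, pp. 298-302.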
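
For illustration, here is a minimal sketch of plain first-fit decreasing applied to sublist lengths. It is not the answerer's linked algorithm (there is no swapping step), and the names `first_fit_decreasing` and `min_bin_count` are hypothetical. The second helper shows one way to estimate an initial bin count from x and the total number of items.

```python
import math


def min_bin_count(groups, x):
    """Lower bound on the number of bins: total items divided by x, rounded up."""
    total = sum(len(g) for g in groups)
    return math.ceil(total / x)


def first_fit_decreasing(groups, x):
    """Pack whole groups into bins of at most x items, longest groups first."""
    bins = []  # each bin is a flat list of items
    for group in sorted(groups, key=len, reverse=True):
        if len(group) > x:
            raise ValueError("a group is longer than x and cannot be placed")
        for b in bins:
            # put the group into the first bin that still has room for it
            if len(b) + len(group) <= x:
                b.extend(group)
                break
        else:
            # no existing bin fits, so open a new one
            bins.append(list(group))
    return bins


groups = [
    ["a1", "a2", "a3", "a4", "a5"],
    ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"],
    ["c1", "c2", "c3", "c4", "c5", "c6"],
    ["d1", "d2", "d3", "d4"],
]
packed = first_fit_decreasing(groups, 12)
print([len(b) for b in packed])  # [12, 11]
print(min_bin_count(groups, 12))  # 2
```

On the example from the question this yields the b+d and c+a groupings. The inner scan over bins makes this sketch quadratic in the number of groups in the worst case, and first-fit decreasing is a heuristic, so it may use more bins than the lower bound on other inputs.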

【Solution 2】:

This solution does not necessarily find the length-optimal result, because it only considers joining two lists, but it runs in O(n).

#!/usr/bin/env python3

import sys
import random

def foo(xs, n):
    bins = {}
    mid = (n+1)//2

    # bin list positions by length
    for i, x in enumerate(xs):
        bucket = len(x)
        if bucket > n:
            raise RuntimeError("invalid input")

        bins.setdefault(bucket, []).append(i)

    # take out ones that are the desired length already
    out = [xs[x] for x in bins[n]] if n in bins else []
    bins.pop(n, None)

    # find complements for the upper half of buckets
    for i in list(bins.keys()):
        if i < mid:
            continue

        candidates = sorted([x for x in list(bins.keys()) if x <= n-i], reverse=True)

        while i in bins and bins[i]:
            x = bins[i].pop()

            for j in list(candidates):
                if j not in bins or not bins[j]:
                    candidates = candidates[1:]
                    continue

                y = bins[j].pop()
                if not bins[j]:
                    del bins[j]

                out.append(xs[x] + xs[y])
                break
            else:
                # complement not found
                out.append(xs[x])


    # add lists with no complements from the lower half
    out += [xs[y] for ys in bins.values() for y in ys]

    return out

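_check_n = 0
def check(n, xs):
    # count completed checks in the module-level counter, not in the target length n
    global _check_n
    ys = list(foo(xs, n))

    try:
        for y in ys:
            assert len(y) <= n
        print(".", end="")
        _check_n += 1
        if _check_n % 10 == 0:
            sys.stdout.flush()
    except AssertionError:
        print("n, xs =", (n, xs))
        print("ys =", ys)
        raise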

if __name__ == "__main__":
    n, xs = 12, [ ["a1", "a2", "a3", "a4", "a5"], ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"], ["c1", "c2", "c3", "c4", "c5", "c6"], ["d1", "d2", "d3", "d4"] ]
    check(n, xs)

    cases = 10**4
    max_n = 10**2
    max_input_len = 10**3

    for i in range(cases):
        n = random.randint(1, max_n)
        xs = [[1] * random.randint(1, n) for j in range(random.randint(1, max_input_len))]
        check(n, xs)
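
Because the pairing approach above joins at most two lists, three groups of length 4 with a target of 12 can never share a single output list, even though one list of 12 is possible. To measure how far a heuristic is from the optimum on small inputs, an exhaustive search can serve as a reference. The sketch below is an assumption-laden illustration rather than part of the answer: `min_bins_bruteforce` is a hypothetical name, and the search takes exponential time, so it is only practical for a handful of groups.

```python
def min_bins_bruteforce(lengths, x):
    """Smallest number of bins of capacity x that hold all lengths (exponential time)."""
    lengths = sorted(lengths, reverse=True)
    if lengths and lengths[0] > x:
        raise ValueError("a group is longer than x")
    best = len(lengths)  # one bin per group always works

    def place(i, loads):
        nonlocal best
        if len(loads) >= best:
            return  # this branch cannot beat the best packing found so far
        if i == len(lengths):
            best = len(loads)
            return
        tried = set()
        for k, load in enumerate(loads):
            # bins with the same load are interchangeable, so try each load once
            if load + lengths[i] <= x and load not in tried:
                tried.add(load)
                loads[k] += lengths[i]
                place(i + 1, loads)
                loads[k] -= lengths[i]
        # also try opening a new bin for this group
        loads.append(lengths[i])
        place(i + 1, loads)
        loads.pop()

    place(0, [])
    return best


print(min_bins_bruteforce([4, 4, 4], 12))     # 1
print(min_bins_bruteforce([5, 8, 6, 4], 12))  # 2
```

Comparing `len(foo(xs, n))` against this value on small random inputs would show how often the pairing heuristic uses more lists than necessary.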

【Discussion】: