Finding the best combination of lists with maximum function value: answers

【Question title】: Finding the best combination of lists with maximum function value
【Posted】: 2016-05-11 13:33:24
【Question description】:

Suppose I have N lists (vectors) and I want to choose x of them, with 1 <= x <= N (x is not predetermined), so that I get the maximum value of func(lists).

For example:

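import operator

def func(mat):
    # Doing some math between the lists. For example:
    # element-wise sum of all chosen lists
    sum_list = list(mat[0])
    for li in mat[1:]:
        sum_list = list(map(operator.add, sum_list, li))
    # for each position, the drop below the running maximum up to that position
    accum_min_lst = []
    for i, val in enumerate(sum_list):
        x = sum_list[:i + 1]
        accum_min_lst.append(val - max(x))
    # the worst (most negative) drop
    return min(accum_min_lst)

l1 = [3, 4, 7, -2]
l2 = [0.5, 3, 6, 2.7]
l3 = [0, 5, 8, 3.6]
mat = [l1, l2, l3]

# maximize() is the routine I am looking for; it does not exist yet
result = maximize(func, mat)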

Possible results:

[l1], [l2], [l3], [l1,l2], [l1,l3], [l2,l3], [l1,l2,l3]

If I write a naive solution that tries every combination, it will take forever: the number of combinations grows as 2^N.

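I am trying to find a solution with cvxpy or scipy.optimize.minimize, but I find it hard to understand which kind of function I need for my problem. I thought maybe I should try an evolutionary algorithm to find an approximate answer, or maybe I should use portfolio optimization instead.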

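【Comments】:

If you know nothing about func(), this is a combinatorial problem, as you have already pointed out. An example of func() would help to see whether more can be done...
Regardless of the nature of func(), this is a discrete optimization problem. scipy.optimize.minimize is meant for continuous problems, so it is of little use here. cvxpy may be useful, provided your cost function is actually convex. If it is not convex, you will need some kind of global optimization strategy, such as simulated annealing.
Hi @ali_m, thanks for your answer. I'm not sure whether my function is convex or concave. I saw the advanced functions section in cvxpy, but I think my function is more complicated, or perhaps a combination of several of the functions there.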
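
The simulated annealing strategy mentioned in the comments could look roughly like the sketch below. It is only an illustration, not code from the thread: the names simulated_annealing and score, the geometric cooling schedule, and all parameter values are assumptions. A candidate solution is a list of 0/1 flags, one per list, and each step flips one random flag.

```python
import math
import random

MAT = [[3, 4, 7, -2], [0.5, 3, 6, 2.7], [0, 5, 8, 3.6]]


def score(bits):
    # The example func from the question, applied to the lists flagged with 1.
    chosen = [row for bit, row in zip(bits, MAT) if bit]
    if not chosen:
        return float("-inf")  # an empty selection is never acceptable
    sums = [sum(col) for col in zip(*chosen)]
    return min(v - max(sums[:i + 1]) for i, v in enumerate(sums))


def simulated_annealing(score, n_bits, steps=2000, t_start=5.0, t_end=0.01, seed=0):
    # Hypothetical helper; parameter defaults are arbitrary illustration values.
    rng = random.Random(seed)
    current = [1] * n_bits
    current_score = score(current)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / float(steps - 1))
        candidate = list(current)
        candidate[rng.randrange(n_bits)] ^= 1  # flip one random flag
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score


best_bits, best_val = simulated_annealing(score, len(MAT))
print(best_bits, best_val)
```

Like any heuristic of this kind, it returns the best selection it happened to visit, with no guarantee of optimality.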

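Tags: python scipy mathematical-optimization discrete-mathematics cvxopt

【Solution 1】:

I chose to use my own version of an evolutionary algorithm. It is more intuitive to me, and you can play with the population size, the number of generations, and the mutation probability: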

import operator
from random import choice, random


class ListChooser(object):  # hypothetical wrapper class; the methods below rely on self
    def stack_overflow_example(self):
        def fitness(trial):
            # A child only counts if it beats the best result seen so far.
            trial_max = self.func(trial, mat)
            if trial_max > self.best_res:
                self.best_res = trial_max
                return trial_max
            else:
                return float("-inf")

        def mutate(parent):
            # Re-draw each bit at random with probability prob.
            mutation = []
            for p in parent:
                if random() < prob:
                    mutation.append(choice([0, 1]))
                else:
                    mutation.append(p)
            return mutation

        l1 = [3, 4, 7, -2]
        l2 = [0.5, 3, 6, 2.7]
        l3 = [0, 5, 8, 3.6]
        mat = [l1, l2, l3]

        max_num_of_loops = 1000
        prob = 0.075  # mutation probability
        gen_size = 10  # number of children in each generation
        self.bin_parent = [1] * len(mat)  # first parent all ones
        self.best_res = self.func(self.bin_parent, mat)  # starting point for comparison
        for _ in range(max_num_of_loops):
            backup_parent = self.bin_parent
            copies = (mutate(self.bin_parent) for _ in range(gen_size))
            self.bin_parent = max(copies, key=fitness)
            res = self.func(self.bin_parent, mat)
            if res >= self.best_res:
                self.best_res = res
                print(">> " + str(res))
            else:
                self.bin_parent = backup_parent
        print("Final result: " + str(self.best_res))
        print("Chosen lists:")
        chosen_lists = self.bin_list_to_mat(self.bin_parent, mat)
        for i, li in enumerate(chosen_lists):
            print(">> list[{}] : values: {}".format(i, li))

    def func(self, bin_list, mat):
        chosen_mat = self.bin_list_to_mat(bin_list, mat)
        if len(chosen_mat) == 0:
            return float("-inf")  # an empty selection is never acceptable
        # doing some math between lists:
        sum_list = list(chosen_mat[0])
        for li in chosen_mat[1:]:
            sum_list = list(map(operator.add, sum_list, li))
        accum_min_lst = []
        for i, val in enumerate(sum_list):
            x = sum_list[:i + 1]
            accum_min_lst.append(val - max(x))
        return min(accum_min_lst)

    @staticmethod
    def bin_list_to_mat(bin_list, mat):
        # Keep only the lists whose flag in bin_list is 1.
        chosen_lists = []
        for i, stg in enumerate(mat):
            if bin_list[i] == 1:
                chosen_lists.append(stg)
        return chosen_lists

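Hope it helps someone :) because it took me a while to find this solution.

【Comments】:

Note that this is not guaranteed to find the optimal solution.
I know, but you can add more generations and/or increase the population size. You can also add all kinds of stopping criteria: for example, stop iterating if the answer has not changed for x or more iterations, or after x seconds.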
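
The two stopping criteria mentioned in the last comment could be sketched as follows. This is not the answerer's code: evolve, score, patience, and time_budget are hypothetical names, the defaults are arbitrary, and the mutation flips bits instead of re-drawing them.

```python
import random
import time

MAT = [[3, 4, 7, -2], [0.5, 3, 6, 2.7], [0, 5, 8, 3.6]]


def score(bits):
    # The example func from the question, applied to the lists flagged with 1.
    chosen = [row for bit, row in zip(bits, MAT) if bit]
    if not chosen:
        return float("-inf")  # an empty selection is never acceptable
    sums = [sum(col) for col in zip(*chosen)]
    return min(v - max(sums[:i + 1]) for i, v in enumerate(sums))


def evolve(score, n_bits, gen_size=10, prob=0.2, patience=50,
           time_budget=2.0, seed=0):
    # Hypothetical helper: keep one parent, breed gen_size mutated children per
    # generation, and stop after `patience` generations without improvement or
    # once `time_budget` seconds have passed, whichever comes first.
    rng = random.Random(seed)
    parent = [1] * n_bits
    best = score(parent)
    stale = 0
    deadline = time.time() + time_budget
    while stale < patience and time.time() < deadline:
        children = [[1 - b if rng.random() < prob else b for b in parent]
                    for _ in range(gen_size)]
        child = max(children, key=score)
        child_score = score(child)
        if child_score > best:
            parent, best, stale = child, child_score, 0  # improvement: reset the counter
        else:
            stale += 1
    return parent, best


bits, best = evolve(score, len(MAT))
print(bits, best)
```

A larger patience trades running time for a better chance of escaping a plateau; neither criterion makes the result provably optimal.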

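【Solution 2】:

This can be formulated as a MILP and solved with any MILP solver, but here I show the solution using PuLP.

First, let's see what the answer to the example problem is by going through all combinations: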

import itertools

allfuncs = sum([[func(combs) for combs in itertools.combinations(mat, r)] for r in range(1, 4)], [])

max(allfuncs)

The answer is -3.3
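
For reference, a self-contained version of this brute-force check might look like the sketch below. It restates func from the question so that it runs on its own; the helper name best_combination is made up for illustration. Besides the maximum, it reports which lists achieve it.

```python
import itertools
import operator


def func(mat):
    # Element-wise sum of the chosen lists.
    sum_list = list(mat[0])
    for li in mat[1:]:
        sum_list = list(map(operator.add, sum_list, li))
    # Worst drop of a value below the running maximum up to its position.
    return min(val - max(sum_list[:i + 1]) for i, val in enumerate(sum_list))


def best_combination(mat):
    # Hypothetical helper: try every non-empty subset, i.e. 2**N - 1 candidates.
    best_value, best_indices = float("-inf"), ()
    for r in range(1, len(mat) + 1):
        for indices in itertools.combinations(range(len(mat)), r):
            value = func([mat[i] for i in indices])
            if value > best_value:
                best_value, best_indices = value, indices
    return best_value, best_indices


l1 = [3, 4, 7, -2]
l2 = [0.5, 3, 6, 2.7]
l3 = [0, 5, 8, 3.6]
mat = [l1, l2, l3]

value, indices = best_combination(mat)
print(value, indices)
```

For the example data the winner is l2 on its own (index 1), since every other combination has a larger drop from its running maximum.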

This solution gives the same answer and should scale to larger problems:

import pulp

prob = pulp.LpProblem("MaxFunc", pulp.LpMaximize)

allcols = range(0, len(l1))
allrows = range(0, len(mat))

# These will be our selected rows
rowselected = pulp.LpVariable.dicts('rowselected', allrows, cat=pulp.LpBinary)

# Calculate column sums (equivalent to sum_list in the example)
colsums = pulp.LpVariable.dicts('colsums', allcols, cat=pulp.LpContinuous)
for c in allcols:
    prob += colsums[c] == sum(mat[r][c]*rowselected[r] for r in allrows)

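# This is our objective - maximise this
# func() can never exceed 0 (each value is compared with a running maximum
# that includes the value itself), so cap the variable at 0
maxvalue = pulp.LpVariable('maxvalue', upBound=0)
prob += maxvalue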

# The tricky part - maximise subject to being less than each of these differences
# I'm relatively confident that all these constraints are equivalent
# to calculating the maximum and subtracting that
for c1 in allcols:
    for c2 in allcols[:c1]:
        prob += maxvalue <= colsums[c1] - colsums[c2]

# choose at least one row
prob += pulp.lpSum(rowselected) >= 1

prob.solve()

print(prob.objective.value())

for c in allrows:
    print(rowselected[c].value())
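
Written out, the core of the model above is the following MILP. The symbols are introduced here only for illustration: $a_{rc}$ stands for mat[r][c], $y_r$ for rowselected[r], $s_c$ for colsums[c], and $z$ for maxvalue.

```latex
\begin{aligned}
\max_{y,\,s,\,z}\quad & z \\
\text{s.t.}\quad & s_c = \sum_{r} a_{rc}\, y_r && \forall c, \\
& z \le s_{c_1} - s_{c_2} && \forall c_2 < c_1, \\
& \sum_{r} y_r \ge 1, \\
& y_r \in \{0, 1\} && \forall r.
\end{aligned}
```

The link to func is that, for a fixed selection, $\min_{c_1}\bigl(s_{c_1} - \max_{c_2 \le c_1} s_{c_2}\bigr) = \min_{c_2 \le c_1}\,(s_{c_1} - s_{c_2})$, because subtracting the largest earlier value gives the smallest difference. The terms with $c_2 = c_1$ are all zero, so func equals $\min(0, z^*)$, where $z^*$ is the largest $z$ that the difference constraints allow.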

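【Comments】:

Hi @chthonicdaemon, thanks for your answer. It is indeed correct and it works. Do you think it will give a solution for about 100 lists in a reasonable time (2~3 minutes)?
Could you explain what prob += does? Is there a more convenient way to just add a complete function to prob, so that it is easier to change the inner function?
MILP still scales exponentially in the worst case, but a good solver is just about the fastest way of solving this kind of problem. There are other solvers to choose from, even commercial ones like CPLEX, which is currently the fastest.
PuLP lets you build up a problem by adding equations, which is what += does. I suggest going through the PuLP documentation to learn more. You appear to be unfamiliar with linear programming, so I suggest reading up on that as well. Because of the linearity restriction, you cannot handle arbitrary functions this way, only functions that can be expressed with linear functions.
Thanks for the explanation. My problem is that my function is more complicated than the one in this example, and the solution you gave is very specific. I will probably read more about linear programming; in the meantime I chose to use my own version of an evolutionary algorithm, which is more intuitive to me.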