Finding all possible combos for n * m array, excluding certain values

【Question title】: Finding all possible combos for n * m array, excluding certain values
【Posted】: 2021-10-26 09:57:26
【Question】:

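I have an array whose size can vary, with n columns and m rows. I need to find every combination that takes one element from each column (from any row), excluding any combination that contains an element equal to zero. So, in practice, if I have: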

Row	Item1	Item2	Item3
1	A	B	C
2	D	E	F

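I will have 2^3 = 8 possible combinations: ABC, ABF, AEC, AEF, DBC, DBF, DEC, DEF.

But if, instead of B, I have a zero in Row 1 of Item2, I want to exclude that cell from the list of combinations, so I would end up with: AEC, AEF, DEC and DEF.

I found some code that gives me all possible combinations over a fixed number of columns (Macro to make all possible combinations of data in various columns in excel sheet), but it does not handle an array whose dimensions can change, nor the exclusion rule above.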
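
For reference, the rule described above (one element per column, never a zero cell) can be sketched in Python. This is only an illustration of the expected result, not the VBA being asked for; the function name `combos_without_zeros` and the list-of-rows input format are assumptions made for the example.

```python
from itertools import product

def combos_without_zeros(rows):
    """Return every combination that takes one non-zero element per column.

    `rows` is a list of equal-length rows (hypothetical input format).
    """
    # Transpose to columns, dropping zero cells so they are never picked.
    columns = [[cell for cell in col if cell != 0] for col in zip(*rows)]
    return ["".join(combo) for combo in product(*columns)]

table = [["A", 0, "C"],
         ["D", "E", "F"]]
print(combos_without_zeros(table))
# ['AEC', 'AEF', 'DEC', 'DEF']
```

Because the zero is removed from its column before the product is taken, the four combinations that would have used it are never generated.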

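【Comments】:

Can the zeros be located anywhere, and must every combination that includes a zero be excluded from the resulting array? Also, please include your own attempt at solving the problem.
A column must have at least one non-zero value; other than that, zeros can be anywhere. In practice the data is checked before the code runs, so this condition can be assumed to hold. I'm afraid I don't have working code yet. Once I do, I will post it.
Have you considered editing that code to fit your needs?
I don't think the nested-loop approach really works for a variable number of rows and columns. I would use some form of counting in a base equal to the number of rows (2 in this case, i.e. binary), so that each digit selects the row from which the next letter is drawn (000=ABC, 001=ABF, 010=AEC, etc.)
I would have agreed with that before stumbling on this useful post (link). Tim Williams's reply is close to what I am trying to achieve, but I still need to figure out how to reduce the number of combinations by excluding those that contain a zero.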
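
The counting idea from the comments (write the combination index in a base equal to the number of rows, then let each digit pick a row) can be sketched as below. The helper name `combo_from_index` is made up for the example, and the sketch deliberately ignores zeros, as the comment does.

```python
def combo_from_index(rows, index):
    """Decode `index` as a base-len(rows) number; each digit picks a row."""
    n_rows, n_cols = len(rows), len(rows[0])
    digits = []
    for _ in range(n_cols):
        index, digit = divmod(index, n_rows)
        digits.append(digit)
    digits.reverse()  # the most significant digit belongs to the first column
    return "".join(rows[d][col] for col, d in enumerate(digits))

table = [["A", "B", "C"],
         ["D", "E", "F"]]
print([combo_from_index(table, i) for i in range(2 ** 3)])
# ['ABC', 'ABF', 'AEC', 'AEF', 'DBC', 'DBF', 'DEC', 'DEF']
```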

Tags: arrays excel vba combinations

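【Solution 1】:

I'm just going to post the code for the simple (no zeros) case so you can see where I'm going with this (of course I've realised that Base switches to letters from base 11 onwards, so this may not be the smartest approach :))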

Function ListCombos(r As Range)

    Dim s As String, result As String
    Dim arr()
    Dim j As Integer, offset As Integer
    Dim rows As Integer, cols As Integer
    Dim nComb As Long, i As Long
    
    rows = r.rows.Count
    cols = r.Columns.Count
    
    nComb = rows ^ cols
    ReDim arr(1 To nComb)
    
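    For i = 1 To nComb
        ' Write i - 1 in base "rows", zero-padded to "cols" digits.
        ' Base returns letters for digits above 9, so the CInt below only works for up to 10 rows.
        s = Application.Base(i - 1, rows, cols)

        result = ""
        For j = 1 To cols
            ' Digit j is the row offset to use for column j
            offset = CInt(Mid(s, j, 1))
            result = result & r.Cells(1, 1).offset(offset, j - 1)
        Next j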

        arr(i) = result
    Next i
    
    ListCombos = arr
End Function

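Here is the version that skips combinations containing zeros. The approach is to move the non-zero values of each column up into the first rows of a holding array and to count them per column, so you never have to generate or check any combination that contains a zero.

Then loop through the combinations using a mixed radix: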

Option Explicit
Option Base 1

Function ListCombosWithZeroes(r As Range)
    Dim s As String, result As String
    Dim arr()
    Dim i As Integer, j As Integer, offset As Integer, count As Integer, carry As Integer, temp As Integer
    Dim rows As Integer, cols As Integer
    Dim nComb As Long, iComb As Long
    Dim holdingArr(20, 20) As String
    Dim countArr(20) As Integer
    Dim countUpArr(20) As Integer
    
    
    rows = r.rows.count
    cols = r.Columns.count
    
    ' Move non-zero cells to first rows of holding array and establish counts per column
    
    For j = 1 To cols
        count = 0
        For i = 1 To rows
            If r.Cells(i, j) <> 0 Then
                count = count + 1
                holdingArr(count, j) = r.Cells(i, j)
            End If
        Next i
        countArr(j) = count
    Next j
                
    
    ' Calculate number of combos
    
    nComb = 1
    For j = 1 To cols
        nComb = nComb * countArr(j)
    Next j
        
    ReDim arr(1 To nComb)
    
    'Loop through combos
    
    For iComb = 1 To nComb

        result = ""
        For j = 1 To cols
            offset = countUpArr(j)
            result = result & holdingArr(offset + 1, j)
        Next j

        arr(iComb) = result
        
        'Increment countup Array - this is the hard part.
        
        j = cols
        
        'Set carry=1 to force increment on right-hand column
        
        carry = 1
        
        Do

            temp = countUpArr(j) + carry
            countUpArr(j) = temp Mod countArr(j)
            carry = temp \ countArr(j)
            j = j - 1
            
        Loop While carry > 0 And j > 0

    Next iComb
    
    ListCombosWithZeroes = arr

End Function

You don't need to have the same number of letters in each column.
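
For readers who want to follow the carry logic outside Excel, here is a rough Python rendering of the same mixed-radix counter. It is a sketch that mirrors the VBA above rather than a replacement for it; `mixed_radix_combos` and the list-of-rows input are assumptions, with `radices` and `counters` standing in for `countArr` and `countUpArr`.

```python
def mixed_radix_combos(rows):
    """List zero-free combinations using a per-column (mixed-radix) counter."""
    # Holding columns: only the non-zero cells of each column, in order.
    columns = [[c for c in col if c != 0] for col in zip(*rows)]
    radices = [len(col) for col in columns]   # plays the role of countArr
    if not columns or 0 in radices:
        return []  # a column with no usable value yields no combinations
    counters = [0] * len(columns)             # plays the role of countUpArr
    total = 1
    for radix in radices:
        total *= radix
    results = []
    for _ in range(total):
        results.append("".join(col[k] for col, k in zip(columns, counters)))
        # Increment the rightmost counter and propagate the carry leftwards.
        j, carry = len(counters) - 1, 1
        while carry and j >= 0:
            carry, counters[j] = divmod(counters[j] + carry, radices[j])
            j -= 1
    return results

table = [["A", 0, "C"],
         ["D", "E", "F"],
         ["G", "H", 0]]
combos = mixed_radix_combos(table)
print(len(combos), combos[:5])
# 12 ['AEC', 'AEF', 'AHC', 'AHF', 'DEC']
```

The columns here hold 3, 2 and 2 usable values, which is why each digit of the counter needs its own radix.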

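【Comments】:

Thank you, Tom - an interesting use of Application.Base, which I hadn't come across until now. As you flagged, the solution above generates all possible combinations, including those where a zero appears (the combination just omits that element, so I get AF instead of ABF). But the dimensionality problem remains: is there a way to get all possible combinations after eliminating the ones that contain zeros?
My idea, for what it's worth, is to shuffle all the non-zero values to the top of the 2D array and work through those with a mixed radix, so that only the combinations you need are generated. I'll give it a try, but it may take a little while. That isn't quite what you said in your last comment, though: the combination would be omitted completely (like ABC, ABF, DBC, DBF in your question), not just shortened.
Sorry if I wasn't clear - in short, I don't care at all about the combinations that would pick up a zero. I want to eliminate them before generating the possible combinations, to reduce the dimensionality of the problem.
Posted a version that skips the combinations with zeros.
Tom, thank you very much - this does the trick, although once I get to an array of 10^9 combinations, even after eliminating the zeros, it still blows up. The code works perfectly; it is just the sheer volume of computation that crashes Excel. I think I need to work out the maximum possible number of combos up front, before launching the run.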
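
The "count before you run" idea from the last comment is cheap to do, since the number of zero-free combinations is simply the product of the non-zero counts per column. A minimal sketch, where `count_combos` and the `MAX_COMBOS` limit are hypothetical names and values chosen for the example:

```python
def count_combos(rows):
    """Number of zero-free combinations: product of non-zero counts per column."""
    total = 1
    for col in zip(*rows):
        total *= sum(1 for cell in col if cell != 0)
    return total

MAX_COMBOS = 1_000_000  # hypothetical limit; pick what the machine can handle

table = [["A", 0, "C"],
         ["D", "E", "F"]]
n = count_combos(table)
if n > MAX_COMBOS:
    print(f"{n} combinations: too many to enumerate")
else:
    print(f"{n} combinations: safe to enumerate")
# 4 combinations: safe to enumerate
```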

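【Solution 2】:

Here is a solution. It is probably not the most efficient, since it builds every combination, rows^columns strings in total, before filtering, but it works.

Notes

I used a "." instead of zero to avoid dealing with numeric versus alphanumeric values, but you can easily change this
Since I build the strings incrementally, I need predictable indexes. So I fill in all possible combinations and then remove the ones containing "." in a second pass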

Global aws As Worksheet
Global ur As Range
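' Give every variable its own type: in "a, b As Integer" only b is typed, the rest are Variants
Global ccount As Long, rcount As Long, size As Long, rptline As Long
Global rptblock As Long, iblk As Long, iln As Long, idx As Long
Global tempcombos() As String, combos() As String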
Public Sub Calc_combos()
    Set aws = Application.ActiveSheet
    Set ur = aws.UsedRange
    ccount = ur.Columns.Count
    rcount = ur.Rows.Count
    size = (rcount - 1) ^ (ccount - 1)
    ReDim tempcombos(size - 1)
    ReDim combos(size - 1)
    rptline = size / (rcount - 1)
    rptblock = 1
    For c = 2 To ccount
        idx = 0
        For iblk = 1 To rptblock
            For r = 2 To rcount
                For iln = 1 To rptline
                    tempcombos(idx) = tempcombos(idx) & Cells(r, c)
                    idx = idx + 1
                Next iln
            Next r
        Next iblk
        rptline = rptline / (rcount - 1)
        rptblock = rptblock * (rcount - 1)
    Next c
    idx = 0
    For iln = 0 To size - 1
        If InStr(tempcombos(iln), ".") = 0 Then
            combos(idx) = tempcombos(iln)
            idx = idx + 1
        End If
    Next iln
End Sub

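【Comments】:

Thanks @gimix - the code above does work as intended, but its size quickly becomes a problem. As you pointed out, you build all the combinations first and then remove the unwanted ones. This quickly runs into overflow problems in Excel. The more I think about it, the more I am convinced that a recursive approach would work better, with the unwanted combinations excluded from the start.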
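
The recursive approach mentioned in this comment could look roughly like the sketch below. It is written as a Python generator so that combinations are produced one at a time instead of being stored; `iter_combos` and the list-of-columns input are assumptions, not code from any of the answers.

```python
def iter_combos(columns, prefix=""):
    """Yield combinations lazily, one column at a time, skipping zero cells."""
    if not columns:
        yield prefix
        return
    first, rest = columns[0], columns[1:]
    for cell in first:
        if cell == 0:
            continue  # prune here: nothing below this zero is ever built
        yield from iter_combos(rest, prefix + str(cell))

columns = [["A", "D"], [0, "E"], ["C", "F"]]
print(list(iter_combos(columns)))
# ['AEC', 'AEF', 'DEC', 'DEF']
```

This keeps memory use flat, but the number of combinations visited still grows multiplicatively with the number of columns.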

【Solution 3】:

The Python way:

from dataclasses import dataclass, field
from itertools import product
from random import randint
from typing import Dict, List

@dataclass
class PriceComparison():
    rows : int
    cols : int
    maxprice : int = 50
    threshold : int = 0
    itemcodes : List[List[str]] = field(init=False)
    pricelist : Dict[str, int] = field(init=False)
    
    def __post_init__(self):
        ##create sample data
        self.itemcodes = [[f'A{r+self.cols*c:03d}' for c in range(self.rows)] for r in range(self.cols)]
        print(self.itemcodes)
        self.pricelist = {self.itemcodes[c][r]:randint(0,self.maxprice) for r in range(self.rows) for c in range(self.cols)}
        ##remove items with price = 0
        for col in self.itemcodes:
            for item in col[:]:
                if self.pricelist[item] == 0:
                    print(f'removing {item} from {col}')
                    col.remove(item)
                    del self.pricelist[item]

    def find_cheapest(self):
        iterations = 1
        for col in self.itemcodes:
            iterations *= len(col)
        print(f'this may require {iterations} iterations!')
        cheapest = self.maxprice * self.cols + 1
        for i, combo in enumerate(product(*self.itemcodes)):
            ##dummy price calculation
            price = sum([self.pricelist[item] for item in combo]) * randint(1,10) // 10
            if price < cheapest:
                print(f'current cheapest is {price} at iteration {i}')
                cheapest = price
                if price < self.threshold:
                    print('under threshold: returning')
                    break
        return cheapest

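Some notes:

I assume the cheapest combination is not given simply by picking the cheapest item in each column, otherwise we wouldn't need all this complicated machinery; so I inserted a random coefficient when calculating the total price of a combo - this should be replaced with the actual formula
I also assume that our input table contains item codes, and that each item's price is stored elsewhere. As sample data I create codes from "A000" to "Axxx" and assign each one a random price between 0 and maxprice
Items with price = 0 are removed immediately, before searching for the cheapest combo
For large input tables the search will take a very long time. So, although it was not requested, I also added an optional threshold parameter: if we find a total price below that value, we consider it cheap enough and stop the search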

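Edit

Below is a Python 3.5 compatible version.

It must be noted, however, that for a 10x15 input table the number of iterations required would be close to 1E+15 (actually fewer, depending on how many cells we can ignore as "obvious outliers"). Even if we checked 1 million combos per second, it would still run for (just under) 1E+09 seconds, or about 32 years.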
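
The estimate above is easy to reproduce. The rate of one million checks per second is the figure assumed in the text, not a measurement:

```python
rows, cols = 10, 15
combos = rows ** cols              # upper bound: 10**15 combinations
rate = 1_000_000                   # assumed checks per second, as in the text
seconds = combos / rate
years = seconds / (365.25 * 24 * 3600)
print(f"{seconds:.0e} s, about {years:.1f} years")
# 1e+09 s, about 31.7 years
```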

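So we need a way to improve our strategy. I integrated two options:

setting a threshold, so that we don't search for the actual optimal price but stop as soon as we find an "acceptable" one
splitting the table into "zones" (subsets of columns), looking for the best partial solution for each zone, and then combining them.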
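
To see why zoning helps so much, compare the worst-case iteration counts: searching each zone on its own adds the per-zone counts instead of multiplying them. The helper `iterations` below is a made-up name for this back-of-the-envelope check.

```python
def iterations(rows, zone_widths):
    """Upper bound on combos checked when each zone is searched on its own."""
    return sum(rows ** width for width in zone_widths)

print(iterations(10, [15]))             # no zones: 1000000000000000
print(iterations(10, [3, 3, 3, 3, 3]))  # five 3-column zones: 5000
```

One caveat: combining per-zone optima only gives the true overall optimum when the total price splits cleanly into independent per-zone parts (a plain sum, for example). With constraints that span zones it is a heuristic.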

Sample run:

##10 x 15, 5 zones, each 3 columns wide
this may require up to 1.000000e+03 iterations!
...
current best price is 1 at iteration 71 in 0.06 secs

this may require up to 1.000000e+03 iterations!
...
current best price is 2 at iteration 291 in 0.11 secs

this may require up to 1.000000e+03 iterations!
...
current best price is 1 at iteration 330 in 0.07 secs

this may require up to 8.100000e+02 iterations!
...
current best price is 4 at iteration 34 in 0.09 secs

this may require up to 1.000000e+03 iterations!
...
current best price is 1 at iteration 82 in 0.07 secs
['A000', 'A106', 'A017', 'A033', 'A139', 'A020', 'A051', 'A052', 'A008', 'A009', 'A055', 'A131', 'A147', 'A133', 'A044']

##10 x 15, no zones, threshold = 25
this may require up to 8.100000e+14 iterations!
...
current best price is 24 at iteration 267493282 in 1033.24 secs
under threshold: returning
['A000', 'A001', 'A002', 'A003', 'A004', 'A005', 'A051', 'A052', 'A008', 'A039', 'A055', 'A071', 'A042', 'A133', 'A044']

The code is as follows:

from itertools import product
from random import randint
from time import time

class PriceComparison():
    def __init__(self, rows, cols, zones = [], maxprice = 50, threshold = 0):
        self.rows = rows
        self.cols = cols
        if zones == []:
            self.zones = [cols]
        else:
            self.zones = zones
        self.maxprice = maxprice
        self.threshold = threshold
        self.__post_init__()
    
    def __post_init__(self):
        ##create sample data
        self.itemcodes = [['A%03d' % (r+self.cols*c) for c in range(self.rows)] for r in range(self.cols)]
        print(self.itemcodes)
        self.pricelist = {self.itemcodes[c][r]:randint(0,self.maxprice) for r in range(self.rows) for c in range(self.cols)}
        ##remove items with price = 0
        for col in self.itemcodes:
            for item in col[:]:
                if self.pricelist[item] == 0:
                    print('removing %s from %s' % (item, col))
                    col.remove(item)
                    del self.pricelist[item]

    def find_cheapest(self, lo, hi):
        iterations = 1
        for col in self.itemcodes[lo:hi]:
            iterations *= len(col)
        start = time()
        print('\nthis may require up to %e iterations!' % (iterations))
        bestprice = self.maxprice * self.cols + 1
        cheapest = ()  # stays empty if a column has no items left, so the return below never fails
        for i, combo in enumerate(product(*self.itemcodes[lo:hi])):
            ##dummy price calculation
            price = sum([self.pricelist[item] for item in combo]) * randint(1,10) // 10
            if price < bestprice:
                elapsed = time() - start
                print('current best price is %d at iteration %d in %.2f secs' % (price, i, elapsed))
                cheapest = combo
                bestprice = price
                if price < self.threshold:
                    print('under threshold: returning')
                    break
        return cheapest

    def find_by_zones(self):
        print(self.zones)
        fullcombo = []
        lo = 0
        for zone in self.zones:
            hi = lo + zone
            fullcombo += self.find_cheapest(lo, hi)
            lo = hi
        return fullcombo

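【Comments】:

You are right about the cheapest combination: there are constraints on each row, so it is not always possible to pick the cheapest price. What I need is the cheapest overall combination that satisfies the constraints. I left this out of the original explanation because I wanted to keep the question simple. I will try the above.
Edit: I hit an error right at the start (see below), where it declares rows as an integer. I am using Python 3.5. Any ideas? File "", line 8: int ^ SyntaxError: invalid syntax
Oh, sorry. 3.5 should support typing, but it doesn't support dataclasses, so my code wouldn't work anyway. I'll post a 3.5-compatible version later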