【问题标题】:Performance Considerations when using VBA Filter Function使用 VBA 过滤功能时的性能注意事项
【发布时间】:2016-01-27 01:57:18
【问题描述】:

我无法弄清楚过滤器功能是如何运行得如此之快的。我已经在各种数据上使用了过滤器,无论数据类型如何,过滤器都会消除我使用的任何替代方法。我经常使用 Stephen Bullen 编写的二分搜索算法和 QuickArraySort 算法(在Professional Excel Development 中找到)。 Binary Search 速度极快(与 Filter 函数一样快,只要数组已排序),Quick Sort 算法是已知最快的排序算法之一。

我在下面编写了一些测试代码,比较了在非常大的数组(大小 = 2,000,000)中查找随机元素的速度。我故意以无序方式填充数组(需要注意的是,我尝试了各种无序分配方法,无论分配方法如何,结果都是相似的)。

Sub SearchTest()

Dim i As Long, strMyArray() As String, lngSize As Long, strTest As String
Dim TimeBinarySearch As Long, TimeFilterSearch As Long
Dim lngResultBinary As Long, lngResultFilter As Long

Dim StartHour As Long, StartMinute As Long, StartSecond As Long
Dim StartMiliSecond As Long, StartTime As Long

Dim EndHour As Long, EndMinute  As Long, EndSecond As Long
Dim EndMiliSecond As Long, EndTime As Long

    lngSize = 2000000

    strTest = CStr(1735674 * 987)

    ReDim strMyArray(lngSize)

    For i = 1 To UBound(strMyArray)
        If i Mod 2 = 0 Then
            strMyArray(i) = CStr((i - 1) * 987)
        Else
            strMyArray(i) = CStr((i + 1) * 987)
        End If
    Next i

''Filter Test
'*******************************************************************
    StartHour = Hour(Now()) * 60 * 60 * 1000
    StartMinute = Minute(Now()) * 60 * 1000
    StartSecond = Second(Now()) * 1000
    StartMiliSecond = Format(Now(), "ms")

    StartTime = StartHour + StartMinute + StartSecond + StartMiliSecond

    lngResultFilter = CLng(Filter(strMyArray, strTest)(0))

    EndHour = Hour(Now()) * 60 * 60 * 1000
    EndMinute = Minute(Now()) * 60 * 1000
    EndSecond = Second(Now()) * 1000
    EndMiliSecond = Format(Now(), "ms")

    EndTime = EndHour + EndMinute + EndSecond + EndMiliSecond

    TimeFilterSearch = EndTime - StartTime
'*******************************************************************

''Binary Test
'*******************************************************************
    StartHour = Hour(Now()) * 60 * 60 * 1000
    StartMinute = Minute(Now()) * 60 * 1000
    StartSecond = Second(Now()) * 1000
    StartMiliSecond = Format(Now(), "ms")

    StartTime = StartHour + StartMinute + StartSecond + StartMiliSecond

    QuickSortString1D strMyArray

    lngResultBinary = strMyArray(CLng(BinarySearchString(strTest, strMyArray)))

    EndHour = Hour(Now()) * 60 * 60 * 1000
    EndMinute = Minute(Now()) * 60 * 1000
    EndSecond = Second(Now()) * 1000
    EndMiliSecond = Format(Now(), "ms")

    EndTime = EndHour + EndMinute + EndSecond + EndMiliSecond

    TimeBinarySearch = EndTime - StartTime
'*******************************************************************

    MsgBox lngResultFilter & vbCr & vbCr & lngResultBinary 

    MsgBox TimeFilterSearch & vbCr & vbCr & TimeBinarySearch

End Sub

两种方法返回的结果相同,但Filter方法的返回时间为0毫秒,QuickSort/BinarySearch方法的返回时间接近20秒。这是一个巨大的差异!如前所述,如果数组已排序,则二分查找方法也会返回 0 ms(众所周知,二分查找要求数组以开始排序)

那么,Filter 函数如何查看 2,000,000 个未排序的条目并立即找到正确的结果呢?它不能简单地遍历每个条目并将其与过滤器值进行比较(这是迄今为止最慢的方法),但根据这些初步测试,它也不能使用二进制搜索,因为它必须排序首先是数组。即使已经编译了一个很棒的排序算法,我也很难相信它可以立即对大小超过一百万的数组进行排序。

顺便说一下,下面是快速排序算法和二分搜索算法。

    Sub QuickSortString1D(ByRef saArray() As String, _
                Optional ByVal bSortAscending As Boolean = True, _
                Optional ByVal lLow1 As Variant, _
                Optional ByVal lHigh1 As Variant)

    'Dimension variables
    Dim lLow2 As Long
    Dim lHigh2 As Long
    Dim sKey As String
    Dim sSwap As String

        On Error GoTo ErrorExit
        'If not provided, sort the entire array
        If IsMissing(lLow1) Then lLow1 = LBound(saArray)
        If IsMissing(lHigh1) Then lHigh1 = UBound(saArray)

        'Set new extremes to old extremes
        lLow2 = lLow1
        lHigh2 = lHigh1

        'Get value of array item in middle of new extremes
        sKey = saArray((lLow1 + lHigh1) \ 2)

        'Loop for all the items in the array between the extremes
        Do While lLow2 < lHigh2

            If bSortAscending Then
                'Find the first item that is greater than the mid-point item
                Do While saArray(lLow2) < sKey And lLow2 < lHigh1
                    lLow2 = lLow2 + 1
                Loop

                'Find the last item that is less than the mid-point item
                Do While saArray(lHigh2) > sKey And lHigh2 > lLow1
                    lHigh2 = lHigh2 - 1
                Loop
            Else
                'Find the first item that is less than the mid-point item
                Do While saArray(lLow2) > sKey And lLow2 < lHigh1
                    lLow2 = lLow2 + 1
                Loop

                'Find the last item that is greater than the mid-point item
                Do While saArray(lHigh2) < sKey And lHigh2 > lLow1
                    lHigh2 = lHigh2 - 1
                Loop

            End If

            'If the two items are in the wrong order, swap the rows
            If lLow2 < lHigh2 Then
                sSwap = saArray(lLow2)
                saArray(lLow2) = saArray(lHigh2)
                saArray(lHigh2) = sSwap
            End If

            'If the pointers are not together, advance to the next item
            If lLow2 <= lHigh2 Then
                lLow2 = lLow2 + 1
                lHigh2 = lHigh2 - 1
            End If
        Loop

        'Recurse to sort the lower half of the extremes
        If lHigh2 > lLow1 Then
            QuickSortString1D saArray, bSortAscending, lLow1, lHigh2
        End If

        'Recurse to sort the upper half of the extremes
        If lLow2 < lHigh1 Then
            QuickSortString1D saArray, bSortAscending, lLow2, lHigh1
        End If

    ErrorExit:

    End Sub

    '***********************************************************
    ' Comments: Uses a binary search algorithm to quickly locate
    ' a string within a sorted array of strings
    '
    ' Arguments: sLookFor The string to search for in the array
    ' saArray An array of strings, sorted ascending
    ' lMethod Either vbBinaryCompare or vbTextCompare
    ' Defaults to vbTextCompare
    ' lNotFound The value to return if the text isn’t
    ' found. Defaults to -1
    '
    ' Returns: Long The located position in the array,
    ' or lNotFound if not found
    '
    ' Date Developer Action
    ' ———————————————————————————————-
    ' 02 Jun 04 Stephen Bullen Created
    '
    Function BinarySearchString(ByRef sLookFor As String, _
                ByRef saArray() As String, _
                Optional ByVal lMethod As VbCompareMethod = vbTextCompare, _
                Optional ByVal lNotFound As Long = -1) As Long

    Dim lLow As Long
    Dim lMid As Long
    Dim lHigh As Long
    Dim lComp As Long

        On Error GoTo ErrorExit

        'Assume we didn’t find it
        BinarySearchString = lNotFound

        'Get the starting positions
        lLow = LBound(saArray)
        lHigh = UBound(saArray)

        Do
            'Find the midpoint of the array
            lMid = (lLow + lHigh) \ 2

            'Compare the mid-point element to the string being searched for
            lComp = StrComp(saArray(lMid), sLookFor, lMethod)

            If lComp = 0 Then
                'We found it, so return the location and quit
                BinarySearchString = lMid
                Exit Do
            ElseIf lComp = 1 Then
                'The midpoint item is bigger than us - throw away the top half
                lHigh = lMid - 1
            Else
                'The midpoint item is smaller than us - throw away the bottom half
                lLow = lMid + 1
            End If

            'Continue until our pointers cross
        Loop Until lLow > lHigh

    ErrorExit:

    End Function

编辑:看来我应该先做一些“蛮力”测试。正如 John Coleman 建议的 Filter 函数执行的那样,通过简单地以线性方式遍历数组,上述相同场景的返回时间为 0 ms。见下文:

Sub Test3()

Dim i As Long, strMyArray() As String, lngSize As Long, strTest As String
Dim lngResultBrute As Long, TimeBruteSearch As Long

    lngSize = 2000000
    strTest = CStr(936740 * 97)
    ReDim strMyArray(lngSize)

    For i = 1 To UBound(strMyArray)
        If i Mod 2 = 0 Then
            strMyArray(i) = CStr((i - 1) * 97)
        Else
            strMyArray(i) = CStr((i + 1) * 97)
        End If
    Next i

    StartTime = Timer

    ' Brute force search
    For i = 1 To UBound(strMyArray)
        If strMyArray(i) = strTest Then
            lngResultBrute = CLng(strTest)
            Exit For
        End If
    Next i

    EndTime = Timer

    TimeBruteSearch = EndTime - StartTime
    MsgBox TimeBruteSearch

End Sub

【问题讨论】:

  • 您正在观察编译的 C++ 代码和解释的 VBA 之间的区别。编译后的代码要快几个数量级。此外,与 VBA 相比,过滤器可以“作弊”,因为过滤器代码可以直接访问工作表值,而 VBA 则不能。当然,这并不是真正的作弊,但它确实为过滤器带来了巨大的优势。最后,过滤器经过高度优化,我怀疑它采用了它自己的幕后类型。
  • @ExcelHero 关于编译与解释的好点——但这似乎是关于比较不涉及对工作表值的任何访问的 VBA 的搜索时间。
  • @JohnColeman 确实如此。我没有仔细查看过错误地假设正在测试哪个过滤器的代码。现在我看到它是 VBA 的数组过滤器。我确实注意到 OP 使用了“条目”这个词,并以此作为我的答案。哦,那个过滤器也经过高度优化,毫无疑问利用了幕后排序。

标签: arrays excel algorithm vba sorting


【解决方案1】:

Filter 确实使用了线性搜索——它只是快速执行它,因为它是用高度优化的 C/C++ 代码实现的。要查看这一点,请运行以下代码:

Function RandString(n As Long) As String
    'returns a random string in B-Z
    Dim i As Long
    Dim s As String
    For i = 1 To n
        s = s & Chr(66 + Int(25 * Rnd()))
    Next i
    RandString = s
End Function

Sub test()
    Dim times(1 To 20) As Double
    Dim i As Long, n As Long
    Dim A() As String
    Dim start As Double
    Dim s As String
    Randomize
    s = RandString(99)
    ReDim A(1 To 2000000)
    For i = 1 To 2000000
        A(i) = s + RandString(1)
    Next i
    s = s & "A"
    For i = 20 To 1 Step -1
        n = i * 100000
        ReDim Preserve A(1 To n)
        start = Timer
        Debug.Print UBound(Filter(A, s)) 'should be -1
        times(i) = Timer - start
    Next i
    For i = 1 To 20
        Cells(i, 1) = i
        Cells(i, 2) = times(i)
    Next i
End Sub

此代码创建一个包含 2,000,000 个长度为 100 的随机字符串的数组,每个字符串都与最后一个位置的目标字符串不同。然后它将大小为 100,000 的倍数的子数组提供给Filter,计时它所花费的时间。输出如下所示:

明显的线性趋势并不能完全证明,但有力地证明了 VBA 的 Filter 正在执行简单的线性搜索。

【讨论】:

  • 非常聪明的分析!我添加了一列,将运行时间(即 B 列中的值)除以乘法因子(即 A 列中的值),每次结果值似乎都是常数,这进一步支持了您的主张。
【解决方案2】:

我相信您在这里比较的是苹果和橙子。看起来当您测试 Filter 函数时,您将 unordered 数组作为输入,然后使用 Filter 查找与测试值匹配的内容。直觉说这是 O(N) = 200 万次操作 --- 你测试每个数组元素一次。然后你就完成了。

当您使用自定义 VBA 函数进行过滤时,您首先排序,这是相当昂贵的 O(N * Log2(N)) = 2900 万。一旦对数组进行了排序,您确实会获得搜索有序数组的好处,即 O(Log2(N)) = 14。即使您大大加快了搜索速度,排序的代价也会杀死您。

希望对您有所帮助。

【讨论】:

  • 好的,抱歉我误会了。我认为解释为什么首先排序比线性搜索慢可能会有所帮助。
猜你喜欢
  • 2011-12-16
  • 1970-01-01
  • 1970-01-01
  • 2017-01-24
  • 1970-01-01
  • 2011-07-09
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
相关资源
最近更新 更多