使用 VBA 从数组中删除重复项答案

【问题标题】：Remove duplicates from array using VBA使用 VBA 从数组中删除重复项
【发布时间】：2012-08-05 21:24:52
【问题描述】：

假设我在 Excel 2010 中有一个数据块，100 行乘 3 列。

C 列包含一些重复项，假设它开始为

1、1、1、2、3、4、5、……、97、98

使用 VBA，我想删除重复的行，所以我剩下 98 行和 3 列。

1、2、3、……、97、98

我知道 Excel 2010 中有一个按钮可以执行此操作，但它随后会干扰我的其余代码并给出不正确的结果。

另外，我想在数组中做，然后将结果粘贴到工作表上，而不是像Application.Worksheetfunction.countif(.....这样的方法

比如：

Dim myarray() as Variant
myarray=cells(1,1).Currentregion.value

Dim a as Long

For a=1 to Ubound(myarray,1)

    'something here to 

Next a

【问题讨论】：

由于 OP 想要一个接近 RemoveDuplicates 保留相关数组行的 VBA 解决方案，我发布了一个迟到的回复 ►“从数组中删除重复项（加上相关的行项）”

标签： excel vba

【解决方案1】：

I answered a similar question。这是我使用的代码：

Dim dict As Object
Dim rowCount As Long
Dim strVal As String

Set dict = CreateObject("Scripting.Dictionary")

rowCount = Sheet1.Range("A1").CurrentRegion.Rows.Count

'you can change the loop condition to iterate through the array rows instead
Do While rowCount > 1
  strVal = Sheet1.Cells(rowCount, 1).Value2

  If dict.exists(strVal) Then
    Sheet1.Rows(rowCount).EntireRow.Delete
  Else
    'if doing this with an array, then add code in the Else block
    ' to assign values from this row to the array of unique values
    dict.Add strVal, 0
  End If

  rowCount = rowCount - 1
Loop

Set dict = Nothing

如果要使用数组，则使用相同的条件 (if/else) 语句遍历元素。如果字典中不存在该项，则可以将其添加到字典中并将行值添加到另一个数组中。

老实说，我认为最有效的方法是调整从宏记录器获得的代码。您可以在一行中执行上述功能：

    Sheet1.UsedRange.RemoveDuplicates Columns:=3, Header:=xlYes

【讨论】：

这会从工作表中删除实际行，而问题是删除 VBA 中的重复项。此外，任何行删除都应始终自下而上进行，以避免跳过行
@brettdj 这两段代码都可以删除重复项。提问者想在 VBA 中执行此操作，最好使用数组（在这种情况下，他/她可以轻松地修改循环以遍历数组的行而不是范围，然后仅将唯一元素添加到单独的数组中）。如果您查看代码，您会发现行删除确实是自下而上（rowCount 递减）。 :)
那么问题的标题是VBA，从数组中删除重复项。您提出了一个很好的观点，即您的代码确实按照您使用 .Count 的方式自下而上，尽管在这种情况下，询问者可能需要指定第一次出现应该保持在顶部还是底部。
我稍微编辑了我的答案以反映数组请求。在问题内部，它变得模棱两可，用户表示希望执行单击“删除重复项”按钮所做的操作，但在 VBA 中（使用数组）。内置的删除重复项也可以自下而上工作。如果用户回来并想要一个更彻底的答案，或者发现这个不能帮助他们弄清楚，那么我可以提供更多。也可以随意编辑我的或添加您自己的。感谢您的批评！ :)
这个 Scripting.Dictionary 方法是否也适用于 Mac？

【解决方案2】：

Function eliminateDuplicate(poArr As Variant) As Variant
    Dim poArrNoDup()

    dupArrIndex = -1
    For i = LBound(poArr) To UBound(poArr)
        dupBool = False

        For j = LBound(poArr) To i
            If poArr(i) = poArr(j) And Not i = j Then
                dupBool = True
            End If
        Next j

        If dupBool = False Then
            dupArrIndex = dupArrIndex + 1
            ReDim Preserve poArrNoDup(dupArrIndex)
            poArrNoDup(dupArrIndex) = poArr(i)
        End If
    Next i

    eliminateDuplicate = poArrNoDup
End Function

【讨论】：

如果代码有问题，请不要像this revision那样编辑其他SO用户的答案，请发表评论并让他们知道。
你需要为 i 的每个循环设置 dupBool 为 false

【解决方案3】：

对@RBILLC 和@radoslav006 答案的改进，此版本在删除重复项的数组中搜索现有值，因此它搜索较少的值来查找重复项。

Public Function RemoveDuplicatesFromArray(sourceArray As Variant)
    Dim duplicateFound As Boolean
    Dim arrayIndex As Integer, i As Integer, j As Integer
    Dim deduplicatedArray() As Variant
    
    arrayIndex = -1
    deduplicatedArray = Array(1)

    For i = LBound(sourceArray) To UBound(sourceArray)
        duplicateFound = False

        For j = LBound(deduplicatedArray) To UBound(deduplicatedArray)
            If sourceArray(i) = deduplicatedArray(j) Then
                duplicateFound = True
                Exit For
            End If
        Next j

        If duplicateFound = False Then
            arrayIndex = arrayIndex + 1
            ReDim Preserve deduplicatedArray(arrayIndex)
            deduplicatedArray(arrayIndex) = sourceArray(i)
        End If
    Next i

    RemoveDuplicatesFromArray = deduplicatedArray
End Function

【讨论】：

【解决方案4】：

从一维数组中删除重复项的简单函数

Private Function DeDupeArray(vArray As Variant) As Variant
  Dim oDict As Object, i As Long
  Set oDict = CreateObject("Scripting.Dictionary")
  For i = LBound(vArray) To UBound(vArray)
    oDict(vArray(i)) = True
  Next
  DeDupeArray = oDict.keys()
End Function

编辑：

使用stdVBA（主要由我自己维护的一个库），您可以使用：

uniqueValues = stdEnumerator.CreateFromArray(myArray).Unique().AsArray()

【讨论】：

在CreateObject("Scripting.Dictionary") 失败。 MS Word v.16.32 (Office 365)、VBA v. 7.1。
@AndriyMakukha Mac?
这很酷。我不知道您可以简单地执行oDict(key) = value 将项目添加到字典中。 :o

【解决方案5】：

这是处理数组的另一种方法：

Sub tester()

    Dim arr, arrout
    
    arr = Range("A1").CurrentRegion.Value   'collect the input array
     
    arrout = UniqueRows(arr)                'get only unique rows
    
    Range("H1").Resize(UBound(arrout, 1), UBound(arrout, 2)).Value = arrout
    
End Sub




Function UniqueRows(arrIn As Variant) As Variant
    Dim keys, rw As Long, col As Long, k, sep, arrout
    Dim dict As Object, lbr As Long, lbc As Long, ubr As Long, ubc As Long, rwOut As Long
    Set dict = CreateObject("scripting.dictionary")
    'input array bounds
    lbr = LBound(arrIn, 1)
    ubr = UBound(arrIn, 1)
    lbc = LBound(arrIn, 2)
    ubc = UBound(arrIn, 2)
    ReDim keys(lbr To ubr)
    'First pass:collect all the row "keys" in an array 
    '    and unique keys in a dictionary
    For rw = lbr To ubr
        k = "": sep = ""
        For col = lbc To ubc
            k = k & sep & arrIn(rw, col)
            sep = Chr(0)
        Next col
        keys(rw) = k     'collect key for this row
        dict(k) = True   'just collecting unique keys
    Next rw

    'Resize output array to # of unique rows
    ReDim arrout(lbr To dict.Count + (lbr - 1), lbc To ubc)
    rwOut = lbr
    'Second pass: copy each unique row to the output array
    For rw = lbr To ubr
        If dict(keys(rw)) Then      'not yet output?
            For col = lbc To ubc    'copying this row over to output...
                arrout(rwOut, col) = arrIn(rw, col)
            Next col
            rwOut = rwOut + 1      'increment output "row"
            dict(keys(rw)) = False 'flag this key as copied
        End If
    Next rw
    UniqueRows = arrout
End Function

【讨论】：

【解决方案6】：

@RBILLC 的回答可以通过在内部循环中添加 Exit For 来轻松改进：

Function eliminateDuplicate(poArr As Variant) As Variant
    Dim poArrNoDup()

    dupArrIndex = -1
    For i = LBound(poArr) To UBound(poArr)
        dupBool = False

        For j = LBound(poArr) To i
            If poArr(i) = poArr(j) And Not i = j Then
                dupBool = True
                Exit For
            End If
        Next j

        If dupBool = False Then
            dupArrIndex = dupArrIndex + 1
            ReDim Preserve poArrNoDup(dupArrIndex)
            poArrNoDup(dupArrIndex) = poArr(i)
        End If
    Next i

    eliminateDuplicate = poArrNoDup
End Function

【讨论】：

【解决方案7】：

我认为这确实是使用excel的本机函数的情况，至少对于初始数组获取而言，我认为没有更简单的方法可以做到这一点。这个 sub 将输出从第 5 列开始的唯一值。我假设目标范围是空的，所以如果不是，请更改 r 和 c。

Sub testUniques()
    
    Dim arr, r As Long, c As Long, h As Long, w As Long
    Dim this As Worksheet: Set this = ActiveSheet
    arr = Application.Unique(this.Cells(1, 1).CurrentRegion)
    
    r = 1
    c = 5
    h = UBound(arr, 1) - 1
    w = UBound(arr, 2) - 1
    
    this.Range(this.Cells(r, c), this.Cells(r + h, c + w)) = arr
    
End Sub

【讨论】：

【解决方案8】：

我知道这已经过时了，但这是我用来将重复值复制到另一个范围的东西，以便我可以快速查看它们，从而为我从各种电子表格中建立的数据库建立数据完整性。要使该过程删除重复项，只需将dupRng 行替换为Cell.Delete Shift:=xlToLeft 或类似的东西即可。

我没有亲自测试过，但它应该可以工作。

Sub PartCompare()
    Dim partRng As Range, partArr() As Variant, i As Integer
    Dim Cell As Range, lrow As Integer

    lrow = ThisWorkbook.Worksheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Row
    i = 0

    Set partRng = ThisWorkbook.Worksheets("Sheet1").Range(Cells(1, 1), Cells(lrow, 1))

    For Each Cell In partRng.Cells
        ReDim Preserve partArr(i)
        partArr(i) = Cell.Value
        i = i + 1
    Next

    Dim dupRng As Range, j As Integer, x As Integer, c As Integer

    Set dupRng = ThisWorkbook.Worksheets("Sheet1").Range("D1")

    x = 0
    c = 1
    For Each Cell In partRng.Cells
        For j = c To UBound(partArr)
            If partArr(j) = Cell.Value Then
                dupRng.Offset(x, 0).Value = Cell.Value
                dupRng.Offset(x, 1).Value = Cell.Address()
                x = x + 1
                Exit For
            End If
        Next j
        c = c + 1
    Next Cell
End Sub

【讨论】：

【解决方案9】：

从数组中删除重复项（以及相关的行项）

由于 OP 想要一个接近 RemoveDuplicates 的 VBA 解决方案，我演示了一种使用 ►dictionary 的数组方法来获取唯一项目本身 (dict.keys)，而是相关的 第一次出现的行索引 (dict.items)。

这些用于通过过程 LeaveUniques 保留整行数据，这得益于 ►Application.Index() 函数的高级可能性 - c.f. Some peculiarities of the the Application.Index function

调用示例

Sub ExampleCall()
'[0]define range and assign data to 1-based 2-dim datafield
    With Sheet1                   ' << reference to your project's sheet Code(Name)
        Dim lastRow: lastRow = .Cells(.Rows.Count, "C").End(xlUp).Row
        Dim rng:  Set rng = .Range("C2:E" & lastRow)
    End With
    Dim data: data = rng        ' assign data to 2-dim datafield
'[1]get uniques (column 1) and remove duplicate rows
    LeaveUniques data           ' << call procedure LeaveUniques (c.f. RemoveDuplicates)
'[2]overwrite original range
    rng.Clear
    rng.Resize(UBound(data), UBound(data, 2)) = data
End Sub

程序LeaveUniques

Sub LeaveUniques(ByRef data As Variant, Optional ByVal colNum As Long = 1)
'Purpose: procedure removes duplicates of given column number in entire array
    data = Application.Index(data, uniqueRowIndices(data, colNum), nColIndices(UBound(data, 2)))
End Sub

帮助功能到LeaveUniques

Function uniqueRowIndices(data, Optional ByVal colNum As Long = 1)
'Purpose: return data index numbers referring to uniques
'a) set late bound dictionary to memory
    Dim dict As Object
    Set dict = CreateObject("Scripting.Dictionary")
'b) slice e.g. first data column (colNum = 1)
    Dim colData
    colData = Application.Index(data, 0, colNum)
'c) fill dictionary with uniques referring to first occurencies
    Dim i As Long
    For i = 1 To UBound(colData)
        If Not dict.exists(dict(colData(i, 1))) Then dict(colData(i, 1)) = i
    Next
'd) return 2-dim array of valid unique 1-based index numbers
    uniqueRowIndices = Application.Transpose(dict.items)
End Function

Function nColIndices(ByVal n As Long)
'Purpose: return "flat" array of n column indices, e.g. for n = 3 ~> Array(1, 2, 3)
    nColIndices = Application.Transpose(Evaluate("row(1:" & n & ")"))
End Function

【讨论】：