【问题标题】:Consolidating data from multiple columns合并来自多个列的数据
【发布时间】:2014-01-14 18:03:33
【问题描述】:

我的问题与此线程How to Consolidate Data from Multiple Excel Columns All into One Column 类似但更复杂。

这里是excel示例

Date       Measure1  A    B     Date       Measure2    A    B   C   Date.....
11/11/11   1234     1     2     11/12/12   5678        1    3   3   12/12/12  ...
12/11/12   234     34    234    12/12/13   345        342   23  33  12/12/13  ...
........

Excel 中有数百列。一个日期列,后跟一个度量列,然后是其他一些列。 现在我只想要日期列、度量名称列和值列。 结果excel文件应该像

Date      Measure Name      Value
11/11/11  Measure1          1234
11/12/12  Measure2          5678
12/12/12  ....
....
12/11/12  Measure1          234
12/12/13  Measure2          123

我如何通过 VBA 实现它?由于我有数千个这样的文件,VBA 似乎是整合这些文件并加载到数据库中的最佳方式。

我总是得到

  Run-time error '1004'
  Application -defined or object -defined eror"

  w.Sheets("DataSort").Range("A1").Resize(k, UBound(Arr2, 2)) = Arr2

这是我的代码

Sub convertExcel()
Dim Arr1, Arr2()
Dim Rnum As Integer, Cnum As Integer, Tnum As Integer
Dim i As Integer, j As Integer, k As Integer
'Rnum = row number; Cnum = column number; Tnum as  total number

Application.ScreenUpdating = False
Set w = Workbooks.Open("FileNAME~~~~")
Rnum = w.Sheets("Data").Cells(Rows.Count, 1).End(xlUp).Row

Cnum=208
Tnum = Rnum * Cnum / 2
w.Sheets.Add.Name = "DataSort"

Arr1 = Range("A1:GZ" & Rnum)
ReDim Arr2(1 To Tnum, 1 To 3)

For j = 2 To Cnum
  If w.Sheets("Data").Cells(1, j) = "Date" Then
     For i = 2 To Rnum
    If Arr1(i, j) <> "" Then
        k = k + 1:
        Arr2(k, 1) = Arr1(i, j)
        Arr2(k, 2) = Arr1(1, j)
        Arr2(k, 3) = Arr1(i, j + 1)
    End If
    Next
    End If
Next


w.Sheets("DataSort").Range("A1").Resize(k, UBound(Arr2, 2)) = Arr2

w.Close True
Application.ScreenUpdating = True
End Sub

【问题讨论】:

  • 你目前使用的代码是什么?这个网站不是一个代码编写服务,展示你迄今为止的努力对于获得更好和更有用的答案大有帮助。
  • @enderland 我的代码是手动读取列索引和测量名称的第一行。但是我的迭代目前不起作用。由于这些测量是保密的,所以我没有有价值的代码可以展示。
  • 你将很难获得别人看不到的代码的帮助..
  • 对于每个Measurex,日期的计数是相同的还是从MeasureMeasure 不一样?
  • 在 BK201 所说的之上,日期和度量列之间的列间隔是常数吗?

标签: vba excel


【解决方案1】:

由于我今天有很多时间,所以我决定在这里花一些时间。我觉得这有点挑战性,但最后,这只是对事件进行适当的排序。

以下是我采用的逻辑:

  • 删除所有非Date 和非MeasureX 列。
  • 将所有带有Measure 的列名存储在字典中(完全没有必要,但是,嘿,它很快)作为键。
  • 遍历第一个字典的键并创建第二个字典以将日期值对存储为键值对。
  • 每次迭代,我们都会在第二张表中打印出键值对。

请同时阅读代码中的所有 cmets。另外,请注意我在下面的设置。最后,在您的工作簿副本上进行测试。

设置:

Sheet2 中,我有一个未删节 数据集,该数据集大致从您的示例复制而来,包含 1508 列和 1500 行数据,不包括标题。删除不需要的列后,数据将减少到 734 列和 1500 行数据。在测试中,我的删除大约需要 11-13 秒。您的里程可能会有所不同。

使用此过滤后的数据,使用第二个字典处理它大约需要 8-9 秒才能完成。整个过程基本上在~20秒左右完成。

截图:

Sheet2(带有原始数据的工作表):

Sheet3(输出表):

代码:

Sub KamehameWave()

    Dim Sht2 As Worksheet, Sht3 As Worksheet
    Dim Dict As Object, Cell As Range
    Dim Dict2 As Object, Cell2 As Range
    Dim RngToDelete As Range

    Set Sht2 = ThisWorkbook.Sheets("Sheet2") 'Modify accordingly.
    Set Sht3 = ThisWorkbook.Sheets("Sheet3") 'Modify accordingly.

    Application.ScreenUpdating = False

    With Sht2
        '-----------------------------------BK201's Notes-----------------------------------'
        ' The following block will delete unneeded columns. Basically, it will only keep    '
        ' columns that either have "Date" or "MeasureX" in their headers. All else will be  '
        ' discarded. As said in the post, do this on a copy of your worksheet.              '
        '-----------------------------------BK201's Notes-----------------------------------'
        Start = Timer()
        For Each Cell In .Rows(1).Cells
            If InStr(1, Cell.Value, "Date") = 0 And InStr(1, Cell.Value, "Measure") = 0 Then
                If Not RngToDelete Is Nothing Then
                    Set RngToDelete = Union(RngToDelete, .Columns(Cell.Column))
                Else
                    Set RngToDelete = .Columns(Cell.Column)
                End If
            End If
        Next Cell
        RngToDelete.Delete
        Debug.Print Timer() - Start
        Start = Timer()
        '-----------------------------------BK201's Notes-----------------------------------'
        ' The following block will create a dictionary and store all the names of columns   '
        ' with "Measure" in them. This is just so you have a reference. An array or a       '
        ' collection will do as well. I prefer to use this though as I find it easier.      '
        '-----------------------------------BK201's Notes-----------------------------------'
        Set Dict = CreateObject("Scripting.Dictionary")
        For Each Cell In .Rows(1).Cells
            CheckIfMeasure = InStr(1, Cell.Value, "Measure")
            If CheckIfMeasure > 0 Then
                If Not Dict.Exists(Cell.Value) And Not IsEmpty(Cell.Value) Then
                    Dict.Add Cell.Value, Empty
                End If
            End If
        Next Cell
        '-----------------------------------BK201's Notes-----------------------------------'
        ' What we'll do next is to iterate over each "MeasureX" column. We'll iterate over  '
        ' the values on these columns and store them in a *second* dictionary, with their   '
        ' respective dates being the keys.                                                  '
        '-----------------------------------BK201's Notes-----------------------------------'
        For Each Key In Dict
            MColIndex = Application.Match(Key, .Rows(1), 0)
            MColLRow = .Cells(Rows.Count, MColIndex).End(xlUp).Row
            Set MCol = .Range(.Cells(2, MColIndex), .Cells(MColLRow, MColIndex))
            Set Dict2 = CreateObject("Scripting.Dictionary")
            For Each Cell2 In MCol
                If Not Dict2.Exists(Cell2.Value) And Not IsEmpty(Cell2.Value) Then
                    Dict2.Add Cell2.Offset(0, -1).Value, Cell2.Value
                End If
            Next Cell2
        '-----------------------------------BK201's Notes-----------------------------------'
        ' The final phase is to get the next empty row in the output sheet and dump all the '
        ' key-value pairs from our second dictionary there. Since we're also iterating      '
        ' through the keys of the first dictionary, the list will append properly to        '
        ' accommodate each key's own dictionary.                                            '
        '-----------------------------------BK201's Notes-----------------------------------'
            TColNRow = Sht3.Range("A" & Rows.Count).End(xlUp).Row + 1
            Sht3.Range("A" & TColNRow).Resize(Dict2.Count, 1).Value = Application.Transpose(Dict2.Keys)
            Sht3.Range("B" & TColNRow).Resize(Dict2.Count, 1).Value = Key
            Sht3.Range("C" & TColNRow).Resize(Dict2.Count, 1).Value = Application.Transpose(Dict2.Items)
        Next Key
        Debug.Print Timer() - Start
    End With

    Application.ScreenUpdating = True

End Sub

运行代码后的结果:

第一个数字是删除的运行时间,第二个是转置。考虑到我有 50 万个数据点,这还不错。数据排序由您决定。

如果这有帮助,请告诉我们。

【讨论】:

  • 加一个努力! :D 我的意思是应该这样做。还有轮廓清晰的 cmets。
  • @L42:谢谢。我可以在这里和那里看到一些改进,但代码足够健壮。希望它能正常工作,因为它仍然可以在 1 或 2 个条件下中断。
  • 哈哈也为子名哈哈...Sub KamehameWave()是的。顺便说一句,它应该工作。 :)
猜你喜欢
  • 1970-01-01
  • 2019-07-20
  • 1970-01-01
  • 1970-01-01
  • 2016-09-27
  • 2019-01-10
  • 1970-01-01
  • 2016-04-29
  • 1970-01-01
相关资源
最近更新 更多