[Question title]: Count specific files in folder with Excel VBA
[Posted]: 2017-07-07 15:35:21
[Question]:

I need some help with my Excel VBA.

First let me explain what it should do...

In a network folder there are PDF files that should be counted. The folders look like this:

X:/Tests/Manufact/Prod_1/Machine/Num/Year/Month/TEST_DDMMYYYY_TIMESTAMP.PDF
X:/Tests/Manufact/Prod_2/Machine/Num/Year/Month/TEST_DDMMYYYY_TIMESTAMP.PDF
X:/Tests/Manufact/Prod_3/Machine/Num/Year/Month/TEST_DDMMYYYY_TIMESTAMP.PDF

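There is a folder for each year and each month, and the PDFs are sorted into them by creation date. The counted files should be listed in the active sheet with file name and date. After that, I want to count how many PDF files were created on a specific date within given time periods. The result should go in a new sheet like this: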

Date - Time-Period 1 (0AM-6AM) - Time Period 2 (6AM-10AM) - Time Period 3 (10AM - 12AM)

01.01.2017 - 12PDFs - 17PDFs - 11PDFs
02.01.2017 - 19PDFs - 21PDFs - 5PDFs

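Maybe there is also a way to remember what was already done, so the script doesn't count all the files it has listed before? (There are more than 100k PDFs, and more every day...)

So... I searched the internet for a solution for a whole week, found a few, and ended up with the following code: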

Sub ListFiles()
    Const sRoot     As String = "X:\Tests\Manufact\"
    Dim t As Date

    Application.ScreenUpdating = False
    With Columns("A:E")
        .ClearContents
        .Rows(1).Value = Split("File,Date,Day,Time,Size", ",")
    End With

    t = Timer
    NoCursing sRoot
    Columns.AutoFit
    Application.ScreenUpdating = True
    MsgBox Format(Timer - t, "0.0s")
End Sub

Sub NoCursing(ByVal sPath As String)
    Const iAttr     As Long = vbNormal + vbReadOnly + _
          vbHidden + vbSystem + _
          vbDirectory
    Dim col         As Collection
    Dim iRow        As Long
    Dim jAttr       As Long
    Dim sFile       As String
    Dim sName       As String

    If Right(sPath, 1) <> "\" Then sPath = sPath & "\"

    Set col = New Collection
    col.Add sPath

    iRow = 1

    Do While col.count
        sPath = col(1)

        sFile = Dir(sPath, iAttr)

        Do While Len(sFile)
            sName = sPath & sFile

            On Error Resume Next
            jAttr = GetAttr(sName)
            If Err.Number Then
                Debug.Print sName
                Err.Clear

            Else
                If jAttr And vbDirectory Then
                    If Right(sName, 1) <> "." Then col.Add sName & "\"
                Else
                    iRow = iRow + 1
                    If (iRow And &HFFF) = 0 Then Debug.Print iRow
                    Rows(iRow).Range("A1:E1").Value = Array(sName, _
                                                            FileDateTime(sName), _
                                                            FileDateTime(sName), _
                                                            FileDateTime(sName), _
                                                            FileLen(sName))
                End If
            End If
            sFile = Dir()
        Loop
        col.Remove 1
    Loop

End Sub

What it does is list all files in the directory tree (so it's missing something that tells it to handle only PDFs).
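One minimal change, sketched here under the assumption that a case-insensitive check on the `.pdf` extension is enough, is to test each file name before its row is written. `IsPdf` is a hypothetical helper, not part of the original code. Directories are still queued, because the test only applies in the file branch:

```vba
' Hypothetical helper: returns True when the file name ends in ".pdf",
' ignoring case (the files on the share are named *.PDF).
Function IsPdf(ByVal sName As String) As Boolean
    IsPdf = (LCase$(Right$(sName, 4)) = ".pdf")
End Function

' Sketch of the changed file branch inside NoCursing:
'   Else
'       If IsPdf(sName) Then
'           iRow = iRow + 1
'           ' ... write the row exactly as before ...
'       End If
'   End If
```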

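It does list the files in my sheet, and I'm happy with that part, but it only lists them. I still need the counting part, so either have it count only the dates and time periods directly, or have it first list everything, then sort and count the dates and time periods from the list (I really don't know which would be better; maybe there's an easy way and a hard way?)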
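For the "list everything first, then count" route, a rough sketch is below. It assumes the layout produced by `ListFiles` (timestamps in column B of the active sheet, header in row 1) and the three periods from the example table, treated as half-open hour ranges so a boundary time is counted only once. `CountPeriodsForDay` is a hypothetical name:

```vba
' Hypothetical sketch: after ListFiles has run, tally the timestamps in
' column B of the active sheet for one day d into three periods:
' 0:00-5:59, 6:00-9:59 and 10:00-23:59.
Function CountPeriodsForDay(ByVal d As Date) As Variant
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long
    Dim v As Variant
    Dim counts(0 To 2) As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row

    For r = 2 To lastRow
        v = ws.Cells(r, "B").Value
        If IsDate(v) Then
            ' same calendar day as d?
            If Int(CDate(v)) = Int(d) Then
                Select Case Hour(CDate(v))
                    Case 0 To 5: counts(0) = counts(0) + 1
                    Case 6 To 9: counts(1) = counts(1) + 1
                    Case Else: counts(2) = counts(2) + 1
                End Select
            End If
        End If
    Next r

    CountPeriodsForDay = counts   ' 3-element array, one count per period
End Function
```

For example, `CountPeriodsForDay(DateSerial(2017, 1, 1))` returns the three counts for 01.01.2017. It scans the whole list once per day, so for many days a single pass that tallies into a dictionary keyed by date scales better. Note that `FileDateTime` reports the creation or last-modification time, which may differ from the timestamp in the file name.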

So if anyone knows how to do this, please let me know. I'd really appreciate your help!

Best regards - Jan

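[Question discussion]:

I thought it would be as simple as specifying "*.PDF" in your Dir command, but the code you're using treats files and directories as the same thing, so that would only look into subfolders with a .PDF suffix. I think the code in this answer to another question would suit you better. (You may need to change the line strFolder = TrailingSlash(strFolder) to If Right(strFolder, 1) <> "/" Then strFolder = strFolder & "/".)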
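The separation this comment describes can also be done with plain `Dir`: per folder, one pass with `vbDirectory` to queue the subfolders, then a second pass with a `*.pdf` pattern for the files. Because `Dir` keeps a single search state, the two passes run one after the other and are never nested. A minimal sketch that only counts, using the hypothetical name `CountPdfs`:

```vba
' Hypothetical sketch: walks the folder tree below sRoot without recursion
' (a Collection is used as a queue) and returns the number of PDF files.
Function CountPdfs(ByVal sRoot As String) As Long
    Dim queue As New Collection
    Dim sPath As String, sName As String
    Dim total As Long

    If Right$(sRoot, 1) <> "\" Then sRoot = sRoot & "\"
    queue.Add sRoot

    Do While queue.Count > 0
        sPath = queue(1)
        queue.Remove 1

        ' Pass 1: queue the subfolders, skipping "." and ".."
        sName = Dir(sPath, vbDirectory)
        Do While Len(sName) > 0
            If sName <> "." And sName <> ".." Then
                If (GetAttr(sPath & sName) And vbDirectory) = vbDirectory Then
                    queue.Add sPath & sName & "\"
                End If
            End If
            sName = Dir()
        Loop

        ' Pass 2: count the PDFs in this folder
        ' (Windows wildcard matching is case-insensitive, so *.PDF matches)
        sName = Dir(sPath & "*.pdf")
        Do While Len(sName) > 0
            total = total + 1
            sName = Dir()
        Loop
    Loop

    CountPdfs = total
End Function
```

Usage would be something like `MsgBox CountPdfs("X:\Tests\Manufact\")`.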

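Tags: vba excel pdf excel-2010

[Solution 1]:

OK, I worked on a similar project not long ago. I'm going to assume a few things here; tell me if any of them breaks the whole approach.

1) We can, and are allowed to, move the .PDF files into a subfolder after processing, or 2) we can, and are allowed to, rename (even temporarily) the .PDF files.

3) Once a month is over, we no longer need to process it; e.g. if today is February 2017, we stop processing the January 2017 files.

If we can work with these assumptions, then to avoid doing the work twice, each .PDF could, once processed, be moved to a subfolder called Processed Files inside the current month's folder (and moved back at the end), or, provided that string never appears in real file names, be renamed by appending a special tag such as "ProOCed". Then we can exclude any file that sits in that new folder or carries that tag.

I'd suggest you simply read all the file names into a sheet, then use Text-to-Columns to get the date and time each file was created (or perhaps use FileSystemObject to get that information), and then use Excel's Group feature to break the counts down by day and hour.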
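A minimal sketch of the FileSystemObject route, using late binding so no library reference is needed. It lists the name and creation time of the PDFs in one folder; `ListWithFso` and the A:B column layout are hypothetical choices:

```vba
' Hypothetical sketch: lists every PDF in one folder with its creation
' date/time into columns A:B of ws, starting at row r.
' Returns the next free row.
Function ListWithFso(ByVal folderPath As String, ByVal ws As Worksheet, _
                     ByVal r As Long) As Long
    Dim fso As Object, f As Object

    Set fso = CreateObject("Scripting.FileSystemObject")
    If Not fso.FolderExists(folderPath) Then
        ListWithFso = r
        Exit Function
    End If

    For Each f In fso.GetFolder(folderPath).Files
        ' GetExtensionName returns the extension without the dot
        If LCase$(fso.GetExtensionName(f.Name)) = "pdf" Then
            ws.Cells(r, 1).Value = f.Name
            ws.Cells(r, 2).Value = f.DateCreated   ' DateLastModified also exists
            r = r + 1
        End If
    Next f

    ListWithFso = r
End Function
```

Called once per month folder, e.g. `nextRow = ListWithFso(folderPath, ActiveSheet, nextRow)`, it fills two columns that a PivotTable could then group by days and hours.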

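Hope this helps; let me know if you need any code samples.

[Discussion]:

1) and 2) Unfortunately, I'm not allowed to rename/move/copy/change the files in any way, not even temporarily. 3) If that means the code has to be run on 31.01 at the latest for the January files, that's not possible either, because there may be times when nobody gets to run the script in time and we're already past 01.02 before it runs. Some code for the last thing you mentioned would be interesting; I'm not as deep into coding as I'd like to be, still learning :)
OK, thanks for confirming the restrictions, Jan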

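[Solution 2]:

Here's how I would do it. The following is mostly untested and should really be treated as pseudocode. Beyond that, the question isn't clear enough for me to give a definitive answer, because I had to make many assumptions (e.g. whether Num in the directory is literally 'Num' or a number, how TIMESTAMP is defined, etc.).

I'm assuming your PDF files are filed in the correct month folder, i.e. you won't have, say, a file for month '09' in the '10' folder (that would be an error condition). If that holds, my proposal should work. Note that I'm also assuming the file names are correct; if not, you can add extra error handling. Right now, if I find an error in a file name I just skip it, but you may want to print it out, as mentioned in the code comments.

The main data structure is a dictionary which, once all the PDFs for the month have been processed, should have one entry (i.e. key, value) for each day of the month. The keys are 2-digit strings for the day, from '01' to '31' (for months with 31 days). The value is a one-dimensional array of length 3, so a typical entry could be (20,31,10), i.e. 20 files for period 1, 31 files for period 2 and 10 for period 3.

For each file, you apply a regular expression that extracts only the day and the hour. I assume the periods don't overlap (just to make things easier, i.e. I don't have to bother with minutes). After extracting, I increment the count for the matching time period in that day's array, based on the hour I found.

Note that I'm assuming you have gone through all the product directories for a given month, so you now have all of that month's files. For the whole month, you can then print out on a different sheet the per-period file counts for each day.

I didn't bother implementing 'SummarizeFilesForMonth', but that should be relatively simple once everything else is debugged. That's where you iterate over the day keys in the right order and print out the period counts. No other sorting should be needed.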

Option Explicit

' Gets all files in 'path' with the required file extension,
' strips off the extension and returns the file names
' as a collection (which might not be what you want -
' i.e. you might want the full path on the 1st sheet)
Function GetFilesWithExt(path As String, fileExt As String) As Collection
  Dim coll As New Collection
  Dim file As String
  Dim ext As String

  file = Dir(path)                      ' Dir returns the bare file name
  Do While file <> ""
    ext = Mid$(file, InStrRev(file, ".") + 1)
    ' compare case-insensitively: the files are named *.PDF
    If LCase$(ext) = LCase$(fileExt) Then
      ' drop the "." and the extension
      coll.Add Left$(file, Len(file) - Len(fileExt) - 1)
    End If
    file = Dir
  Loop

  Set GetFilesWithExt = coll
End Function


' Checks whether a directory exists or not
Function pathExists(path As String) As Boolean
  pathExists = (Len(Dir(path, vbDirectory)) > 0)
End Function


' TEST_DDMMYYYY_TIMESTAMP is the file name being processed,
' assuming TIMESTAMP is hour, minute and second concatenated with
' no intervening spaces, each always 2 digits
Sub UpdateDictWithDayFile(ByRef dictForMonth As Object, file As String)
  ' Late binding: no reference to "Microsoft VBScript
  ' Regular Expressions 5.5" is required
  Dim regEx As Object
  Dim mat As Object
  Dim dayKey As String                  ' not 'Day', which shadows Day()
  Dim hr As Integer                     ' not 'Hour', which shadows Hour()
  Dim periods As Variant

  Set regEx = CreateObject("VBScript.RegExp")
  ' only extracts day and hour - you'll almost certainly
  ' have to adjust this regular expression to suit your needs
  regEx.Pattern = "TEST_(\d{2})\d{2}\d{4}_(\d{2})\d{2}\d{2}$"
  Set mat = regEx.Execute(file)
  If mat.Count = 1 Then
    dayKey = mat(0).SubMatches(0)       ' day as a 2-digit string
    hr = CInt(mat(0).SubMatches(1))     ' hour as an integer
  Else
    ' Think about reporting an error here using Debug.Print,
    ' i.e. the file name isn't in the proper format
    ' and will not be counted
    Exit Sub
  End If

  If Not dictForMonth.Exists(dayKey) Then
    ' 1-dimensional array of 3 items; one for each time period
    dictForMonth(dayKey) = Array(0, 0, 0)
  End If

  ' The dictionary hands back a copy of the array,
  ' so it is written back after updating (see below)
  periods = dictForMonth(dayKey)

  ' I'm using non-overlapping hours, unlike what's given in your question
  Select Case hr
    Case 0 To 6
      periods(0) = periods(0) + 1
    Case 7 To 9
      periods(1) = periods(1) + 1
    Case 10 To 23
      periods(2) = periods(2) + 1
    Case Else
      ' Another possible error; report it with Debug.Print.
      ' Will not be counted
      Exit Sub
  End Select

  dictForMonth(dayKey) = periods
End Sub


Sub SummarizeFilesForMonth(ByRef dictForMonth As Variant)
  ' This is where you write out the counts
  ' to the new sheet for the month. Iterate through each
  ' day of the month in 'dictForMonth' and print
  ' out the PDF counts for the individual periods
  ' stored in the 1-dimensional array of length 3
End Sub


Sub ProcessAllFiles()
  ' For each day of the month for which there are PDFs,
  ' this dictionary holds a 1-dimensional array of size 3
  ' (one count per time period)
  Dim dictForMonth As Object

  ' yr/mon instead of year/month, so the built-in
  ' Year() and Month() functions are not shadowed
  Dim yr As Integer, startYear As Integer, endYear As Integer
  Dim mon As Integer, startMonth As Integer, endMonth As Integer
  Dim prodNum As Integer, startProdNum As Integer, endProdNum As Integer
  Dim file As Variant
  Dim files As Collection

  startYear = 2014
  startMonth = 1
  endYear = 2017
  endMonth = 2
  startProdNum = 1
  endProdNum = 3

  Dim pathstem As String, path As String
  pathstem = "X:\Tests\Manufact\Prod_"

  Dim ws As Worksheet
  Dim row As Long                       ' Long: more than 32,767 files expected
  Set ws = ThisWorkbook.Sheets("Sheet1")
  row = 1

  For yr = startYear To endYear
    For mon = 1 To 12
      ' stop after the end month
      If yr = endYear And mon > endMonth Then Exit Sub

      ' skip months before the start month
      If yr > startYear Or mon >= startMonth Then
        Set dictForMonth = CreateObject("Scripting.Dictionary")

        For prodNum = startProdNum To endProdNum
          path = pathstem & prodNum & "\Machine\Num\" & yr & "\" & Format(mon, "00") & "\"
          If pathExists(path) Then
            Set files = GetFilesWithExt(path, "pdf")
            For Each file In files
              ' Print out file to column 'A' of 'Sheet1'
              ws.Cells(row, 1).Value = file
              row = row + 1
              UpdateDictWithDayFile dictForMonth, CStr(file)
            Next file
          End If
        Next prodNum

        SummarizeFilesForMonth dictForMonth
      End If
    Next mon
  Next yr

End Sub
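The `SummarizeFilesForMonth` stub could be filled in roughly as follows. This is a sketch assuming a sheet named "Summary" exists (a hypothetical name). Walking the keys "01" to "31" in order means no extra sorting is needed:

```vba
' Hypothetical fill-in for the SummarizeFilesForMonth stub: appends one row
' per day (day key, period 1, period 2, period 3) to the "Summary" sheet,
' leaving row 1 free for a header.
Sub SummarizeFilesForMonth(ByRef dictForMonth As Variant)
    Dim ws As Worksheet
    Dim r As Long, d As Long
    Dim dayKey As String
    Dim periods As Variant

    Set ws = ThisWorkbook.Sheets("Summary")
    r = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row + 1   ' next free row

    For d = 1 To 31
        dayKey = Format(d, "00")                 ' "01" .. "31"
        If dictForMonth.Exists(dayKey) Then
            periods = dictForMonth(dayKey)
            ws.Cells(r, 1).Value = "'" & dayKey  ' keep the leading zero as text
            ws.Cells(r, 2).Value = periods(0)
            ws.Cells(r, 3).Value = periods(1)
            ws.Cells(r, 4).Value = periods(2)
            r = r + 1
        End If
    Next d
End Sub
```

Since the stub only receives the dictionary, the year and month are not written; passing them as extra arguments would be the natural next step.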

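[Discussion]:

Wow, this is great work so far! But there are some things we have to sort out (because I messed up in my post). First, there's this damn folder structure (I don't like it myself, but I can't change it). The folders look like this: X:/Tests/Manufact/{Product}/Machine/{Part}/Year/Month/ where {Product} is one of the three names "L450", "L460" or "L510", and {Part} is formatted like 9P-306-01800-123, depending on the product folder:
{Part} in "L450" can be 9P-306-01800-123, 9P-306-03600-123, 9P-306-11800-123, 9P-306-13600-123
{Part} in "L460" can be 9P-308-01800-123, 9P-308-03600-123, 9P-308-11800-123, 9P-308-13600-123, 9P-308-21800-123, 9P-308-23600-123
{Part} in "L510" can be 9P-304-02100-123, 9P-304-02300-123, 9P-304-03400-123, 9P-304-04600-123, 9P-304-12100-123, 9P-304-12300-123, 9P-304-13400-123, 9P-304-14600-123
The file name is another thing you wanted to know more about, so I looked at it again. It's built exactly like this: {Number}_0{Serial}_YYYYMMDD_HHMMSS.PDF
where {Number} can be one of these 18 numbers: 5140391, 5140392, 5140393, 5140394, 5140395, 5140396, 5140397, 5140398, 5142485, 5142487, 5142494, 5142762, 5142769, 5142770, 5142821, 5142822, 5144561, 5144562, and {Serial} consists of four letters and a 6-digit number (AAAA012345). One more thing about the times: originally I planned to count something different, but now that I've looked at it, I have to count the files between three times like this:
11:01PM to 07:00AM (this would be 31.12.16 11:01PM to 01.01.17 07:00AM)
07:01AM to 03:00PM (this would be 01.01.17 07:01AM to 01.01.17 03:00PM)
03:01PM to 11:00PM (this would be 01.01.17 03:01PM to 01.01.17 11:00PM)
I assume your script already handles what I said about the minutes (:00 and :01); I just want to be clear this time. Also, there is a small chance that files from month 02 end up in the month 01 folder. I think it would be very hard to make the script count these mis-sorted files correctly; if it's possible, that would be great, but if not, I won't worry about it.
I hope I've thought of everything now; it's hard to tell when you don't know much about coding. I'm still learning, but this turns out (as I've now seen) to be much more than I thought. Thank you very much for your help, I'm really looking forward to your reply :) PS: Man, the 600-character limit on comments is killing me :-D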
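With the file-name format and the three shifts described in this comment, the hour-only regular expression in the answer is no longer enough. Below is a sketch of a hypothetical `GetShift` function. It assumes seconds can be ignored at the :00/:01 boundaries and that a file from 23:01 onwards belongs to the next calendar day's first shift, as in the 31.12.16 example:

```vba
' Hypothetical sketch: parses a name like
' 5140391_0AAAA012345_20170101_061500.PDF and works out its shift:
'   shift 1: 23:01 (previous day) - 07:00
'   shift 2: 07:01 - 15:00
'   shift 3: 15:01 - 23:00
' Seconds are ignored (assumption). Returns False if the name doesn't match.
Function GetShift(ByVal fileName As String, ByRef shiftDate As Date, _
                  ByRef shiftNo As Integer) As Boolean
    Dim re As Object, m As Object
    Dim minuteOfDay As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^\d+_0[A-Z]{4}\d{6}_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})\d{2}\.pdf$"
    re.IgnoreCase = True

    Set m = re.Execute(fileName)
    If m.Count <> 1 Then Exit Function      ' GetShift stays False

    With m(0).SubMatches
        shiftDate = DateSerial(CInt(.Item(0)), CInt(.Item(1)), CInt(.Item(2)))
        minuteOfDay = CLng(.Item(3)) * 60 + CLng(.Item(4))
    End With

    Select Case minuteOfDay
        Case Is > 23 * 60                   ' 23:01-23:59 -> next day
            shiftDate = shiftDate + 1
            shiftNo = 1
        Case Is <= 7 * 60                   ' 00:00-07:00
            shiftNo = 1
        Case Is <= 15 * 60                  ' 07:01-15:00
            shiftNo = 2
        Case Else                           ' 15:01-23:00
            shiftNo = 3
    End Select

    GetShift = True
End Function
```

For instance, a file stamped `20161231_230500` would be credited to shift 1 of 1 January 2017. Because the date comes from the file name rather than the folder, a February file sitting in the 01 folder would still get its February date, provided the counts are kept in one dictionary keyed by that date instead of one dictionary per folder month.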

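[Solution 3]:

OK, thanks for confirming, Jan

Then the next option is to build a list of processed file names in a sheet and skip those files. For example, if you loop through the files with a For Each loop, add a test to see whether the current file name is in the list of processed files: if it is, skip it; otherwise, process it and add its name to the list.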
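A sketch of that skip list, assuming the processed full paths are kept in column A of a sheet named "Processed" (a hypothetical name, header in row 1). Loading them into a `Scripting.Dictionary` once turns each "already processed?" check into a fast lookup instead of a search through 100k+ cells:

```vba
' Hypothetical sketch: loads the already processed full paths from
' column A of the "Processed" sheet into a case-insensitive dictionary.
Function LoadProcessed() As Object
    Dim ws As Worksheet
    Dim dict As Object
    Dim lastRow As Long, r As Long

    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare        ' must be set before adding items

    Set ws = ThisWorkbook.Sheets("Processed")
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    For r = 2 To lastRow
        dict(CStr(ws.Cells(r, 1).Value)) = True
    Next r

    Set LoadProcessed = dict
End Function

' Sketch of the check inside the file loop:
'   If Not processed.Exists(sName) Then
'       ' ... count the file ...
'       processed.Add sName, True
'       ' ... and append sName to the "Processed" sheet
'   End If
```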

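Regarding 3, I meant all the files from past months. That way we can search the files by date to find the new ones to process: every file created after a certain date (the last run date) is treated as new and needs to be processed.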
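A sketch of the last-run-date idea, assuming the start time of the previous run is stored in cell B1 of a sheet named "Settings" (hypothetical). The time is recorded before scanning starts, so a file arriving mid-run is not skipped by the next run, at the risk of being counted twice; the skip list above would catch such duplicates. `FileDateTime` returns the creation or last-modification time, so files copied to the share with an older timestamp could be missed:

```vba
' Hypothetical sketch: a file counts as new when its FileDateTime is
' later than the start time of the previous run.
Function IsNewSince(ByVal fullPath As String, ByVal lastRun As Date) As Boolean
    IsNewSince = (FileDateTime(fullPath) > lastRun)
End Function

Sub RunIncremental()
    Dim cfg As Range
    Dim lastRun As Date, thisRun As Date

    Set cfg = ThisWorkbook.Sheets("Settings").Range("B1")
    thisRun = Now                            ' taken before scanning starts
    ' An empty cell leaves lastRun at its default (1899), so everything is new
    If IsDate(cfg.Value) Then lastRun = CDate(cfg.Value)

    ' ... walk the month folders; for each PDF at fullPath:
    '     If IsNewSince(fullPath, lastRun) Then ...count it...

    cfg.Value = thisRun                      ' the next run starts from here
End Sub
```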

Would that work?

[Discussion]: