【Question title】: Extract newest log lines from a log file based on timestamp on line start
【Posted】: 2017-06-25 08:24:19
【Question】:

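I have a simple .txt log file to which an application appends lines while it works. Each line consists of a timestamp followed by text of variable length: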

17-06-25 06:37:43 xxxxxxxxxxxxxxx
17-06-25 06:37:46 yyyyyyy
17-06-25 06:37:50 zzzzzzzzzzzzzzzzzzzzzzzzzzzz
...

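I need to extract all lines whose timestamp is later than a given date-time. These are usually the last, say, 20-40 log entries (lines).

The problem is that the file is large and still growing.

If all lines had the same length, I would use a binary search. But they don't, so I ended up with something like this: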

Private Sub ExtractNewestLogs(dEarliest As Date)
    Dim sLine As String = ""
    Dim oSRLog As New StreamReader(gsFilLog)

    sLine = oSRLog.ReadLine()
    Do While Not (sLine Is Nothing)
        Debug.Print(sLine)
        sLine = oSRLog.ReadLine()
    Loop
End Sub

This is not very fast.

Is there a way to read such a file "backwards", i.e. last line first? If not, what other options do I have?

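【Question comments】:

  • Can you make your logging framework create one file per day?
  • @Steve, thanks for the suggestion, but no, that is not feasible: it would require modifying the other application.
  • Then there is not much you can do. I would suggest writing some kind of service that reads the log file at midnight (or whenever you prefer) and splits it by day into files in a separate folder. Your app would then have a much easier job. (It could also reset the file to zero length afterwards, so that the log starts over every few days.)
  • Here is an example of a service that runs at a specific time: stackoverflow.com/questions/19151363/…
  • @Steve, thanks again. You know, I am actually thinking about treating this file as binary and doing a binary search anyway: start at the middle of the file, then search forward to the next CRLF+1 to get the (fixed-size) timestamp, and as long as both are the same...

Tags: vb.net file search logging filestream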


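【Solution 1】:

The function below uses a binary reader to return the last x bytes of a file as an array of lines. You can then extract the last records you want much faster than by reading the whole log file. You can fine-tune the number of bytes to read from a rough estimate of how many bytes the last 20-40 log entries occupy. On my PC, reading the last 10,000 characters of a 17 MB text file takes less than 10 ms.

Of course, this code assumes that your log file is plain ASCII text.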

Imports System.IO
Imports System.Text

Private Function ReadLastbytes(filePath As String, x As Long) As String()
    Dim fileData() As Byte
    'FileShare.ReadWrite: the logging application may have the file open for writing.
    Using oFileStream As New FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
        Using oBinaryReader As New BinaryReader(oFileStream)
            'Start x bytes before the end of the file, or at its start if the
            'file is shorter than x bytes.
            Dim lStart As Long = Math.Max(0L, oFileStream.Length - x)
            oFileStream.Seek(lStart, SeekOrigin.Begin)
            'Read everything from there to the end of the file (at most x bytes).
            fileData = oBinaryReader.ReadBytes(CInt(oFileStream.Length - lStart))
        End Using
    End Using
    'Plain ASCII is assumed. The first array element is usually an incomplete
    'line, because the read rarely starts exactly at the start of a line.
    Return Encoding.ASCII.GetString(fileData).Split({vbCrLf}, StringSplitOptions.None)
End Function
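
The answer stops at returning the tail of the file as lines; picking out the entries newer than a given date-time is left to the caller. A minimal sketch of that step follows, assuming the timestamp layout of the sample lines in the question (yy-MM-dd HH:mm:ss). The module name LogTailFilter and the function FilterNewerThan are hypothetical names introduced for illustration only.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Public Module LogTailFilter
    'Assumption: timestamp layout as in the question's sample lines.
    Private Const TimestampFormat As String = "yy-MM-dd HH:mm:ss"

    'Returns the lines whose leading timestamp is later than dEarliest.
    'Lines that do not start with a valid timestamp (for example a first
    'line that was cut in half) are skipped.
    Public Function FilterNewerThan(lines As IEnumerable(Of String),
                                    dEarliest As Date) As List(Of String)
        Dim result As New List(Of String)
        For Each sLine As String In lines
            If sLine Is Nothing OrElse sLine.Length < TimestampFormat.Length Then
                Continue For
            End If
            Dim dStamp As Date
            If Date.TryParseExact(sLine.Substring(0, TimestampFormat.Length),
                                  TimestampFormat, CultureInfo.InvariantCulture,
                                  DateTimeStyles.None, dStamp) AndAlso
               dStamp > dEarliest Then
                result.Add(sLine)
            End If
        Next
        Return result
    End Function
End Module
```

If the oldest line that survives the filter is also the first complete line of the tail, the tail may have been too short to contain every wanted entry; in that case it is probably worth reading a larger tail and filtering again.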

【Comments】:

    【Solution 2】:

    Although the file has no fixed line length, I tried a binary search anyway.

    First some considerations, then the code:


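    It is sometimes necessary to extract the last n lines of a log file based on an ascending sort key at the start of each line. The key can be anything, but in log files it usually represents a date-time, typically in the format YYMMDDHHNNSS (possibly with some punctuation in between).

    Log files are typically text-based files consisting of many lines, sometimes millions of them. Log files often have fixed-width lines, in which case a specific key is easy to reach with a binary search. Probably just as often, however, log files have variable-width lines. To access these, one can use an estimate of the average line width to compute a position near the end of the file, and then process sequentially from there to EOF.

    But a binary approach can be applied to this type of file as well, as shown here. The advantage shows as soon as file sizes grow. The maximum size of a log file is determined by the file system: in theory, NTFS allows 16 EiB (16 x 2^60 B); in practice, under Windows 8 or Server 2012, the limit is 256 TiB (256 x 2^40 B).

    (What 256 TiB means in practice: a typical log file is designed to be read by humans and rarely exceeds 80 characters per line. Suppose your log file logs happily and completely uninterrupted for an astonishing 12 years, a total of 4,383 days of 86,400 seconds each. Your application could then write 9 entries per millisecond to said log file and would only hit the 256 TiB limit in its 13th year.)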
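
The estimate in the parenthesis can be checked by hand. The following is only a back-of-the-envelope calculation, assuming exactly 80 bytes per entry (line terminators ignored) and a constant rate of 9 entries per millisecond:

```latex
\begin{aligned}
4383 \cdot 86400\ \text{s} &= 378{,}691{,}200\ \text{s} \approx 3.787 \times 10^{11}\ \text{ms} \\
3.787 \times 10^{11}\ \text{ms} \cdot 9 \cdot 80\ \text{B/ms} &\approx 2.73 \times 10^{14}\ \text{B} \\
256\ \text{TiB} = 2^{48}\ \text{B} &\approx 2.81 \times 10^{14}\ \text{B}
\end{aligned}
```

After 12 years the file is at roughly 97% of the limit, so the limit is reached during the 13th year.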

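    The great advantage of the binary approach is that n comparisons suffice for a log file of 2^n bytes, an advantage that grows quickly with file size: a 1 KiB file needs 10 comparisons (1 per 102.4 B), a 1 MiB file needs only 20 (1 per 51.2 KiB), a 1 GiB file needs 30 (1 per roughly 34.1 MiB), and a 1 TiB file needs just 40 (1 per 25.6 GiB).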
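
The comparison counts above are simply the base-2 logarithm of the file size, rounded up. As a small illustration (not part of the original answer; BinarySearchCost and ComparisonsNeeded are hypothetical names), the count can be computed by halving the size until nothing is left:

```vbnet
Imports System

Public Module BinarySearchCost
    'Returns the smallest n with 2^n >= lFileSize, i.e. the number of
    'halving steps needed to narrow lFileSize bytes down to a single byte.
    Public Function ComparisonsNeeded(lFileSize As Long) As Integer
        Dim n As Integer = 0
        Dim lRest As Long = lFileSize - 1
        Do While lRest > 0
            lRest \= 2      'Integer division: drop the lowest bit.
            n += 1
        Loop
        Return n
    End Function
End Module
```

Note that this only counts the outer halving steps. The function further below additionally scans forward to the next CR/LF after each step, so the real cost per step also depends on the line length.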

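    On to the function. It makes these assumptions: the log file is UTF-8 encoded, log lines are separated by a CR/LF sequence, and the timestamp sits at the start of each line in ascending order, probably in the format [YY]YYMMDDHHNNSS, possibly with some punctuation in between. (All of these assumptions could easily be changed and catered for with overloaded versions of the function.)

    In the outer loop, the binary narrowing is driven by comparisons against the supplied earliest date-time. Once a new position in the binary stream has been determined, the inner loop performs an independent forward search to locate the next CR/LF sequence. The byte after this sequence marks the start of the record key that is compared. If this key is greater than or equal to the key we are searching for, it is ignored. Only if the key found is smaller than the key we are searching for is it treated as a possible candidate for the record just before the ones we want. We end up with the last record whose key is smaller than the search key.

    Finally, all log records after this final candidate are returned to the caller as a string array.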

    The function requires importing System.IO (and System.Linq, which provides SequenceEqual).

    Imports System.IO
    Imports System.Linq
    
    'This function expects a log file that is organized in lines of varying
    'lengths, delimited by CR/LF. At the start of each line is a sort criterion
    'of any kind (in log files typically YYMMDD HHMMSS), by which the lines are
    'sorted in ascending order (newest log line at the end of the file). The
    'earliest key allowed to be returned must be provided; the sort key's
    'length is inferred from it. This key does not necessarily have to exist in
    'the file. If it does, it can occur multiple times, like any other sort key.
    'The returned string array contains all lines that follow the last line
    'found to have a key smaller than the provided sort key.
    Public Shared Function ExtractLogLines(sLogFile As String,
        sEarliest As String) As String()
    
        Dim oFS As New FileStream(sLogFile, FileMode.Open, FileAccess.Read,
            FileShare.ReadWrite)        'The log file as file stream; ReadWrite
                                        'sharing lets the logger keep writing.
        Dim lMin, lPos, lMax As Long    'Examined stream window.
        Dim i As Long                   'Iterator to find CR/LF.
        Dim abEOL(0 To 1) As Byte       'Bytes to find CR/LF.
        Dim abCRLF() As Byte = {13, 10} 'Search for CR/LF.
        Dim bFound As Boolean           'CR/LF found.
        Dim iKeyLen As Integer = sEarliest.Length      'Length of sort key.
        Dim sActKey As String           'Key of examined log record.
        Dim abKey() As Byte             'Reading the current key.
        Dim lCandidate As Long          'File position of promising candidate.
        Dim sRecords As String          'All wanted records.
    
        'The byte array accepting the records' keys is as long as the provided
        'key.
        ReDim abKey(0 To iKeyLen - 1)   '0-based!
    
        'We search for the last log line whose sort key is smaller than the
        'sort key provided in sEarliest.
        lMin = 0                        'Start at stream start
        lMax = oFS.Length - 1 - 2       '0-based, and without terminal CRLF.
        Do
            lPos = (lMax - lMin) \ 2 + lMin     'Position to examine now.
    
            'Although the key to be compared with sEarliest is located after
            'lPos, it is important, that lPos itself is not modified when
            'searching for the key.
            i = lPos                    'Iterator for the CR/LF search.
            bFound = False
            Do While i < lMax
                oFS.Seek(i, SeekOrigin.Begin)
                oFS.Read(abEOL, 0, 2)
                If abEOL.SequenceEqual(abCRLF) Then    'CR/LF found.
                    bFound = True
                    Exit Do
                End If
                i += 1
            Loop
            If Not bFound Then
                'Between lPos and lMax no more CR/LF could be found. This means,
                'that the search is over.
                Exit Do
            End If
            i += 2                              'Skip CR/LF.
            oFS.Seek(i, SeekOrigin.Begin)       'Read the key after the CR/LF
            oFS.Read(abKey, 0, iKeyLen)         'into a string.
            sActKey = System.Text.Encoding.UTF8.GetString(abKey)
    
            'Compare the actual key with the earliest key. We want to find the
            'largest key just before the earliest key.
            If sActKey >= sEarliest Then
                'Not interested in this one, look for an earlier key.
                lMax = lPos
            Else
                'Possibly interesting, remember this.
                lCandidate = i
                lMin = lPos
            End If
        Loop While lMin < lMax - 1
    
        'lCandidate is the position of the first record to be taken into account.
        'Note, that we need the final CR/LF here, so that the search for the 
        'next CR/LF sequence following below will match a valid first entry even
        'in case there are no entries to be returned (sEarliest being larger than
        'the last log line). 
        ReDim abKey(CInt(oFS.Length - lCandidate - 1))  '0-based.
        oFS.Seek(lCandidate, SeekOrigin.Begin)
        oFS.Read(abKey, 0, CInt(oFS.Length - lCandidate))
    
        'We're done with the stream.
        oFS.Close()
    
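        'Convert into a string and omit the first line (the candidate, whose
        'key is smaller than sEarliest). If no candidate was found, the data
        'starts at the first line of the file, which the search never examines:
        'omit it only if its key is smaller than sEarliest as well.
        sRecords = (System.Text.Encoding.UTF8.GetString(abKey))
        If lCandidate > 0 OrElse
            String.CompareOrdinal(sRecords, 0, sEarliest, 0, iKeyLen) < 0 Then
            sRecords = sRecords.Substring(sRecords.IndexOf(Chr(10)) + 1)
        End If

        'Return as a string array split at CR/LF, without the empty last entry.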
    
        Return sRecords.Split(ControlChars.CrLf.ToCharArray(),
            StringSplitOptions.RemoveEmptyEntries)
    End Function
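
The question passes the limit as a Date, whereas ExtractLogLines expects it as a string in the same layout as the keys in the file. A minimal sketch of that conversion follows, assuming the yy-MM-dd HH:mm:ss layout of the sample lines in the question; SortKeyHelper and ToSortKey are hypothetical names introduced for illustration only.

```vbnet
Imports System
Imports System.Globalization

Public Module SortKeyHelper
    'Formats a Date like the timestamps in the sample log lines,
    'independent of the current culture's date and time separators.
    Public Function ToSortKey(dEarliest As Date) As String
        Return dEarliest.ToString("yy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture)
    End Function
End Module
```

A call would then presumably look like ExtractLogLines(gsFilLog, ToSortKey(dEarliest)), with gsFilLog being the log file path from the question. Because the key length is inferred from the string, the format must match the file's timestamps character for character.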
    

    【Comments】:
