【Question Title】: How to parse Apache Log File, Swiftly?
【Posted】: 2021-04-12 16:58:50
【Question】:

Suppose I have a log file that has already been split into an array of strings. For example, here are two of its lines.

123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"

123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0"

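I could parse these with ordinary string manipulation, but I think a regular expression would do it better. I tried to follow a similar pattern someone used in Python, but I don't quite understand it. Here is my attempt.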

Here is the pattern: ([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)" When I try to use it, I get no matches.

import Foundation

// `contents` is the whole log file, already read into a String
let lines = contents.split(separator: "\n")
let pattern = "([(\\d\\.)]+) - - \\[(.*?)\\] \"(.*?)\" (\\d+) - \"(.*?)\" \"(.*?)\""
let regex = try! NSRegularExpression(pattern: pattern, options: [])
for line in lines {
    let range = NSRange(location: 0, length: line.utf16.count)
    let parsedData = regex.firstMatch(in: String(line), options: [], range: range)
    print(parsedData as Any)  // prints nil for the sample lines: no match
}

Ideally I would extract the data into a model. The code needs to be efficient and fast, since I may have to process thousands of lines.

Expected result

let someResult = (String, String, String, String, String, String) or 
let someObject: LogFile = LogFile(String, String, String...)

I'm looking to break each parsed line into its parts: IP, OS, OS version, browser, browser version, etc. Any real data parsing would do.
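For the OS and browser parts, a minimal sketch, assuming the user-agent string has already been captured (it is the last capture group in the answers below). The helper names firstCapture, browserInfo and macOSVersion are hypothetical, and this is a token-matching heuristic, not a real user-agent parser: Chromium-based browsers such as Edge also send a Chrome/ token, and Safari's Safari/ token carries a WebKit build number rather than the browser version, so Safari is left out.

```swift
import Foundation

/// Returns capture group 1 of the first match of `pattern` in `text`, or nil.
func firstCapture(_ pattern: String, in text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text)),
          let range = Range(match.range(at: 1), in: text) else { return nil }
    return String(text[range])
}

/// Heuristic: looks for a "Firefox/<version>" or "Chrome/<version>" token.
func browserInfo(from userAgent: String) -> (name: String, version: String)? {
    for name in ["Firefox", "Chrome"] {
        if let version = firstCapture(name + #"/([\d.]+)"#, in: userAgent) {
            return (name, version)
        }
    }
    return nil
}

/// Only handles the "Mac OS X 10_8_4" / "Mac OS X 10.8" form seen in the sample lines.
func macOSVersion(from userAgent: String) -> String? {
    firstCapture(#"Mac OS X ([\d_.]+)"#, in: userAgent)?
        .replacingOccurrences(of: "_", with: ".")
}
```

For the first sample line this yields Chrome 29.0.1547.65 on Mac OS X 10.8.4; other operating systems would need their own patterns.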

【Question Discussion】:

  • This looks more like an Apache log file to me.
  • @MartinR Yes, that was my typo. Corrected.

Tags: swift regex parsing


【Solution 1】:

For the samples you showed, could you please try the following.

^((?:\d+\.){3}\d+).*?\[([^]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$

Online demo for above regex

Explanation: a detailed breakdown of the regex above.

^(                   ##Start the 1st capturing group, anchored at the start of the line.
   (?:\d+\.){3}\d+   ##Non-capturing group: 1 or more digits followed by ".", repeated 3 times, then 1 or more digits.
)                    ##Close the 1st capturing group.
.*?\[                ##Lazily match everything up to the first [.
([^]]*)              ##2nd capturing group: everything up to the next ].
\].*?"               ##Match ], then lazily match up to the next ".
([^"]*)              ##3rd capturing group: everything up to the next ".
"\s*                 ##Match " followed by zero or more whitespace characters.
(\d+)                ##4th capturing group: one or more digits.
\s*                  ##Zero or more whitespace characters.
(\d+)                ##5th capturing group: one or more digits.
\s*"-"\s*"           ##Zero or more spaces, the literal "-", zero or more spaces, then ".
([^"]*)              ##6th capturing group: everything up to the next ".
"$                   ##Match the closing " at the end of the line.

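In Swift the same pattern can be written as a raw string literal, so it needs no double escaping. A minimal sketch, assuming the regex is compiled once and reused for every line (the LogLineParser name is hypothetical). It also writes the class [^]] as [^\]], which matches the same characters and avoids relying on how ICU treats an unescaped ] right after [^.

```swift
import Foundation

/// Sketch: Solution 1's pattern, compiled once and reused for every line.
struct LogLineParser {
    private let regex: NSRegularExpression

    init() throws {
        // Raw string literal: backslashes and quotes reach the regex engine unchanged.
        let pattern = #"^((?:\d+\.){3}\d+).*?\[([^\]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$"#
        regex = try NSRegularExpression(pattern: pattern)
    }

    /// Returns capture groups 1...6 for one line, or nil if the line doesn't match.
    func captures(in line: String) -> [String]? {
        let fullRange = NSRange(line.startIndex..., in: line)
        guard let match = regex.firstMatch(in: line, range: fullRange) else { return nil }
        return (1..<match.numberOfRanges).map { index in
            guard let range = Range(match.range(at: index), in: line) else { return "" }
            return String(line[range])
        }
    }
}
```

Apple's documentation describes NSRegularExpression instances as immutable and thread-safe, so one parser can be shared; compiling the pattern once instead of per line is what matters for files with thousands of entries.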
【Discussion】:

  • You're clearly a regex god. This solution worked.
【Solution 2】:

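The correct regex pattern is the one @RavinderSingh13 provided, but I also want to add what I did to get it working in my code, so that others can use it in the future without having to search through every StackOverflow answer.

I needed a way to parse an Apache log file into usable objects in Swift. Here is the code.

Implement the extension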

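import Foundation

extension String {
    /// Returns one array per match: index 0 is the whole match, indices 1... are the capture groups.
    func groups(for regexPattern: String) -> [[String]] {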
        do {
            let text = self
            let regex = try NSRegularExpression(pattern: regexPattern)
            let matches = regex.matches(in: text,
                                        range: NSRange(text.startIndex..., in: text))
            return matches.map { match in
                return (0..<match.numberOfRanges).map {
                    let rangeBounds = match.range(at: $0)
                    guard let range = Range(rangeBounds, in: text) else {
                        return ""
                    }
                    return String(text[range])
                }
            }
        } catch let error {
            print("invalid regex: \(error.localizedDescription)")
            return []
        }
    }
}

Create the model object

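class EventLog {
    let ipAddress: String
    let date: String
    let getMethod: String          // the full request line, e.g. "GET /jobs/ HTTP/1.1"
    let statusCode: String
    let secondStatusCode: String   // actually the response size in bytes, not a status code
    let versionInfo: String        // the User-Agent string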
    
    init(ipAddress: String, date: String, getMethod: String, statusCode: String, secondStatusCode: String, versionInfo: String ){
        self.ipAddress = ipAddress
        self.date = date
        self.getMethod = getMethod
        self.statusCode = statusCode
        self.secondStatusCode = secondStatusCode
        self.versionInfo = versionInfo
    }
}
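The model above stores every field as a String. A hypothetical typed alternative (LogEntry and its property names are illustrative, not from the answer) could convert the numbers and the timestamp once, at parse time. This sketch assumes the six capture groups arrive in the order the regex produces them, and that the timestamp uses Apache's usual dd/MMM/yyyy:HH:mm:ss Z layout.

```swift
import Foundation

/// Hypothetical typed model: numbers and the timestamp are converted when the line is parsed.
struct LogEntry {
    let ipAddress: String
    let date: Date
    let request: String
    let statusCode: Int
    let responseBytes: Int
    let userAgent: String

    // Parses timestamps like "03/Sep/2013:18:38:48 -0600".
    // en_US_POSIX keeps "Sep" parsing independent of the device locale.
    private static let dateFormatter: DateFormatter = {
        let formatter = DateFormatter()
        formatter.locale = Locale(identifier: "en_US_POSIX")
        formatter.dateFormat = "dd/MMM/yyyy:HH:mm:ss Z"
        return formatter
    }()

    /// Expects capture groups 1...6 in order; returns nil if any field fails to convert.
    init?(captures: [String]) {
        guard captures.count == 6,
              let date = LogEntry.dateFormatter.date(from: captures[1]),
              let status = Int(captures[3]),
              let bytes = Int(captures[4]) else { return nil }
        ipAddress = captures[0]
        self.date = date
        request = captures[2]
        statusCode = status
        responseBytes = bytes
        userAgent = captures[5]
    }
}
```

The formatter is static because creating a DateFormatter per line is comparatively expensive when a file has thousands of entries.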

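Parse the data

I want to point out that the regex helper returns a [[String]]: one inner array per match, where index 0 is the whole matched line and indices 1 to 6 are the capture groups. So you first take the match (subGroup) out of the overall result and then index into it, similar to parsing JSON.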

func parseData() {
    let documentsUrl: URL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!
    let destinationFileUrl = documentsUrl.appendingPathComponent("logfile.log")

    do {
        let contents = try String(contentsOf: destinationFileUrl, encoding: .utf8)
        let lines = contents.split(separator: "\n")
        // {3}\\d+ (not {3,}\\d) so a last octet with several digits is captured whole
        let pattern = "^((?:\\d+\\.){3}\\d+).*?\\[([^\\]]*)\\].*?\"([^\"]*)\"\\s*(\\d+)\\s+(\\d+)\\s*\"-\"\\s*\"([^\"]*)\"$"
        var parsedLogs: [EventLog] = []
        for line in lines {
            // Skip lines the pattern doesn't match instead of crashing on group[0]
            guard let subGroup = String(line).groups(for: pattern).first else { continue }
            // subGroup[0] is the whole line; the capture groups start at index 1
            parsedLogs.append(EventLog(ipAddress: subGroup[1], date: subGroup[2], getMethod: subGroup[3],
                                       statusCode: subGroup[4], secondStatusCode: subGroup[5],
                                       versionInfo: subGroup[6]))
        }
        // Update eventLogs (a property of the enclosing type) once on the main queue, not once per line
        DispatchQueue.main.async {
            self.eventLogs.append(contentsOf: parsedLogs)
        }
    } catch {
        print(error.localizedDescription)
    }
}

【Discussion】:

    【Solution 3】:

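    The pattern finds no match because, where it expects a literal hyphen, the log line has one or more digits (the response size, e.g. 3327).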
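A small check of that diagnosis, using the pattern from the question. The "fixed" variant is only the question's pattern with that hyphen replaced by a second (\d+) group (my own edit for illustration), and the helper name firstCaptures is hypothetical.

```swift
import Foundation

/// Returns all capture groups of the first match, or nil when there is no match.
func firstCaptures(of pattern: String, in text: String) -> [String]? {
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text))
    else { return nil }
    return (1..<match.numberOfRanges).map { index in
        Range(match.range(at: index), in: text).map { String(text[$0]) } ?? ""
    }
}

let sample = #"123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0""#

// The question's pattern: a literal "-" where the log has the response size (821).
let original = #"([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)""#
// Same pattern with that "-" replaced by a second (\d+) group.
let fixed = #"([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)""#

print(firstCaptures(of: original, in: sample) as Any)  // nil
print(firstCaptures(of: fixed, in: sample) as Any)     // the seven captured fields
```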

    To make the pattern more efficient, you can use a negated character class such as "([^"]*)" to capture any characters other than " between the double quotes.

    (\d+(?:\.\d+)+) - - \[([^\]\[]+)\] "([^"]*)" (\d+) (\d+) "[^"]+" "([^"]+)"
    
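    • (\d+(?:\.\d+)+) Capture group 1: match 1+ digits, then repeat 1+ times a . followed by 1+ digits
    • - - Match literally
    • \[([^\]\[]+)\] Match [, capture 1+ characters other than [ and ] in group 2, then match ]
    • "([^"]*)" Match ", capture 0+ characters other than " in group 3, then match "
    • (\d+) (\d+) Capture groups 4 and 5, each matching 1+ digits
    • "[^"]+" Same mechanism as before with ", but it only matches (no capture)
    • "([^"]+)" Same mechanism as before with ", capturing 1+ characters in group 6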

    Regex demo | Swift demo
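Besides speed, the negated class also keeps a capture inside one pair of quotes. A minimal sketch on a made-up two-request string (the input and the helper name firstGroup are hypothetical): when the rest of the pattern fails, a lazy "(.*?)" keeps expanding across quote characters, while "([^"]*)" cannot.

```swift
import Foundation

/// Returns capture group 1 of the first match, or nil.
func firstGroup(_ pattern: String, in text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text)),
          let range = Range(match.range(at: 1), in: text) else { return nil }
    return String(text[range])
}

// Hypothetical input: two quoted requests, only the second one followed by 404.
let text = #""GET /a" 200 "GET /b" 404"#

let lazyCapture = firstGroup(#""(.*?)" 404"#, in: text)      // expands across quotes
let negatedCapture = firstGroup(#""([^"]*)" 404"#, in: text) // stays inside one pair of quotes
print(lazyCapture ?? "no match")     // GET /a" 200 "GET /b
print(negatedCapture ?? "no match")  // GET /b
```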

    Sample code

    import Foundation

    let pattern = #"(\d+(?:\.\d+)+) - - \[([^\]\[]+)\] "([^"]*)" (\d+) (\d+) "[^"]+" "([^"]+)""#
    let regex = try! NSRegularExpression(pattern: pattern, options: .anchorsMatchLines)
    let testString = #"123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36""#
    let stringRange = NSRange(location: 0, length: testString.utf16.count)
    let matches = regex.matches(in: testString, range: stringRange)
    var result: [[String]] = []
    for match in matches {
        var groups: [String] = []
        for rangeIndex in 1 ..< match.numberOfRanges {
            groups.append((testString as NSString).substring(with: match.range(at: rangeIndex)))
        }
        if !groups.isEmpty {
            result.append(groups)
        }
    }
    print(result)
    

    Output

    [["123.4.5.1", "03/Sep/2013:18:38:48 -0600", "GET /products/car/ HTTP/1.1", "200", "3327", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"]]
    

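    【Discussion】:

    • I don't know why I can't get this pattern to work. Whenever I try to use it, it throws an invalid regex error.
    • @xTwisteDx I see, you have to escape the square brackets inside the character class: \[([^\]\[]+)\]. I've updated the answer.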