【Question title】: Parse log file in python
【Posted】: 2017-04-10 08:27:09
【Question】:

I have a log file with the following structure and need to parse it in Python:

10.243.166.74, 10.243.166.74 - - [08/Feb/2017:16:33:26 +0100] "GET /script/header_footer.js?_=1486568008442 HTTP/1.1" 200 2143 "http://www.trendtron.com/popmenu/home" "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0 K-Meleon/75.1"

This is my first time writing a regular expression, and this is all I have so far:

(.+?)\[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)"

This pattern yields 7 strings, but I need more. Desired output:

"10.243.166.74, 10.243.166.74"
"08/Feb/2017"
"16:33:26"
"+0100"
"GET /script/header_footer.js?_=1486568008442"
"HTTP/1.1"
"200"
"2143"
"http://www.trendtron.com/popmenu/home"
"Mozilla/5.0"
"(Windows NT 6.1; rv:31.0)"
"Gecko/20100101"
"Firefox/31.0"
"K-Meleon/75.1"

【Comments】:

    Tags: python regex parsing logfile


    【Solution 1】:

    Why not just split the last group on spaces?

    import re

    log = '10.243.166.74, 10.243.166.74 - - [08/Feb/2017:16:33:26 +0100] "GET /script/header_footer.js?_=1486568008442 HTTP/1.1" 200 2143 "http://www.trendtron.com/popmenu/home" "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0 K-Meleon/75.1"'

    # Raw string, so that \[ and \d reach the regex engine unchanged.
    regex = re.compile(r'(.+?)\[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)"')
    res = regex.match(log)
    log_parts = list(res.groups())
    # The last group is the user-agent string: remove it and split it on spaces.
    devices_browsers_info_str = log_parts.pop(-1)
    devices_browsers_info_parts = devices_browsers_info_str.split(' ')
    log_parts.extend(devices_browsers_info_parts)
    print(log_parts)
    

    which gives us

    ['10.243.166.74, 10.243.166.74 - - ', 
     '08/Feb/2017:16:33:26 +0100', 
     'GET /script/header_footer.js?_=1486568008442 HTTP/1.1', 
     '200', '2143', 'http://www.trendtron.com/popmenu/home',
     'Mozilla/5.0',
     '(Windows', 'NT', '6.1;', 'rv:31.0)',
     'Gecko/20100101', 
     'Firefox/31.0', 
     'K-Meleon/75.1']
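
Note that splitting on spaces breaks the parenthesized platform token into four pieces and leaves the timestamp as a single string. A minimal sketch of one way to get closer to the desired output, assuming parentheses in the user agent are never nested; the helper names `split_user_agent` and `split_timestamp` are illustrative and not part of the answer above:

```python
import re

def split_user_agent(user_agent):
    """Split on whitespace, but keep each "(...)" group as one token.

    Assumption: parentheses are not nested.
    """
    return re.findall(r'\([^)]*\)|\S+', user_agent)

def split_timestamp(timestamp):
    """Split '08/Feb/2017:16:33:26 +0100' into date, time and UTC offset."""
    date, rest = timestamp.split(':', 1)   # only the first ':' separates the date
    time, offset = rest.split(' ', 1)
    return [date, time, offset]

user_agent = ('Mozilla/5.0 (Windows NT 6.1; rv:31.0) '
              'Gecko/20100101 Firefox/31.0 K-Meleon/75.1')
print(split_user_agent(user_agent))
print(split_timestamp('08/Feb/2017:16:33:26 +0100'))
```

These helpers could be applied to the last and the second element of `log_parts` in place of the plain `split(' ')`.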
    

    【Comments】:

    • Thanks a lot. As I said, this is my first time working with regular expressions. Thanks :)
    【Solution 2】:
    (.+?)\- - \[(.+?)\:(.+?)\ (.+?)\] \"(.+?)\ (HTTP.+?)\" (.+?) (.+?) \"(.+?)\" \"(.+?) (.+?\)) (.+?)\ (.+?)\ (.+?)\"
    

    Or: http://regexr.com/3fndb
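
For reference, a sketch of how this pattern might be applied from Python. The variable names are illustrative, and the redundant backslash escapes before `-`, `:`, spaces and quotes have been dropped, which should not change what the pattern matches. Two caveats, both assumptions about the input: the first group captures a trailing space (hence the `strip()`), and the pattern expects a user agent with exactly five parts, as in the sample line.

```python
import re

log = ('10.243.166.74, 10.243.166.74 - - [08/Feb/2017:16:33:26 +0100] '
       '"GET /script/header_footer.js?_=1486568008442 HTTP/1.1" 200 2143 '
       '"http://www.trendtron.com/popmenu/home" '
       '"Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0 K-Meleon/75.1"')

# The pattern from this answer, split over two raw strings for readability.
pattern = re.compile(
    r'(.+?)- - \[(.+?):(.+?) (.+?)\] "(.+?) (HTTP.+?)" (.+?) (.+?) '
    r'"(.+?)" "(.+?) (.+?\)) (.+?) (.+?) (.+?)"'
)

match = pattern.match(log)
# Strip each group because the first one ends with a space; empty list if no match.
fields = [group.strip() for group in match.groups()] if match else []

for field in fields:
    print(field)
```

Lines whose user agent has a different number of parts will not match, so checking `match` before using the groups is worthwhile.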

    【Comments】:
