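【Question title】: Parsing an unstructured log file in Python
【Posted】: 2020-08-10 15:21:44
【Question】:

I want to parse a log file that contains unstructured text. I need to extract the core id and pass/fail values and output them as JSON. I only started programming a week ago, so any help would be appreciated.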

AMPTTK v25: RSA ALL THREADS
================RSACores X RSACores==============
time: 421045.73
Num Threads Available to process: 256
Num Cores   Requested to execute: 256
TSC freq: 1600629120.0

Memory allocated @ main (not all used by program): 3842.000000 MB

  RSA thread:       : 0
wrkspace addr       : 7f0483400000
wrkspace size       : f00000

        # cores:   16
        core id:      0,      1,      2,      3,      4,      5,      6,      7,     64,     65,     66,     67,     68,     69,     70,     71,
      pass/fail:   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,
       test ipc:  4.497,  4.503,  4.489,  4.476,  4.537,  4.471,  4.499,  4.459,  4.934,  4.946,  4.892,  4.933,  4.927,  4.927,  4.882,  4.886,
     aperf(MHz):   2826,   2814,   2826,   2826,   2826,   2826,   2827,   2826,   2909,   2909,   2909,   2909,   2909,   2909,   2909,   2909,
      aperf ipc:  2.392,  2.408,  2.397,  2.392,  2.397,  2.388,  2.397,  2.388,  2.341,  2.341,  2.340,  2.341,  2.341,  2.341,  2.340,  2.340,
     mce status:   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,   pass,

【Comments】:

    Tags: python python-3.x logging


    【Solution 1】:

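    My general approach is to look for structure in the log file and then pull it out. In the data you shared, each line of interest has exactly one ':' character followed by sixteen comma-separated values. Since the data is not in a form that can go straight into JSON, I store it in a temporary dictionary and then convert that to a JSON string. Here is an example: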

    import json
    
    # parse the log file and store in dictionary
    raw_data = {}
    with open('unstructured_data.txt') as log:
        for line in log:
            line = line.rstrip()
            if line.count(':') == 1:
                heading, data = line.split(':')
                fields = data.split(',')
                if len(fields) > 15:
                    raw_data[heading.lstrip()] = fields
    
    # Put only the data of interest into another Python dictionary
    result_data = {}
    for i in range(len(raw_data['core id'])):
        result_data[raw_data['core id'][i].strip()] = raw_data['pass/fail'][i].strip()
    
    # Convert python dictionary to json string
    result_json = json.dumps(result_data)
    
    print(result_json)
    

    Run against your log file, this gives the following:

    $ python3 parse_log.py 
    {"0": "pass", "1": "pass", "2": "pass", "3": "pass", "4": "pass", "5": "pass", "6": "pass", "7": "pass", "64": "pass", "65": "pass", "66": "pass", "67": "pass", "68": "pass", "69": "pass", "70": "pass", "71": "pass", "": ""}
    

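    This is not a perfect result: each row in the log ends with a trailing comma, so splitting on ',' leaves an empty final field, which shows up as the "": "" entry. Hopefully it can be refined against the real data.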
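
    One possible refinement, as a minimal sketch rather than a definitive fix: strip each field and drop the empty one left by the trailing comma, and detect table rows by the presence of a comma instead of a hard-coded count of sixteen. The function name `parse_core_results` and the shortened sample (including its one "fail" value) are made up for illustration. The sketch also assumes that if the real log has several thread blocks, their rows use the same headings and can simply be concatenated.

```python
import json


def parse_core_results(lines):
    """Map each core id to its pass/fail status from an iterable of log lines."""
    rows = {}
    for line in lines:
        # Rows of interest have exactly one ':' separating heading from values.
        if line.count(':') != 1:
            continue
        heading, data = line.split(':')
        # Assumption: only table rows contain commas after the heading.
        if ',' not in data:
            continue
        # Strip whitespace and drop the empty field left by the trailing comma.
        fields = [f.strip() for f in data.split(',') if f.strip()]
        # Assumption: repeated headings (later blocks) are appended in order.
        rows.setdefault(heading.strip(), []).extend(fields)
    return dict(zip(rows.get('core id', []), rows.get('pass/fail', [])))


# Hypothetical, shortened sample in the same layout as the log above.
sample = """\
        # cores:   4
        core id:      0,      1,     64,     65,
      pass/fail:   pass,   pass,   fail,   pass,
       test ipc:  4.497,  4.503,  4.934,  4.946,
"""

print(json.dumps(parse_core_results(sample.splitlines())))
# prints {"0": "pass", "1": "pass", "64": "fail", "65": "pass"}
```

    To use it on a file, pass the open file object directly, for example `parse_core_results(open('unstructured_data.txt'))`, since iterating over a file yields its lines.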

    【Comments】:
