【Question title】: Parse from log file in python
【Posted】: 2017-07-12 17:57:06
【Question】:

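I have a log file containing an arbitrary number of lines and JSON strings. I need to extract just one piece of JSON data from it: the one that directly follows "_____GP D_____". I don't want any other lines or JSON data from the file.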

This is what my input file looks like:

INFO:modules.gp.helpers.parameter_getter:_____GP D_____
{'from_time': '2017-07-12 19:57', 'to_time': '2017-07-12 20:57', 'consig_number': 'dup1', 'text': 'r155', 'mobile': None, 'email': None}
ERROR:modules.common.actionexception:ActionError: [{'other': 'your request already crossed threshold time'}]
{'from_time': '2016-07-12 16:57', 'to_time': '2016-07-12 22:57', 'consig_number': 'dup2', 'text': 'r15', 'mobile': None, 'email': None}

How can I find only the JSON string that comes after '_____GP D_____'?

【Comments】:

    Tags: json regex python-3.x parsing logging


    【Solution 1】:

    You can read the file line by line until you encounter a line that ends with _____GP D_____, and when you do, pick up the next line:

    found_json = None
    with open("input.log", "r") as f:  # open your log file
        for line in f:  # read it line by line
            if line.rstrip().endswith("_____GP D_____"):  # if a line ends with our string...
                found_json = next(f, "").rstrip()  # grab the next line ("" if the marker was the last line)
                break  # stop reading of the file, nothing more of interest
    

    You can then do whatever you want with found_json: parse it, print it, and so on.
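
    If the marker can appear more than once, the same idea extends to collecting every line that follows it rather than stopping at the first. The following is a minimal sketch, not part of the original answer: lines_after_marker is a hypothetical helper name, and the in-memory sample stands in for the real log file.

```python
import io

MARKER = "_____GP D_____"  # marker text taken from the question's sample log


def lines_after_marker(lines, marker=MARKER):
    """Yield every line that directly follows a line ending with `marker`."""
    capture = False
    for line in lines:
        stripped = line.rstrip()
        if capture:
            yield stripped
        # the flag is recomputed on every line, so it only covers the next one
        capture = stripped.endswith(marker)


# an in-memory stand-in for open("input.log"), shortened from the question
sample_log = io.StringIO(
    "INFO:modules.gp.helpers.parameter_getter:_____GP D_____\n"
    "{'consig_number': 'dup1'}\n"
    "ERROR:modules.common.actionexception:ActionError\n"
    "{'consig_number': 'dup2'}\n"
)
print(list(lines_after_marker(sample_log)))
```

    Because the helper accepts any iterable of lines, an open file object can be passed in directly in place of the StringIO sample.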

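    UPDATE - If you want to continuously "follow" your log file (akin to the tail -f command), you can open it in read mode and keep the file handle open, reading it line by line with a reasonable delay between reads (this is largely how tail -f does it, too). You can then use the same procedure to detect when the line you want appears and capture the next line for processing, sending to another process, or whatever else you plan to do with it. Something like: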

    import time
    
    capture = False  # a flag to use to signal the capture of the next line
    found_lines = []  # a list to store our found lines, just as an example
    with open("input.log", "r") as f:  # open the file for reading...
        while True:  # loop indefinitely
            line = f.readline()  # grab a line from the file
            if line != '':  # if there is some content on the current line...
                if capture:  # capture the current line
                    found_lines.append(line.rstrip())  # store the found line
                    # instead, you can do whatever you want with the captured line
                    # i.e. to print it: print("Found: {}".format(line.rstrip()))
                    capture = False  # reset the capture flag
                elif line.rstrip()[-14:] == "_____GP D_____":  # if it ends in '_____GP D_____'..
                    capture = True  # signal that the next line should be captured
            else:  # an empty buffer encountered, most probably EOF...
                time.sleep(1)  # ... let's wait for a second before attempting to read again...
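
    To hand each captured line to something else (a message queue publisher, for instance) instead of appending it to a list, the capture flag can be wrapped in a small object that takes a callback. This is only a sketch under assumptions: MarkerCapture and its handler argument are hypothetical names, and no particular queue library is implied. The follow loop above would call feed(line) for every non-empty line it reads.

```python
class MarkerCapture:
    """Track whether the previous line ended with the marker."""

    def __init__(self, marker, handler):
        self.marker = marker
        self.handler = handler  # hypothetical callback, e.g. a queue publisher
        self._armed = False  # True when the previous line ended with the marker

    def feed(self, line):
        """Process one line; pass it to the handler if it follows a marker line."""
        stripped = line.rstrip()
        if self._armed:
            self.handler(stripped)
        self._armed = stripped.endswith(self.marker)


received = []  # stands in for whatever the handler would really do
capture = MarkerCapture("_____GP D_____", received.append)
for line in [
    "INFO:modules.gp.helpers.parameter_getter:_____GP D_____\n",
    "{'consig_number': 'dup1'}\n",
    "ERROR:modules.common.actionexception:ActionError\n",
]:
    capture.feed(line)
print(received)
```

    Keeping the state in an object rather than in the loop means the same instance can be fed from the polling loop, from a test, or from any other source of lines.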
    

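    【Comments】:

    • Any idea how I can read the log continuously and, whenever this '_____GP D_____' appears, maybe save it in RabbitMQ and process it afterwards? Or what if there are multiple JSON strings I need to pick out that are unique in the log file?
    • Can you help?
    【Solution 2】: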

    import json
    from ast import literal_eval

    KEY_STRING = '''_____GP D_____'''
    
    text = """INFO:modules.gp.helpers.parameter_getter:_____GP D_____
    {'from_time': '2017-07-12 19:57', 'to_time': '2017-07-12 20:57', 'consig_number': 'dup1', 'text': 'r155', 'mobile': None, 'email': None}
    ERROR:modules.common.actionexception:ActionError: [{'other': 'your request already crossed threshold time'}]
    {'from_time': '2016-07-12 16:57', 'to_time': '2016-07-12 22:57', 'consig_number': 'dup2', 'text': 'r15', 'mobile': None, 'email': None}"""
    
    
    
    lines = text.split("\n")  # load the log text into a list
    # loading from an actual log file would look more like:
    # with open("/var/log/syslog.log", 'r') as f:
    #     lines = f.readlines()
    
    # set "gate" flag to False
    flag = False
    for loop in lines:
        line = loop.strip()
        if flag:  # "gate" opened
            # How to parse depends on how the dictionary was written to the log:
            # json.loads(line) works if it was logged with json.dumps; otherwise
            # the line is a Python dict repr and literal_eval loads it.
            target_json = literal_eval(line)
            print(json.dumps(target_json, indent=4))
        if KEY_STRING in line:
            flag = True   # KEY_STRING found, open "gate"
        else:
            flag = False  # close "gate"
    

    Output (key order may differ depending on the Python version):

    {
         "consig_number": "dup1",
         "text": "r155",
         "email": null,
         "mobile": null,
         "to_time": "2017-07-12 20:57",
         "from_time": "2017-07-12 19:57"
    }
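
    When it is not known in advance whether the payload was logged with json.dumps or as a plain Python dict repr, one option is to try JSON first and fall back to literal_eval. A minimal sketch, with parse_payload as a hypothetical helper name:

```python
import json
from ast import literal_eval


def parse_payload(line):
    """Parse a log line as JSON, falling back to a Python literal."""
    try:
        return json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # single quotes and None suggest a Python repr rather than JSON
        return literal_eval(line)


print(parse_payload('{"text": "r155", "mobile": null}'))
print(parse_payload("{'text': 'r155', 'mobile': None}"))
```

    Both calls produce the same dictionary. A line that is neither valid JSON nor a valid Python literal still raises ValueError or SyntaxError from literal_eval, so the caller may want to catch those.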

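    【Comments】:

    • Please explain your code instead of just posting it. How does your answer differ from the other one?
    • You can copy it, try it, and change it. In any case, I will add some comments.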