Tail multiple log files in Python (answers)

【Question title】: tail multiple logfiles in python
【Posted】: 2011-04-20 02:59:28
【Question description】:

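This is probably a bit of a silly exercise on my part, but it raises a bunch of interesting questions. I have a directory of log files from my chat client, and I want to be notified through notify-osd every time one of them changes.

The script I wrote basically uses os.popen to run the Linux tail command on each file to get its last line, then checks each line against a dictionary of what the lines were on the previous run. If a line has changed, it uses pynotify to send me a notification.

The script actually works fine, except that it uses a huge amount of CPU (probably because it runs tail about 16 times on every pass of the loop, on files mounted over sshfs).

Something like this looks like a great solution, but I don't see how to implement it for more than one file.

Here is the script I wrote. Pardon my lack of comments and poor style.

Edit: to clarify, this is all Linux on a desktop.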

【Question discussion】:

Tags: python

【Solution 1】:

Without even looking at your source code, there are two ways you could easily do this more efficiently and handle multiple files.

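1. Don't bother running tail unless you have to. Simply os.stat all of the files and record their last-modified times. If a file's last-modified time has changed, raise a notification.
2. Use pyinotify to call into Linux's inotify facility; this has the kernel do option 1 for you and call you back whenever a file in your directory changes. Then translate the callback into your OSD notification.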
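Option 1 can be sketched roughly as follows. This is a minimal illustration and not the asker's script: the function name `changed_files` and the `last_mtimes` dictionary are hypothetical, and a file seen for the first time is assumed to need no notification.

```python
import os


def changed_files(paths, last_mtimes):
    """Return the paths whose modification time differs from the recorded one.

    last_mtimes is a dict {path: mtime} that is updated in place.
    A path seen for the first time is recorded but not reported.
    """
    changed = []
    for path in paths:
        try:
            mtime = os.stat(path).st_mtime
        except OSError:
            # File vanished (or is unreachable); skip it this round.
            continue
        previous = last_mtimes.get(path)
        last_mtimes[path] = mtime
        if previous is not None and mtime != previous:
            changed.append(path)
    return changed
```

A main loop would call `changed_files` once per interval (with a `time.sleep` in between) and raise one notification per returned path, so each pass costs one stat call per file instead of one forked tail.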

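Now, there may be some trickiness depending on how many notifications you want when several messages arrive, and on whether you care about missing the notification for a message.

An approach that keeps tail would be to use tail -f instead. Open all of the files with tail -f, then use the select module to have the OS tell you when there is additional input on one of the file descriptors opened for tail -f. Your main loop would call select and then iterate over each readable descriptor to generate notifications. (You could probably do this without tail, by just calling readline() when a descriptor is readable.)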
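The tail -f plus select idea might look like the sketch below. The helper names `start_tails` and `wait_for_output` are hypothetical, and the `-n 0` flag (start at the end of the file) is an assumption about the desired behavior. On Linux, select() reports regular files as always readable, so the sketch waits on the pipes coming out of tail instead of on the log files themselves.

```python
import os
import select
import subprocess


def start_tails(paths):
    """Spawn one `tail -n 0 -f` per path; return {stdout_fd: (path, process)}."""
    procs = {}
    for path in paths:
        proc = subprocess.Popen(["tail", "-n", "0", "-f", path],
                                stdout=subprocess.PIPE)
        procs[proc.stdout.fileno()] = (path, proc)
    return procs


def wait_for_output(fd_to_path, timeout=None):
    """Block in select() until some descriptor is readable.

    Returns a list of (path, bytes) pairs, one per descriptor that had data.
    """
    readable, _, _ = select.select(list(fd_to_path), [], [], timeout)
    results = []
    for fd in readable:
        # os.read on the raw descriptor avoids Python-level buffering.
        data = os.read(fd, 65536)
        if data:
            results.append((fd_to_path[fd], data))
    return results
```

A main loop would build `fd_to_path` from the result of `start_tails` and call `wait_for_output` repeatedly. It still forks one tail per file, but only once at startup instead of on every pass.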

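Other areas of improvement for your script:

- Use os.listdir and native Python filtering (say, a list comprehension) instead of a popen with a bunch of grep filters.
- Refresh the list of buffers to scan periodically instead of only building it at program start.
- Use subprocess.Popen instead of os.popen.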
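As an illustration of the first two points, directory filtering in plain Python could look like this. The function name `list_logs`, the `.log` suffix, and the `exclude` parameter are assumptions made for the example; the asker's actual file-naming scheme is not shown in the thread.

```python
import os


def list_logs(directory, suffix=".log", exclude=()):
    """Return sorted full paths of regular files in directory ending in suffix,
    skipping names that contain any of the exclude substrings."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(suffix)
        and not any(word in name for word in exclude)
        and os.path.isfile(os.path.join(directory, name))
    )
```

Calling `list_logs` again on each pass of the main loop (or every few passes) picks up log files created after the program started.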

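【Discussion】:

Thank you so much. I think I implemented the first option correctly: it works and uses much less CPU. (It's the only one I really understood.) My Improved Code
You are essentially trying to find the most efficient mechanism for recognizing when a file has changed and acting on it. (For example, see my Quora answer on How tail -f is implemented.) Calling os.stat is cheaper than forking a process. Using inotify would be even better.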

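【Solution 2】:

If you are already using the pyinotify module, this is easy to do in pure Python (i.e. no need to spawn a separate process to tail each file).

Here is an example that is event-driven by inotify and should use very little CPU. When IN_MODIFY occurs for a given path, we read all available data from the file handle and output any complete lines found, buffering the incomplete line until more data is available: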

import os
import select
import sys
import pynotify
import pyinotify

class Watcher(pyinotify.ProcessEvent):

    def __init__(self, paths):
        self._manager = pyinotify.WatchManager()
        self._notify = pyinotify.Notifier(self._manager, self)
        self._paths = {}
        for path in paths:
            self._manager.add_watch(path, pyinotify.IN_MODIFY)
            fh = open(path, 'rb')
            fh.seek(0, os.SEEK_END)
            self._paths[os.path.realpath(path)] = [fh, '']

    def run(self):
        while True:
            self._notify.process_events()
            if self._notify.check_events():
                self._notify.read_events()

    def process_default(self, evt):
        path = evt.pathname
        fh, buf = self._paths[path]
        data = fh.read()
        lines = data.split('\n')
        # prepend the incomplete line buffered from the previous event.
        if buf:
            lines[0] = buf + lines[0]
        # the last element is either '' (data ended with a newline) or an
        # incomplete line; keep it buffered until the rest arrives.
        buf = lines.pop()

        # display a notification
        notice = pynotify.Notification('%s changed' % path, '\n'.join(lines))
        notice.show()

        # and output to stdout
        for line in lines:
            sys.stdout.write(path + ': ' + line + '\n')
        sys.stdout.flush()
        self._paths[path][1] = buf

pynotify.init('watcher')
paths = sys.argv[1:]
Watcher(paths).run()

Usage:

% python watcher.py [path1 path2 ... pathN]
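The buffering rule this answer describes (emit complete lines, hold back a trailing partial line) can be isolated as a small pure function, which makes it easy to check on its own. This is a sketch; `split_complete_lines` is a hypothetical name and is not part of the answer's code.

```python
def split_complete_lines(buf, data):
    """Return (complete_lines, new_buf) after appending data to buf."""
    lines = (buf + data).split("\n")
    # The last element is "" when data ended with a newline; otherwise it is
    # an unfinished line that has to wait for more data.
    new_buf = lines.pop()
    return lines, new_buf
```

For instance, feeding `"a\nb\npar"` with an empty buffer yields the lines `a` and `b` and leaves `par` buffered; a later `"tial\n"` then completes it as `partial`.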

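【Discussion】:

He's using pynotify, not pyinotify. Nice, though! I think you could pop the last incomplete line into buf and drop the use of limit.
This looks like an even better solution, but I'm afraid it's a bit over my head. Thanks so much anyway!
Updated it to display notifications via pynotify.
While I like this solution, it lacks proper handling of logrotate. Consider adding something along these lines. fail2ban has a good example: github.com/fail2ban/fail2ban/blob/master/server/…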
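One possible way to address the logrotate concern is to compare the open handle against what is currently on disk before each read. The sketch below is not taken from fail2ban; `reopen_if_rotated` is a hypothetical helper, and it assumes a filesystem with stable inode numbers.

```python
import os


def reopen_if_rotated(fh, path):
    """Return a handle for path, reopened when the file was rotated,
    or rewound when it was truncated in place."""
    try:
        on_disk = os.stat(path)
    except OSError:
        # The new file has not been created yet; keep the old handle.
        return fh
    current = os.fstat(fh.fileno())
    rotated = (on_disk.st_ino, on_disk.st_dev) != (current.st_ino, current.st_dev)
    truncated = on_disk.st_size < fh.tell()
    if rotated:
        fh.close()
        return open(path, "rb")  # start reading the new file from the top
    if truncated:
        fh.seek(0)  # copytruncate-style rotation: same inode, size reset
    return fh
```

A watcher could call this at the start of its event handler and store the returned handle in place of the old one.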

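【Solution 3】:

A simple pure-Python solution (not the best, but it doesn't fork, prints 4 empty lines after an idle period, and labels each chunk with its source file whenever the source changes):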

#!/usr/bin/env python

from __future__ import with_statement

'''
Implement multi-file tail
'''

import os
import sys
import time


def print_file_from(filename, pos):
    with open(filename, 'rb') as fh:
        fh.seek(pos)
        while True:
            chunk = fh.read(8192)
            if not chunk:
                break
            sys.stdout.write(chunk)


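def _fstat(filename):
    # (size, mtime) pair used to detect changes; indices 6 and 8 of the
    # stat tuple are st_size and st_mtime.
    st_results = os.stat(filename)
    return (st_results[6], st_results[8])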


def _print_if_needed(filename, last_stats, no_fn, last_fn):
    changed = False
    #Find the size of the file and move to  the end
    tup = _fstat(filename)
    # print tup
    if last_stats[filename] != tup:
        changed = True
        if not no_fn and last_fn != filename:
            print '\n<%s>' % filename
        print_file_from(filename, last_stats[filename][0])
        last_stats[filename] = tup
    return changed


def multi_tail(filenames, stdout=sys.stdout, interval=1, idle=10, no_fn=False):
    S = lambda (st_size, st_mtime): (max(0, st_size - 124), st_mtime)
    last_stats = dict((fn, S(_fstat(fn))) for fn in filenames)
    last_fn = None
    last_print = 0
    while 1:
        # print last_stats
        changed = False
        for filename in filenames:
            if _print_if_needed(filename, last_stats, no_fn, last_fn):
                changed = True
                last_fn = filename
        if changed:
            if idle > 0:
                last_print = time.time()
        else:
            if idle > 0 and last_print is not None:
                if time.time() - last_print >= idle:
                    last_print = None
                    print '\n' * 4
            time.sleep(interval)

if '__main__' == __name__:
    from optparse import OptionParser
    op = OptionParser()
    op.add_option('-F', '--no-fn', help="don't print filename when changes",
        default=False, action='store_true')
    op.add_option('-i', '--idle', help='idle time, in seconds (0 turns off)',
        type='int', default=10)
    op.add_option('--interval', help='check interval, in seconds', type='int',
        default=1)
    opts, args = op.parse_args()
    try:
        multi_tail(args, interval=opts.interval, idle=opts.idle,
            no_fn=opts.no_fn)
    except KeyboardInterrupt:
        pass
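The listing above is Python 2 code (print statements and a tuple-parameter lambda). Its core step, reading whatever was appended since the last known offset, could be sketched for Python 3 as follows. `read_new_bytes` is a hypothetical name, and the reset to offset 0 when the file shrinks is an added assumption that the original listing does not make.

```python
import os


def read_new_bytes(path, pos):
    """Return (data, new_pos): the bytes appended to path since offset pos."""
    size = os.stat(path).st_size
    if size < pos:
        # File shrank (e.g. it was truncated); start over from the beginning.
        pos = 0
    with open(path, "rb") as fh:
        fh.seek(pos)
        data = fh.read()
    return data, pos + len(data)
```

A polling loop would keep one offset per file, call `read_new_bytes` for each file on every interval, and print the filename header only when the returned data is non-empty.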

【Discussion】: