【Question title】: Python: count attacks per hour in a log file
【Posted】: 2016-06-04 09:16:41
【Question】:

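I have a Python script that extracts the date, time, and IP address of each attack from a log file. My problem is that I need to count how many attacks occur per hour of each day, but when I implemented a count, it only gave me the overall total rather than the per-hour figures I want.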

The log file looks like this:

Feb  3 08:50:39 j4-be02 sshd[620]: Failed password for bin from 211.167.103.172 port 39701 ssh2
Feb  3 08:50:45 j4-be02 sshd[622]: Failed password for invalid user virus from 211.167.103.172 port 41354 ssh2
Feb  3 08:50:49 j4-be02 sshd[624]: Failed password for invalid user virus from 211.167.103.172 port 42994 ssh2
Feb  3 13:34:00 j4-be02 sshd[666]: Failed password for root from 85.17.188.70 port 45481 ssh2
Feb  3 13:34:01 j4-be02 sshd[670]: Failed password for root from 85.17.188.70 port 46802 ssh2
Feb  3 13:34:03 j4-be02 sshd[672]: Failed password for root from 85.17.188.70 port 47613 ssh2
Feb  3 13:34:05 j4-be02 sshd[676]: Failed password for root from 85.17.188.70 port 48495 ssh2
Feb  3 21:45:18 j4-be02 sshd[746]: Failed password for invalid user test from 62.45.87.113 port 50636 ssh2
Feb  4 08:39:46 j4-be02 sshd[1078]: Failed password for root from 1.234.51.243 port 60740 ssh2
Feb  4 08:39:55 j4-be02 sshd[1082]: Failed password for root from 1.234.51.243 port 34124 ssh2

My current code is:

import re

myAuthlog = open('auth.log', 'r')  # open auth.log for reading
for line in myAuthlog:  # go through the file one line at a time
    # capture the date and hour (e.g. 'Feb  3 08') and the source IP address
    ip_addresses = re.findall(r'([A-Z][a-z]{2}\s\s\d\s\d\d).+Failed password for .+? from (\S+)', line)

    print(ip_addresses)

The output looks like this:

[('Feb  5 08', '5.199.133.223')]
[]
[('Feb  5 08', '5.199.133.223')]
[]
[('Feb  5 08', '5.199.133.223')]
[]
[('Feb  5 08', '5.199.133.223')]
[]
[('Feb  5 08', '5.199.133.223')]

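【Comments】:

  • Why not pull out the hour as well? Then you can store the date and hour as the key and all the IP addresses as the value, and use Counter() to count the IP addresses.
  • Possible duplicate of Python Script to view attacks per hour. You should edit your earlier question to reflect the changes instead of creating a new one.
  • I took the count out because I couldn't get it to work.

Tags: python python-2.7 python-3.x logging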


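【Solution 1】:

The itertools.groupby() function groups consecutive items according to whatever criterion you specify.

This code prints the number of attacks for each hour of each day: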

from itertools import groupby

with open('auth.log') as myAuthlog:
    # The first 9 characters of a line ('Feb  3 08') identify the day and hour
    for key, group in groupby(myAuthlog, key=lambda x: x[:9]):
        print("%d attacks in hour %s" % (len(list(group)), key))
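
One caveat worth keeping in mind: groupby() only merges consecutive items that share a key, so this approach relies on the log being in chronological order (which syslog files normally are). A minimal sketch of that behavior follows. The helper name count_per_hour is hypothetical, and the sample lines are taken from the question rather than read from auth.log.

```python
from itertools import groupby

def count_per_hour(lines):
    """Return (hour_key, count) pairs for consecutive runs of lines.

    The key is the first 9 characters of a line, e.g. 'Feb  3 08'.
    """
    return [(key, len(list(group)))
            for key, group in groupby(lines, key=lambda x: x[:9])]

# Sample lines standing in for auth.log
ordered = [
    "Feb  3 08:50:39 j4-be02 sshd[620]: Failed password for bin from 211.167.103.172 port 39701 ssh2",
    "Feb  3 08:50:45 j4-be02 sshd[622]: Failed password for invalid user virus from 211.167.103.172 port 41354 ssh2",
    "Feb  3 13:34:00 j4-be02 sshd[666]: Failed password for root from 85.17.188.70 port 45481 ssh2",
]
print(count_per_hour(ordered))
# [('Feb  3 08', 2), ('Feb  3 13', 1)]

# If the same hour shows up again later, it becomes a separate group
shuffled = [ordered[0], ordered[2], ordered[1]]
print(count_per_hour(shuffled))
# [('Feb  3 08', 1), ('Feb  3 13', 1), ('Feb  3 08', 1)]
```

If the input might not be sorted, sorting it by the same key before grouping, or tallying into a dictionary instead, avoids the split groups.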

Or, with the additional requirement from the comments:

from itertools import groupby

with open('auth.log') as myAuthlog:
    # Keep only the failed-login lines; the generator filters lazily
    myAuthlog = (line for line in myAuthlog if "Failed password for" in line)
    for key, group in groupby(myAuthlog, key=lambda x: x[:9]):
        print("%d attacks in hour %s" % (len(list(group)), key))

Or, with a different output format:

from itertools import groupby

with open('auth.log') as myAuthlog:
    myAuthlog = (line for line in myAuthlog if "Failed password for" in line)
    for key, group in groupby(myAuthlog, key=lambda x: x[:9]):
        # key looks like 'Feb  3 08': month, space-padded day, hour
        month, day, hour = key[0:3], key[4:6].strip(), key[7:9]
        print("%s:00 %s-%s: %d" % (hour, day, month, len(list(group))))

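【Comments】:

  • Thanks, this works, but the only problem is that it goes through every entry in the log file. I only need the attacks with the status "Failed password". Is there a way to do that?
  • Thanks a lot. Is there a way to edit the print part? It looks a bit odd when it prints "172 attacks in hour Feb  3 08".
  • Sure, there are many ways to format the result. I didn't include any because it wasn't part of the question you asked. In any case, see my latest edit.
  • Or just show the date, hour, IP address, and how many attacks occurred.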
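
For that last request (date, hour, IP address, and number of attacks), one possible approach is the one suggested in the comments on the question: pull out the hour and the IP address with a regular expression and tally the combinations with collections.Counter. This is a sketch and not part of the original answer. The function name count_attacks is made up for illustration, the "Accepted password" sample line is invented to show a non-matching entry, and the pattern assumes the syslog layout shown in the question.

```python
import re
from collections import Counter

# Month, day (one or two digits), hour, then the source IP of a failed login
PATTERN = re.compile(
    r'^([A-Z][a-z]{2})\s+(\d{1,2})\s(\d\d):\d\d:\d\d\s.*'
    r'Failed password for .+? from (\S+)'
)

def count_attacks(lines):
    """Count failed logins per (month, day, hour, ip) combination."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:  # skip lines that are not failed-password entries
            counts[match.groups()] += 1
    return counts

# Sample lines standing in for auth.log
sample = [
    "Feb  3 08:50:39 j4-be02 sshd[620]: Failed password for bin from 211.167.103.172 port 39701 ssh2",
    "Feb  3 08:50:45 j4-be02 sshd[622]: Failed password for invalid user virus from 211.167.103.172 port 41354 ssh2",
    "Feb  3 08:51:00 j4-be02 sshd[625]: Accepted password for alice from 10.0.0.1 port 50000 ssh2",
    "Feb  3 13:34:00 j4-be02 sshd[666]: Failed password for root from 85.17.188.70 port 45481 ssh2",
]

for (month, day, hour, ip), n in count_attacks(sample).items():
    print("%s %s %s:00 %s: %d" % (month, day, hour, ip, n))
# Feb 3 08:00 211.167.103.172: 2
# Feb 3 13:00 85.17.188.70: 1
```

Unlike groupby(), a Counter does not depend on the lines being in order, at the cost of keeping every key in memory.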
【Solution 2】:
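import collections
from datetime import datetime as dt

# Maps each (day, hour) timestamp to the number of attacks seen in that hour
answer = collections.defaultdict(int)
with open('path/to/logfile') as infile:
    for line in infile:
        if "Failed password for" not in line:
            continue  # only failed logins count as attacks
        stamp = line[:9]  # e.g. 'Feb  3 08'
        # Whitespace in the format matches any run of whitespace in the stamp;
        # the log has no year, so the parsed year defaults to 1900
        t = dt.strptime(stamp, "%b %d %H")
        answer[t] += 1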
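
The snippet above only builds the tally. A sketch of how the result might be reported follows. Since the log lines carry no year, this version takes the year as a parameter and prepends it before parsing, so the datetime keys are complete and sort chronologically. The function name attacks_per_hour, the year parameter, and the value 2016 are assumptions for illustration, and the sample lines come from the question.

```python
import collections
from datetime import datetime as dt

def attacks_per_hour(lines, year):
    """Tally failed-password lines by the hour in which they occurred."""
    answer = collections.defaultdict(int)
    for line in lines:
        if "Failed password for" not in line:
            continue
        # Prepend the assumed year so the parsed datetime is complete
        t = dt.strptime("%d %s" % (year, line[:9]), "%Y %b %d %H")
        answer[t] += 1
    return answer

# Sample lines standing in for the log file, deliberately out of order
sample = [
    "Feb  4 08:39:46 j4-be02 sshd[1078]: Failed password for root from 1.234.51.243 port 60740 ssh2",
    "Feb  3 13:34:00 j4-be02 sshd[666]: Failed password for root from 85.17.188.70 port 45481 ssh2",
    "Feb  3 13:34:01 j4-be02 sshd[670]: Failed password for root from 85.17.188.70 port 46802 ssh2",
]

# datetime keys compare chronologically, so sorted() orders the report by time
for t, n in sorted(attacks_per_hour(sample, 2016).items()):
    print("%s: %d" % (t.strftime("%b %d %H:00"), n))
# Feb 03 13:00: 2
# Feb 04 08:00: 1
```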

【Comments】:

  • FYI, this showed up in the low-quality posts queue, probably because it is code only.