【Question Title】: Python: Need to get unique errors from log file
【Posted】: 2013-07-16 23:18:15
【Question Description】:

What I currently have:

def unique_ips():
    f = open('logfile','r')
    ips = set()
    for line in f:
        ip = line.split()[0]
        print ip
        for date in ip:
            logdate = line.split()[3]
            print "\t", logdate
            for entry in logdate:
                info = line.split()[5:11]
                print "\t\t", info
        ips.add(ip)
unique_ips()

The part I'm having trouble with is:

for entry in logdate:
    info = line.split()[5:20]
    print "\t\t", info

I have a log file that I need to sort by IP first, then by time, then by error.

It should look like this:

199.21.99.83
        [30/Jun/2013:07:18:30
                ['"GET', '/searchme/index.php?f=man_soweth', 'HTTP/1.1"', '200', '8676', '"-"']

But what I get is:

199.21.99.83
        [30/Jun/2013:07:18:30
                ['"GET', '/searchme/index.php?f=man_soweth', 'HTTP/1.1"', '200', '8676', '"-"']
                ['"GET', '/searchme/index.php?f=man_soweth', 'HTTP/1.1"', '200', '8676', '"-"']
                ['"GET', '/searchme/index.php?f=man_soweth', 'HTTP/1.1"', '200', '8676', '"-"']
                ['"GET', '/searchme/index.php?f=man_soweth', 'HTTP/1.1"', '200', '8676', '"-"']
                 ...

I'm sure I have some kind of syntax problem, but I'd appreciate any help!

The log file looks like this:

99.21.99.83 - - [30/Jun/2013:07:15:50 -0500] "GET /lenny/index.php?f=13 HTTP/1.1" 200 11244 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.83 - - [30/Jun/2013:07:16:13 -0500] "GET /searchme/index.php?f=being_fruitful HTTP/1.1" 200 7526 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.83 - - [30/Jun/2013:07:16:45 -0500] "GET /searchme/index.php?f=comparing_themselves HTTP/1.1" 200 7369 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
66.249.73.40 - - [30/Jun/2013:07:16:56 -0500] "GET /espanol/displayAncient.cgi?ref=isa%2054:3 HTTP/1.1" 500 167 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
199.21.99.83 - - [30/Jun/2013:07:17:00 -0500] "GET /searchme/index.php?f=tribulation HTTP/1.1" 200 7060 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.83 - - [30/Jun/2013:07:17:15 -0500] "GET /searchme/index.php?f=proud HTTP/1.1" 200 7080 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.83 - - [30/Jun/2013:07:17:34 -0500] "GET /searchme/index.php?f=soul HTTP/1.1" 200 7063 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
199.21.99.83 - - [30/Jun/2013:07:17:38 -0500] "GET /searchme/index.php?f=the_flesh_lusteth HTTP/1.1" 200 6951 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.c
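These lines appear to follow Apache's combined log format: IP, identity, user, a bracketed timestamp, a quoted request, status code, response size, then a quoted referrer and user agent. Splitting on whitespace breaks the quoted request into several pieces, which is why the code above has to slice [5:11]. The following is a minimal sketch of pulling the fields out by name with a regular expression instead; it assumes every line follows that layout, and the names LOG_PATTERN and parse_line are hypothetical, not part of the question.

```python
import re

# Assumed layout (from the sample lines):
# IP ident user [timestamp] "request" status size "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = ('66.249.73.40 - - [30/Jun/2013:07:16:56 -0500] '
          '"GET /espanol/displayAncient.cgi?ref=isa%2054:3 HTTP/1.1" 500 167 "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
entry = parse_line(sample)
print(entry['ip'], entry['status'], entry['request'])
```

With named fields, picking out only real errors becomes a simple check on entry['status'] (in the sample, this 500 response is the only one that is not a 200).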

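【Question Discussion】:

  • What does the input file look like? Also, don't forget to close f!
  • logdate appears to be a string, so iterating over it iterates over each individual character. Your loop prints "\t\t", info once for every character in logdate.
  • Edited the question with a log file snippet.
  • Your loop does the same thing and prints the same value every time. If you only want it to print once, why is it in a loop at all? If you do want a loop, what is supposed to be different on each pass?

Tags: python file split log4j unique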
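To make the second comment concrete, here is a small sketch that mirrors the nesting in the question but counts loop passes instead of printing. It uses the first sample entry with the user-agent field dropped for brevity. Note that for date in ip has the same problem as for entry in logdate, because ip is also a string.

```python
# First entry from the sample log, user-agent field dropped for brevity
line = ('199.21.99.83 - - [30/Jun/2013:07:16:13 -0500] '
        '"GET /searchme/index.php?f=being_fruitful HTTP/1.1" 200 7526 "-"')

ip = line.split()[0]       # '199.21.99.83' is a str of 12 characters
logdate = line.split()[3]  # '[30/Jun/2013:07:16:13' is a str of 21 characters

# Same nesting as the question: iterating over a str yields one
# character per pass, so each loop runs once per character.
date_passes = 0
info_prints = 0
for date in ip:
    date_passes += 1
    for entry in logdate:
        info_prints += 1

print(date_passes, info_prints)  # 12 252
```

So for a single log line, the question's code prints the timestamp 12 times and the request info 252 times.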


【Solution 1】:

The sample output makes this question a little confusing, but I'm fairly sure you want something like this:

def unique_ips():
    ips = {}
    # This loop collects every line, grouped by the IP in its first field.
    # "with" closes the file automatically when the block ends.
    with open('logfile', 'r') as f:
        for line in f:
            if not line.strip():  # skip blank lines (e.g. a trailing newline)
                continue
            ip = line.split()[0]
            ips.setdefault(ip, []).append(line)

    # This loop goes through the collected IPs in sorted order and prints
    # every entry for each IP. The lines are sorted as plain text; since they
    # all start with the same IP, this effectively sorts by timestamp string.
    for ip in sorted(ips):
        print(ip)
        errors = sorted(ips[ip])
        for e in errors:
            fields = e.split()
            logdate = fields[3]
            print("\t", logdate)

            info = fields[5:11]
            print("\t\t", info)

This produces the following output from your sample file:

199.21.99.83
    [30/Jun/2013:07:16:13
        ['"GET', '/searchme/index.php?f=being_fruitful', 'HTTP/1.1"', '200', '7526', '"-"']
    [30/Jun/2013:07:16:45
        ['"GET', '/searchme/index.php?f=comparing_themselves', 'HTTP/1.1"', '200', '7369', '"-"']
    [30/Jun/2013:07:17:00
        ['"GET', '/searchme/index.php?f=tribulation', 'HTTP/1.1"', '200', '7060', '"-"']
    [30/Jun/2013:07:17:15
        ['"GET', '/searchme/index.php?f=proud', 'HTTP/1.1"', '200', '7080', '"-"']
    [30/Jun/2013:07:17:34
        ['"GET', '/searchme/index.php?f=soul', 'HTTP/1.1"', '200', '7063', '"-"']
    [30/Jun/2013:07:17:38
        ['"GET', '/searchme/index.php?f=the_flesh_lusteth', 'HTTP/1.1"', '200', '6951', '"-"']
66.249.73.40
    [30/Jun/2013:07:16:56
        ['"GET', '/espanol/displayAncient.cgi?ref=isa%2054:3', 'HTTP/1.1"', '500', '167', '"-"']
99.21.99.83
    [30/Jun/2013:07:15:50
        ['"GET', '/lenny/index.php?f=13', 'HTTP/1.1"', '200', '11244', '"-"']
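The answer above sorts each IP's lines as plain strings. That works for the sample because every entry falls on 30/Jun/2013, but string order breaks across day and month boundaries: '[01/Jul/2013' sorts before '[30/Jun/2013'. The following is a minimal sketch that parses the timestamp with datetime.strptime before sorting. It assumes the "[day/Mon/year:H:M:S -offset]" layout from the sample; the helpers entry_time and group_by_ip are hypothetical, and the 01/Jul line in the demo is a made-up entry added only to show the month boundary.

```python
from datetime import datetime


def entry_time(line):
    """Parse the '[30/Jun/2013:07:16:13 -0500]' fields of a log line."""
    fields = line.split()
    stamp = fields[3].lstrip('[') + ' ' + fields[4].rstrip(']')
    return datetime.strptime(stamp, '%d/%b/%Y:%H:%M:%S %z')


def group_by_ip(lines):
    """Map each IP (in sorted order) to its lines, ordered by real time."""
    groups = {}
    for line in lines:
        if line.strip():  # skip blank lines
            groups.setdefault(line.split()[0], []).append(line)
    return {ip: sorted(entries, key=entry_time)
            for ip, entries in sorted(groups.items())}


lines = [
    # Made-up entry on the next day, to show the month boundary
    '199.21.99.83 - - [01/Jul/2013:00:00:05 -0500] "GET /b HTTP/1.1" 200 10 "-"',
    '199.21.99.83 - - [30/Jun/2013:07:16:13 -0500] "GET /a HTTP/1.1" 200 10 "-"',
    '66.249.73.40 - - [30/Jun/2013:07:16:56 -0500] "GET /c HTTP/1.1" 500 167 "-"',
]

grouped = group_by_ip(lines)
for ip, entries in grouped.items():
    print(ip)
    for line in entries:
        fields = line.split()
        print('\t', fields[3])
        print('\t\t', fields[5:11])
```

Because the timestamps carry a UTC offset, %z makes them timezone-aware datetimes, so entries logged with different offsets still compare correctly.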


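【Solution 2】:

You have too many loops. You don't need the for entry in logdate loop; you are already iterating over each line.

Remove for entry in logdate and dedent the info assignment and its print statement.

(This was already pointed out in the comments.)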
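Here is a sketch of the question's loop with that change applied, written for Python 3. Note that for date in ip has the same problem: ip is also a string, so that loop repeats the timestamp once per character of the address. This sketch therefore removes it too. The function name print_entries and the in-memory sample are hypothetical stand-ins for the question's unique_ips() and 'logfile'.

```python
import io


def print_entries(lines):
    """Print IP, timestamp and request fields once per log line."""
    ips = set()
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        ip = fields[0]
        print(ip)
        logdate = fields[3]  # no 'for date in ip' loop around this
        print("\t", logdate)
        info = fields[5:11]  # no 'for entry in logdate' loop around this
        print("\t\t", info)
        ips.add(ip)
    return ips  # the set of unique IPs the question was collecting


# In-memory stand-in for the log file (user-agent fields dropped)
sample = io.StringIO(
    '199.21.99.83 - - [30/Jun/2013:07:16:13 -0500] '
    '"GET /searchme/index.php?f=being_fruitful HTTP/1.1" 200 7526 "-"\n'
    '66.249.73.40 - - [30/Jun/2013:07:16:56 -0500] '
    '"GET /espanol/displayAncient.cgi?ref=isa%2054:3 HTTP/1.1" 500 167 "-"\n'
)
unique = print_entries(sample)
```

With the real file, pass the open file object instead, for example with open('logfile') as f: print_entries(f). Each line now produces exactly one IP, one timestamp and one info line.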
