【Question title】: Split a ~200 MB log4j log file by day
【Posted】: 2018-01-03 19:58:27
【Question】:

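I have a log file in the format below and I'd like to split it by day into multiple files (i.e. log-2017-10-2, log-2017-10-3, etc.). I've seen people do this with awk, but I'm not sure how to handle the stack traces, because the java.io.IOException line and the "at ..." lines after it start on new lines that carry no date. Is there a convenient way to do this?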

    2017-10-02 04:26:02,534 INFO XXXXXXXXXXXXXXXXX
    2017-10-03 04:26:02,543 INFO XXXXXXXXXXXX
    2017-10-04 04:26:02,544 INFO XXXXXXXXX
    2017-10-04 04:26:02,546 INFO XXXXXXXXXXXXX
    2017-10-04 04:26:02,549 INFO XXXXXXXXXXX
    2017-10-04 04:53:02,787 WARN class.class.class: [FetcherXXXXXX], Error in fetch XXXXXXXXXXXXXXXXXXXXXX
    java.io.IOException: Connection to X was disconnected before the response was read
            at XXXXXXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXX
    2017-10-05 04:26:02,549 INFO XXXXXXXXXXX

The resulting files should contain:

log-2017-10-2:

    2017-10-02 04:26:02,534 INFO XXXXXXXXXXXXXXXXX

log-2017-10-3:

    2017-10-03 04:26:02,543 INFO XXXXXXXXXXXX

log-2017-10-4:

    2017-10-04 04:26:02,544 INFO XXXXXXXXX
    2017-10-04 04:26:02,546 INFO XXXXXXXXXXXXX
    2017-10-04 04:26:02,549 INFO XXXXXXXXXXX
    2017-10-04 04:53:02,787 WARN class.class.class: [FetcherXXXXXX], Error in fetch XXXXXXXXXXXXXXXXXXXXXX
    java.io.IOException: Connection to X was disconnected before the response was read
            at XXXXXXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXX

log-2017-10-5:

    2017-10-05 04:26:02,549 INFO XXXXXXXXXXX

【Discussion】:

  • OK, please post the final file contents.
  • Have you tried logrotate?
  • @RomanPerekhrest Posted the final file contents for reference.

Tags: bash text awk logfile


【Solution 1】:

awk to the rescue!

    $ awk --posix 'BEGIN{f="log-header"}
         $1~/^[0-9]{4}-[0-9]{2}-[0-9]{2}$/{f="log-"$1} {print > f}' log

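If there are too many dates (and therefore too many open files), you may need to close each file once you are done writing to it. With a few hundred dates it should work as is.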
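
To see whether a given log is anywhere near that range, one option is to count its distinct dates (each one becomes a simultaneously open output file when nothing is closed) and compare the count with the shell's open-file limit from `ulimit -n`. This is a minimal sketch: `count_log_days` is a hypothetical helper name, and it assumes every dated line starts with `YYYY-MM-DD` followed by a space, as in the sample log. The exact point at which awk fails depends on the implementation, so treat the comparison as a rough guide only.

```shell
# Hypothetical helper: print the number of distinct dates in a log file.
# Only lines starting with "YYYY-MM-DD " are counted; stack-trace lines
# (which carry no date) are ignored.
count_log_days() {
    awk '/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { seen[$1] = 1 }
         END { n = 0; for (d in seen) n++; print n }' "$1"
}

# Example:
#   echo "dates: $(count_log_days log)  open-file limit: $(ulimit -n)"
```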

The initial output file (log-header) is set in case your log does not start with a line matching the date regex.
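
Closing files as you go can be sketched as a variant of this answer that closes the previous day's file as soon as a new date appears, so only one output file is open at a time. Assumptions: `split_log_by_day` is a hypothetical name; the log is in chronological order (if a date reappeared after its file had been closed, `>` would reopen and truncate that file); and the date is matched with repeated bracket expressions instead of `{4}`, so no `--posix` or `--re-interval` flag is needed even in awks without interval-expression support.

```shell
# Hypothetical helper: split a log4j-style log into one file per day,
# closing each day's file before the next one is opened.
# Assumes chronological order: a date that reappears after its file was
# closed would be reopened with ">" and truncated.
split_log_by_day() {
    awk 'BEGIN { f = "log-header" }          # for lines before the first dated line
         /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
             if ("log-" $1 != f) {           # the date changed
                 close(f)                    # release the previous file
                 f = "log-" $1
             }
         }
         { print > f }                       # stack-trace lines stay with their date
    ' "$1"
}
```

Run it as `split_log_by_day log` from the directory where the per-day files should be written.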

【Discussion】:

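  • --posix is gawk-only and is only useful in very old gawk versions. Using it disables all of gawk's other very useful extensions (e.g. gensub()), so if you just want to enable RE intervals in a very old gawk version, you should use --re-interval instead of --posix. Also, since that flag is gawk-specific, you won't get too-many-open-files errors when using it.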
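
Whether any such flag is needed depends on the awk in use; since gawk 4.0, interval expressions are enabled by default. A quick way to check a particular system is to feed awk a string that only an interval expression would match. `awk_supports_intervals` is a hypothetical helper name, and this is a sketch rather than a complete feature test.

```shell
# Hypothetical helper: succeed (exit status 0) if the awk on PATH treats
# {4} as an interval expression. An awk without interval support may read
# the braces differently (for example literally), so /^[0-9]{4}$/ would
# not match "2017" and a date regex written that way would match nothing.
awk_supports_intervals() {
    echo 2017 | awk '/^[0-9]{4}$/ { found = 1 } END { exit !found }'
}

if awk_supports_intervals; then
    echo "interval expressions work; no extra flag needed"
else
    echo "no interval support: use [0-9][0-9][0-9][0-9] or, in old gawk, --re-interval"
fi
```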

【Solution 2】:

An awk solution:

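    awk '/^[0-9]{4}-[0-9]{2}-[0-9]{2} /{
             if (fn && $1 != d) close(fn);
             d=$1; fn="log-"$1
         }{ print > fn }' logfile

  • /^[0-9]{4}-[0-9]{2}-[0-9]{2} / - matches a line that starts with a date string
  • if (fn && $1 != d) close(fn) - when the date changes, closes the file that was opened for the previous date
  • fn="log-"$1 - builds the output file name from the date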
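
Because the stack-trace lines are simply appended to whatever file is current, a cheap sanity check after the split is to concatenate the per-day files in name order and compare the result with the original log byte for byte. A sketch, assuming the split was run in the current directory, that the names are the zero-padded `log-YYYY-MM-DD` form produced here (which sort chronologically), and that no other files with that name pattern are present; `check_split` is a hypothetical name.

```shell
# Hypothetical helper: succeed if the per-day files, concatenated in
# (glob-sorted) name order, reproduce the original log exactly.
# $1 is the original log file.
check_split() {
    cat log-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] | cmp -s - "$1"
}
```

For example: `check_split logfile && echo "nothing lost"`.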

Checking the result:

    $ head log-*
    ==> log-2017-10-02 <==
    2017-10-02 04:26:02,534 INFO XXXXXXXXXXXXXXXXX

    ==> log-2017-10-03 <==
    2017-10-03 04:26:02,543 INFO XXXXXXXXXXXX

    ==> log-2017-10-04 <==
    2017-10-04 04:26:02,544 INFO XXXXXXXXX
    2017-10-04 04:26:02,546 INFO XXXXXXXXXXXXX
    2017-10-04 04:26:02,549 INFO XXXXXXXXXXX
    2017-10-04 04:53:02,787 WARN class.class.class: [FetcherXXXXXX], Error in fetch XXXXXXXXXXXXXXXXXXXXXX
    java.io.IOException: Connection to X was disconnected before the response was read
            at XXXXXXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXX

    ==> log-2017-10-05 <==
    2017-10-05 04:26:02,549 INFO XXXXXXXXXXX
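
Note that these file names are zero-padded (`log-2017-10-02`), while the question asked for `log-2017-10-2`. If the unpadded form is really wanted, the date can be split on `-` and the month and day converted to numbers, which drops a leading zero. A sketch under stated assumptions: `split_log_by_day_unpadded` is a hypothetical name; the month is unpadded too (the question's examples are all from October, so this is a guess); lines before the first date go to `log-header` as in Solution 1; and the log is in chronological order. Unpadded names no longer sort chronologically (`log-2017-10-10` lists before `log-2017-10-2`).

```shell
# Hypothetical helper: like the answers above, but names the output files
# log-YYYY-M-D (no leading zeros), e.g. log-2017-10-2.
# Assumes chronological order, since each file is closed when the date changes.
split_log_by_day_unpadded() {
    awk 'BEGIN { fn = "log-header" }
         /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
             split($1, p, "-")                  # p[1] year, p[2] month, p[3] day
             name = "log-" p[1] "-" (p[2] + 0) "-" (p[3] + 0)
             if (name != fn) {
                 close(fn)                      # finished with the previous day
                 fn = name
             }
         }
         { print > fn }
    ' "$1"
}
```

Usage: `split_log_by_day_unpadded logfile`.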

【Discussion】:

  • @EdMorton, thanks for the tip. That wasn't the whole logic; it's updated now.