【Question title】: bash find string in huge log file
【Posted】: 2014-11-20 11:11:35
【Question】:

I have a huge log file with more than 100M strings (lines). It has 19 columns:

time | date | host | user | domain | category   | source | port | URL | etc

Example:

time    date    host    user    domain  category    source  port    URL etc
2:10:21 18.11.2014  192.168.56.101  %username1% %domainname%    "many words"    stackoverflow.com   "80"    http://stackoverflow.com/   
2:10:22 18.11.2014  192.168.56.101  %username2% %domainname%    "done"  stackoverflow.com   "80"    http://stackoverflow.com/   
2:10:23 18.11.2014  192.168.56.101  %username3% %domainname%    "denied site"   stackoverflow.com   "80"    http://stackoverflow.com/   
2:10:24 18.11.2014  192.168.56.101  %username4% %domainname%    "suspicious"    stackoverflow.com   "80"    http://stackoverflow.com/   
2:10:25 18.11.2014  192.168.56.101  %username5% %domainname%    "uncategorized" stackoverflow.com   "80"    http://stackoverflow.com/   
2:10:26 18.11.2014  192.168.56.101  %username6% %domainname%    "denied site"   stackoverflow.com   "80"    http://stackoverflow.com/   
2:10:27 18.11.2014  192.168.56.101  %username7% %domainname%    "many words"    stackoverflow.com   "80"    http://stackoverflow.com/

When I try to pull a string out of a column, the result is sometimes broken:

user@stand-01:~/folder$cat file |awk '{FS=" ";print$6}'
category
"many
"done"
"denied
"suspicious"
"uncategorized"
"denied
"many

So when I ask for column 7, I get data that belongs to another column:

user@stand-01:~/folder$cat file |awk '{FS=" ";print$7}'
source
words"
stackoverflow.com
site"
stackoverflow.com
stackoverflow.com
site"
words"

How can I use a space as the delimiter without splitting the text inside quotes?

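【Comments】:

  • Instead of hunting for a complicated regex, change how this file is written so that it is comma-separated (CSV), tab-separated, etc. That is, use a delimiter that never occurs inside a field. Otherwise this will very likely cause you more problems in the future.
  • Do you mean this: awk -v FS="\"" '{print $2}' file ?
  • Your file is tab-delimited, not space-delimited. Check with the command head -1 logFile | cat -vte.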
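
If the log really is tab-delimited, as the last comment suggests, splitting on tabs alone may be enough. The sketch below builds a one-row sample under that assumption; the file name sample.tsv is made up for illustration, and the real log may be laid out differently.

```shell
# Build a tiny sample with literal tab separators (sample.tsv is a hypothetical name).
printf 'time\tdate\thost\tuser\tdomain\tcategory\tsource\tport\tURL\n' > sample.tsv
printf '2:10:21\t18.11.2014\t192.168.56.101\t%%username1%%\t%%domainname%%\t"many words"\tstackoverflow.com\t"80"\thttp://stackoverflow.com/\n' >> sample.tsv

# Make the separators visible: tabs show up as ^I, line ends as $.
head -1 sample.tsv | cat -vte

# Split on tabs only, so spaces inside a field are left alone.
awk -F'\t' 'NR>1 {print $6}' sample.tsv
awk -F'\t' 'NR>1 {print $7}' sample.tsv
```

With tab as the only separator, column 6 keeps its quoted text in one piece and column 7 is the source host again.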

Tags: bash logging awk


【Solution 1】:

Here is an awk approach that splits on the double quote instead of on whitespace:

awk -F\" 'NR>1{print $2}' file
many words
done
denied site
suspicious
uncategorized
denied site
many words

Or, to keep the surrounding quotes:

awk -F\" 'NR>1{print FS$2FS}' file
"many words"
"done"
"denied site"
"suspicious"
"uncategorized"
"denied site"
"many words"

【Comments】:

  • While this does the job, note that the OP asked "How can I use a space delimiter without splitting quoted text?"
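
To illustrate why splitting on the quote character works, and how the other columns could still be reached that way, here is a small sketch. It assumes the row layout from the question's sample, where category and port are the only quoted fields; the array name rest is made up for illustration.

```shell
# One data row from the question, with single spaces as separators.
line='2:10:21 18.11.2014 192.168.56.101 %username1% %domainname% "many words" stackoverflow.com "80" http://stackoverflow.com/'

# With FS set to a double quote:
#   $1 = everything before the category, $2 = category text,
#   $3 = the unquoted text between category and port, $4 = port.
printf '%s\n' "$line" | awk -F'"' '{
    split($3, rest, " ")        # default whitespace split trims the padding
    print "category=" $2
    print "source=" rest[1]
    print "port=" $4
}'
```

This only holds while the number and position of quoted fields stay fixed, so it does not answer the request for a true space delimiter.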

【Solution 2】:

Something like this might work:

$ awk '$6 ~ /^"[^"]+"$/{print $6;next} $6 ~ /^"/{print $6, $7}' input
"many words"
"done"
"denied site"
"suspicious"
"uncategorized"
"denied site"
"many words"

【Comments】:

  • @fedorqui Uff, I did that. Need a coffee, thanks.
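
Solution 2 joins $6 and $7, so it covers quoted values of one or two words. A more general variant, sketched below, re-splits each line so that any double-quoted run counts as a single field. The names qsplit, category_and_source and sample.log are hypothetical, and the sample includes one invented three-word category to exercise that case. It uses only POSIX awk features, so it should not depend on a particular awk implementation.

```shell
# Sample rows separated by spaces; the last row has a three-word quoted field.
cat > sample.log <<'EOF'
time date host user domain category source port URL
2:10:21 18.11.2014 192.168.56.101 %username1% %domainname% "many words" stackoverflow.com "80" http://stackoverflow.com/
2:10:22 18.11.2014 192.168.56.101 %username2% %domainname% "done" stackoverflow.com "80" http://stackoverflow.com/
2:10:23 18.11.2014 192.168.56.101 %username3% %domainname% "three word category" stackoverflow.com "80" http://stackoverflow.com/
EOF

# category_and_source (hypothetical name): print columns 6 and 7 of each data row.
category_and_source() {
    awk '
    # qsplit (hypothetical helper): fill out[] with the fields of line,
    # treating a double-quoted run as one field; returns the field count.
    function qsplit(line, out,    n) {
        split("", out)          # clear leftovers from the previous line
        n = 0
        # Each pass takes the next token: a quoted string or a run of non-blanks.
        while (match(line, /"[^"]*"|[^ \t]+/)) {
            out[++n] = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
        return n
    }
    NR > 1 {
        qsplit($0, f)
        print f[6] "|" f[7]
    }' "$1"
}

category_and_source sample.log
```

On a file with 100M+ lines, the per-line loop will be slower than plain field splitting. GNU awk users may want to look at its FPAT variable, which defines fields by content rather than by separator.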