awk: remove \n if the next line doesn't match

【Question title】: awk remove \n if next line doesn't match
【Posted】: 2022-01-14 13:40:28
【Question】:

awk 'tolower($0) ~ /\.[log(message|event)|trace(error)?c?|infoc?|warnc?|debugc?|errorc?]/,/)/{gsub(/^\t+/, "", $0);print NR","$0}' example_file

I wrote this script, which looks for the following patterns in a file:

log.Info("hello world")
log.Error()

and outputs something like this:

4,log.Info("hello world")
7,log.Error()

That is, the line number and the text itself.

The problem is, if my file has something like this:

log.Info("hello world")
log.Warn(
    "hello world")
log.Error()

it outputs something like this:

4,log.Info("hello world")
5,log.Warn(
6,"hello world")
7,log.Error()

I want "hello world") to be on the same line as log.Warn(.

The desired output is something like:

4,log.Info("hello world")
5,log.Warn("hello world")
7,log.Error()

Thanks a lot.

【Question comments】:

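Please add sample input (no descriptions, no images, no links) and your desired output for that sample input (no comments) to your question.
@Cyrus Hello. I actually did that. The sample follows "The problem is, if my file has something like this", and the desired output is the last part of the question. I didn't get what you mean, sorry.
/\.[log(me This is a very odd regex, and it doesn't work like that. gsub(/^\t+/, "", $0) No, you can't simply do it. The problem is how you phrase the condition: if the **next line** found. If you decide to act on the next line, that means you have to buffer data. Write something that depends on the current line and not on the next one, such as "if the current line doesn't end with )".
@KamilCuk I made it using regex101, like this: regex101.com/r/YMvp01/1
Well, yes, it is a valid regex. No, a bracket expression does not pick one of the expressions inside it; | is an ordinary character there. Your regex is the same as [()?abcdefgilmnorstuvw|]. [...] is not (..)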
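The point made in the last comment can be checked directly. The sketch below runs the question's bracket expression next to a grouped alternation on three made-up input lines; the group `(info|warn|error)` and the sample strings are assumptions chosen for illustration, not the asker's final pattern.

```shell
# Compare a bracket expression with a group on three hypothetical lines.
out=$(printf '%s\n' 'file.txt' 'log.Info("hi")' 'x.y' |
awk '
    # Bracket expression: a dot followed by ANY ONE of the listed characters.
    tolower($0) ~ /\.[log(message|event)|trace(error)?c?|infoc?|warnc?|debugc?|errorc?]/ {
        print "class:", $0
    }
    # Group with alternation: a dot followed by one of the whole words.
    tolower($0) ~ /\.(info|warn|error)/ {
        print "group:", $0
    }
')
printf '%s\n' "$out"
# prints:
# class: file.txt
# class: log.Info("hi")
# group: log.Info("hi")
```

The bracket expression also accepts file.txt, because t is one of the characters inside the brackets, while the grouped version only accepts the logging call.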

Tags: bash shell awk

【Solution 1】:

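If the next line found does not start with the pattern /.[log(message|event)|trace(error)?c?|infoc?|warnc?|debugc?|errorc?]/, it will put this line onto the previous one.

You can't act based on the next line; you can only act based on the current line. That basically means you have to:

Buffer one line (the previous line).
If the current line does start with the pattern /.[log(message|event)|trace(error)?c?|infoc?|warnc?|debugc?|errorc?]/, output the previous line. The previous line becomes the current line.
Otherwise, output the previous line and the current line. The previous line becomes empty.
END { output the previous line }

Something along the lines of: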

awk '
    /^log\./ {  # the pattern here
        if (last != "") {
            print NR - 1, last  # output previous line
        }
        last = $0  # previous line becomes current line
        next
    }
    last != "" {  # otherwise (because of next above), and only if a line is buffered
        print NR - 1, last $0  # output previous line and current line
        last = ""  # previous line becomes empty
    }
    END {
        if (last != "") {
            print NR, last  # handle the previous line at the end
        }
    }
' example_file

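Change your condition so that it depends only on the "current line". For example, if the current line does not end with ), eat the next line.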

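awk '/[^)]$/ {            # current line does not end with ")"
   n = NR                 # remember where the statement starts
   a = $0
   if ((getline) > 0)     # eat the next line, if there is one
      print n " " a $0
}' example_file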
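The getline idea above can be extended into a loop that keeps eating lines until the call is closed. This is only a sketch under assumptions: the start pattern `^[ \t]*log\.`, the surrounding non-log lines in the sample input, and the variable names are made up for illustration, and a ")" at the end of a line is taken to mean the call is complete.

```shell
# Hypothetical input: the question sample, preceded by three unrelated lines.
out=$(printf '%s\n' \
    'package main' \
    '' \
    'func main() {' \
    '    log.Info("hello world")' \
    '    log.Warn(' \
    '        "hello world")' \
    '    log.Error()' \
    '}' |
awk '
    /^[ \t]*log\./ {                    # assumed "start of call" pattern
        n = NR                          # line number where the call starts
        buf = $0
        # Eat following lines until the buffer ends with ")" or input runs out.
        while (buf !~ /\)[ \t]*$/ && (getline line) > 0) {
            sub(/^[ \t]+/, "", line)    # drop the indentation of the continuation
            buf = buf line
        }
        sub(/^[ \t]+/, "", buf)         # drop the indentation of the first line
        print n "," buf
    }
')
printf '%s\n' "$out"
# prints:
# 4,log.Info("hello world")
# 5,log.Warn("hello world")
# 7,log.Error()
```

Reading into a variable with getline line leaves $0 untouched and still advances NR, which is why the start line number is saved in n first.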

【Comments】:

【Solution 2】:

Here is a best-effort script (i.e. it will fail in various rainy-day cases), using this input file:

$ cat file
foo
log.Info("hello
        world")
log.Warn(
    "hello
                some other
        world")
log.Error()
bar

and any POSIX awk:

$ cat tst.awk
BEGIN {
    begRe = "log[.](Info|Warn|Error)[(]"
    regexp = begRe "[^)]*[)]"
    OFS = ","
}
$0 ~ begRe {
    begNr = NR
    buf = ""
}
begNr {
    buf = buf $0
    if ( match(buf,regexp) ) {
        buf = substr(buf,RSTART,RLENGTH)
        gsub(/[[:space:]]*"[[:space:]]*/,"\"",buf)
        print begNr, buf
        begNr = 0
    }
}

$ awk -f tst.awk file
2,log.Info("hello       world")
4,log.Warn("hello               some other      world")
8,log.Error()

If you want to collapse all whitespace inside the quotes and remove any leading whitespace, just add gsub(/[[:space:]]+/," ",buf); gsub(/^ | $/,"",buf) before the print statement:

$ cat tst.awk
BEGIN {
    begRe = "log[.](Info|Warn|Error)[(]"
    regexp = begRe "[^)]*[)]"
    OFS = ","
}
$0 ~ begRe {
    begNr = NR
    buf = ""
}
begNr {
    buf = buf $0
    if ( match(buf,regexp) ) {
        buf = substr(buf,RSTART,RLENGTH)
        gsub(/[[:space:]]*"[[:space:]]*/,"\"",buf)
        gsub(/[[:space:]]+/," ",buf); gsub(/^ | $/,"",buf)
        print begNr, buf
        begNr = 0
    }
}

$ awk -f tst.awk file
2,log.Info("hello world")
4,log.Warn("hello some other world")
8,log.Error()
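One of the rainy-day cases is a closing parenthesis inside the quoted message. The sketch below is a trimmed copy of the script above (the whitespace cleanup is left out) run on a made-up input line; it shows that [^)]*[)] stops at the first ")" it sees.

```shell
# Hypothetical input line with a ")" inside the string literal.
out=$(printf '%s\n' 'log.Info("done :) ok")' |
awk '
BEGIN {
    begRe = "log[.](Info|Warn|Error)[(]"
    regexp = begRe "[^)]*[)]"    # stops at the FIRST ")", even inside quotes
    OFS = ","
}
$0 ~ begRe { begNr = NR; buf = "" }
begNr {
    buf = buf $0
    if ( match(buf, regexp) ) {
        print begNr, substr(buf, RSTART, RLENGTH)
        begNr = 0
    }
}')
printf '%s\n' "$out"
# prints: 1,log.Info("done :)
```

Handling this correctly would need a real parser for the string literals, which is beyond what a single regexp can do here.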

【Comments】: