Delete lines between last matching patterns (answers)

【Question title】: Delete lines between last matching patterns
【Posted】: 2018-04-19 21:54:15
【Question description】:

First of all, I'm aware of these nice questions. My question is a bit different: given the text below, which comes from file1:

Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep
Pattern 1
REMOVE ME
AND ME
ME TOO PLEASE
Pattern 2

How can I delete only the text between the last Pattern 1 and Pattern 2 (including the pattern lines themselves), so that file1 then contains:

Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep

I'd prefer a solution using sed, but any other solution (perl, bash, awk) is fine.

【Question discussion】:

Tags: bash perl awk sed

【Solution 1】:

perl -ne 'if    (/Pattern 1/) { print splice @buff; push @buff, $_ }
          elsif (/Pattern 2/) { @buff = () }
          elsif (@buff)       { push @buff, $_ }
          else                { print }
' -- file

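When you see Pattern 1, print all the lines accumulated so far and start pushing lines onto @buff again. When you see Pattern 2, clear the buffer. For any other line, push it onto the buffer if the buffer has been started; otherwise print it (this covers text before the first Pattern 1 or after Pattern 2).

Note: the behavior when Pattern 2 appears without a preceding Pattern 1 is unspecified.

【Discussion】:

【Solution 2】: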

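I can't think of a simple and elegant way to do this in sed alone. It is possible to do it with sed using write-only code, but I'd need a really good reason to write something like that. :-)

You can still use sed in combination with other tools: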

$ tac test.txt | sed '/^Pattern 2$/,/^Pattern 1$/d' | tac
Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep

If you don't have tac on your system, you can create one:

$ alias tac="awk '{L[i++]=\$0} END {for(j=i-1;j>=0;)print L[j--]}'"

or, in keeping with the theme:

$ alias tac='sed '\''1!G;h;$!d'\'
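In a script, a shell function may be a more convenient fallback than an alias, since bash typically does not expand aliases in non-interactive shells. The following is a minimal sketch, not part of the original answer: the names `reverse_lines` and `delete_last_block_reversed` are hypothetical, and the awk fallback is the same one used in the alias above.

```shell
# Hypothetical helper: reverse the lines of stdin.
# Uses tac when it is installed, otherwise falls back to awk.
reverse_lines() {
  if command -v tac >/dev/null 2>&1; then
    tac
  else
    awk '{ L[i++] = $0 } END { for (j = i - 1; j >= 0; j--) print L[j] }'
  fi
}

# Hypothetical wrapper around the reverse | sed | reverse pipeline.
# Reads stdin, writes the filtered text to stdout.
delete_last_block_reversed() {
  reverse_lines | sed '/^Pattern 2$/,/^Pattern 1$/d' | reverse_lines
}
```

It could be called as `delete_last_block_reversed < test.txt`.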

That said, I would do this in awk, like so:

$ awk '/Pattern 1/{printf "%s",b;b=""} {b=b $0 ORS} /Pattern 2/{b=""} END{printf "%s",b}' test.txt
Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep

Or split out for easier reading/commenting:

awk '
  /Pattern 1/ {          # If we find the start pattern,
    printf "%s",b        # print the buffer (or nothing if it's empty)
    b=""                 # and empty the buffer.
  }
  {                      # Add the current line to a buffer, with the
    b=b $0 ORS           # correct output record separator.
  }
  /Pattern 2/ {          # If we find our close pattern,
    b=""                 # just empty the buffer.
  }
  END {                  # And at the end of the file,
    printf "%s",b        # print the buffer if we have one.
  }' test.txt

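This is roughly the same as hek2mgl's solution, but ordered more sensibly and using ORS. :-)

Note that both of these solutions behave correctly only if Pattern 2 occurs just once in the file. If you have multiple blocks, i.e. several spans that contain both the start and the end pattern, you'll need to work a bit harder. If that is the case, please provide more detail in your question.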
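For the multi-block case, one possible approach is to read the file twice: the first pass records where the last block starts and ends, and the second pass skips those lines. The following is a minimal sketch, not part of the original answer. The function name `delete_last_block` is hypothetical, the input is assumed to be a regular file (it is read twice, so a pipe will not work), and "last block" is assumed to mean the last Pattern 1 line through the first Pattern 2 line after it.

```shell
# Hypothetical helper: delete_last_block FILE
# Prints FILE to stdout without the last Pattern 1 .. Pattern 2 block.
# Earlier blocks are left untouched.
delete_last_block() {
  awk '
    # Pass 1 (NR == FNR): remember the last start line and the
    # first end line that follows it.
    NR == FNR {
      if (/Pattern 1/)                            { begin_ln = FNR; end_ln = 0 }
      else if (/Pattern 2/ && begin_ln && !end_ln) { end_ln = FNR }
      next
    }
    # Pass 2: skip begin_ln..end_ln, but only if a closing line was found.
    end_ln && FNR >= begin_ln && FNR <= end_ln { next }
    { print }
  ' "$1" "$1"
}
```

It could be called as `delete_last_block test.txt`. If the last Pattern 1 has no Pattern 2 after it, the sketch prints the file unchanged; whether that is the desired behavior depends on the data.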

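【Discussion】:

Won't the sed fail if there are multiple ranges?
@revo, I don't know. If it doesn't work for you, I'd be interested to hear about it, along with the version of sed (or the OS, etc.) you're using.
Copy the contents of the input file and try it yourself. There is an empty line at the end.
@ghoti any idea how to do this in place, in the file?
@revo, I'm not seeing what you're seeing. I'm using sed on FreeBSD, but GNU sed behaves the same way.

【Solution 3】:

This might work for you (GNU sed):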

sed '/Pattern 1/,${//{x;//p;x;h};//!H;$!d;x;s/.*Pattern 2[^\n]*\n\?//;/^$/d}' file

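The general idea is to collect lines starting from a line that contains Pattern 1, then either flush those lines when another line containing Pattern 1 is encountered or, at end of file, delete the lines between Pattern 1 and Pattern 2 and print what remains.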

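Focus on the lines between the first line containing Pattern 1 and the end of the file; all other lines are printed as normal. If a line contains Pattern 1, swap to the hold space and, if the collected lines there also contain the same regexp, print them; then swap back and replace the hold space with the current line. If the current line does not contain the regexp, append it to the hold space. Unless it is the last line of the file, delete it. At end of file, swap to the hold space, delete everything up to and including the line containing Pattern 2, and print what remains.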

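Note: a tricky case arises when the line containing Pattern 2 is the last line of the file, as in your example. Because sed uses newlines to delimit lines, it strips them before placing a line in the pattern space and appends them again before printing. If the pattern/hold space is empty, sed still appends a newline, which in this case would add a spurious one. The solution is to delete everything between Pattern 1 and Pattern 2, including any newline that follows the line containing Pattern 2. If further lines follow, they are printed as normal. If no lines follow, the hold space is now empty; since it must have contained something before, an empty result can safely be deleted.

【Discussion】:

【Solution 4】:

Using awk: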

awk '
# On pattern 1 and when the buffer is not empty, flush the buffer
/Pattern 1/ && b!="" { printf "%s", b; b="" }

# Append the current line and a newline to the buffer
{ b=b""$0"\n" }

# Clean the buffer on pattern 2
/Pattern 2/ { b="" }' file

【Discussion】: