Remove \n newline if string contains keyword

【Question title】: Remove \n newline if string contains keyword
【Posted】: 2014-02-18 15:53:42
【Question description】:

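I was wondering whether I can remove a \n (newline) only if the current line contains one or more keywords from a list; for instance, I want to remove the \n if the line contains the word hello or world.

Example: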

this is an original
file with lines
containing words like hello
and world
this is the end of the file

The result would be:

this is an original
file with lines
containing words like hello and world this is the end of the file

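I would like to use sed or awk and, if needed, grep, wc, or whatever other command works for this purpose. I want to be able to do this on a lot of files.

【Comments】:

Do you want the desired output to replace the contents of the file?
Yes, I want all the changes made so that only the lines containing a keyword lose their \n, and the file is saved.
If the last line contains a keyword, should it keep its newline?
Should only whole words be matched? For example, should worlds count as a match for the keyword world?
@potong In my case it doesn't matter, whichever is easier; if adding the newline is too much trouble, leave it out.

Tags: regex linux sed awk newline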

【Solution 1】:

Using awk you can do:

awk '/hello|world/{printf "%s ", $0; next} 1' file
this is an original
file with lines
containing words like hello and world this is the end of the file

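【Comments】:

+1; it's not clear whether this is a requirement, but it may make sense to match only on word boundaries. Linux (GNU awk): /\<(hello|world)\>/; OSX (incredibly, awk there doesn't support word-boundary matching): /(^|[[:punct:][:space:]])(hello|world)($|[[:punct:][:space:]])/
Thanks, but since the OP is fine with both word and words matching word, I guess word boundaries aren't needed.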
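
The whole-word idea from the comment above can be sketched portably as follows. This is only a sketch, not the commenter's exact command: the function name `join_whole_word_lines` is hypothetical, and the explicit ASCII ranges `[^A-Za-z0-9_]` stand in for the POSIX character classes on the assumption that the keywords are plain ASCII words.

```shell
# Hypothetical helper: join a line to the next one only when hello or
# world appears as a whole word. Reads stdin, writes stdout.
join_whole_word_lines() {
    awk '
        # A keyword counts only if it is delimited by a non-word character
        # or by the start/end of the line.
        /(^|[^A-Za-z0-9_])(hello|world)([^A-Za-z0-9_]|$)/ {
            printf "%s ", $0   # keep the line, swap its newline for a space
            next
        }
        { print }              # every other line keeps its newline
    '
}

printf 'say hello\nmany worlds\nthe end\n' | join_whole_word_lines
```

Here the line ending in hello is joined to the following line, while the line containing worlds keeps its newline because world is followed by a letter.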

【Solution 2】:

Here is a simple way using sed:

sed -r ':a;$!{N;ba};s/((hello|world)[^\n]*)\n/\1 /g' file

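Explanation

:a;$!{N;ba} reads the whole file into the pattern space, like this: this is an original\nfile with lines\ncontaining words like hello\nand world\nthis is the end of the file$
s/((hello|world)[^\n]*)\n/\1 /g searches for the keyword hello or world and replaces the next \n with a space.
The g flag of the sed substitution applies the replacement to every match of the regex, not just the first.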
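
To see what the pattern space holds at each stage, it can be printed with sed's `l` command, which shows embedded newlines as `\n` and marks the end with `$`. The sketch below assumes GNU sed; the function names `show_pattern_space` and `show_after_join` are hypothetical. Note that `l` wraps long output lines with a trailing backslash, so short input is used here.

```shell
# Hypothetical helper: slurp stdin into the pattern space and print it
# unambiguously with the l command (GNU sed assumed).
show_pattern_space() {
    sed -n ':a;$!{N;ba};l'
}

# Same, but after the substitution from the answer has been applied.
show_after_join() {
    sed -rn ':a;$!{N;ba};s/((hello|world)[^\n]*)\n/\1 /g;l'
}

printf 'one\nsay hello\ntwo\n' | show_pattern_space
# prints: one\nsay hello\ntwo$
printf 'one\nsay hello\ntwo\n' | show_after_join
# prints: one\nsay hello two$
```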

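【Comments】:

+1 for the logic, although you are missing a space: ...\n/\1 /g'
+1, very clever (though reading the whole file at once may not always be an option).

【Solution 3】:

A non-regex approach: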

awk '
    BEGIN {
        # define the word list
        w["hello"]
        w["world"]
    }
    {
        printf "%s", $0
        for (i=1; i<=NF; i++) 
            if ($i in w) {
                printf " "
                next
            }
        print ""
    }
'

Or a perl one-liner:

perl -pe 'BEGIN {@w = qw(hello world)} s/\n/ / if grep {$_ ~~ @w} split'

To edit the file in place, do:

awk '...' filename > tmpfile && mv tmpfile filename
perl -i -pe '...' filename
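
Since the question asks about many files, the temporary-file approach can be wrapped in a loop. This is a minimal sketch under stated assumptions: the function name `join_keyword_lines_in_place` is hypothetical, the awk program is the regex one-liner from Solution 1 rather than the word-list version above, and `mktemp` is assumed to be available.

```shell
# Hypothetical helper: apply the keyword join to each file given as an
# argument, overwriting a file only if awk succeeded on it.
join_keyword_lines_in_place() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        if awk '/hello|world/ { printf "%s ", $0; next } 1' "$f" > "$tmp"; then
            # Write back with cat instead of mv so the original file
            # keeps its permissions and ownership.
            cat "$tmp" > "$f"
        fi
        rm -f "$tmp"
    done
}
```

It would be called as `join_keyword_lines_in_place *.txt`, where the `*.txt` glob is only an example. Writing back with `cat` is not atomic; `mv` as in the answer is the alternative if that matters more than preserving permissions.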

【Comments】:

How do I add the name of the file I want to do this on?
Put it at the end of the command: perl -pe '...' filename

【Solution 4】:

This might work for you (GNU sed):

sed -r ':a;/^.*(hello|world).*\'\''/M{$bb;N;ba};:b;s/\n/ /g' file

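This checks whether the last line of a possibly multi-line pattern space contains the required string(s); if so, it reads another line, until the end of the file is reached or the last line no longer contains the string(s). The newlines are then replaced with spaces and the result is printed.

【Comments】:

+1, though not easy to understand; for an explanation of the M (multi-line) modifier and the \' (end of buffer) assertion (inadvertently obscured by the shell's quoting requirements), see: gnu.org/software/sed/manual/sed.html#Addresses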
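
The quoting the comment refers to can be untangled by printing the script as sed receives it. In `\'\''` the first backslash is still inside the single-quoted string, the following `'` closes that string, `\'` adds a literal quote, and the last `'` reopens the string, so sed sees just `\'`. The function name `show_sed_script` below is hypothetical; it only echoes the script and does not run sed.

```shell
# Hypothetical helper: print the Solution 4 script exactly as the shell
# passes it to sed, after quote removal.
show_sed_script() {
    printf '%s\n' ':a;/^.*(hello|world).*\'\''/M{$bb;N;ba};:b;s/\n/ /g'
}

show_sed_script
# prints: :a;/^.*(hello|world).*\'/M{$bb;N;ba};:b;s/\n/ /g
```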

【Solution 5】:

$ awk '{ORS=(/hello|world/?FS:RS)}1' file
this is an original
file with lines
containing words like hello and world this is the end of the file

【Comments】:

【Solution 6】:

sed -n '
:beg
/hello/ b keep
/world/ b keep
H;s/.*//;x;s/\n/ /g;p;b
: keep
H;s/.*//
$ b beg
' YourFile

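This is a bit tricky, because the check on the current line could otherwise already include an earlier hello or world.

Principle:

On every pattern match, save the line in the hold buffer.
Otherwise, load the hold buffer, replace the \n characters (using a swap and emptying the current line, because few buffer operations are available), and print the content.
Add a special case for a keyword on the last line (it would normally be held, and so never printed).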
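
The same accumulate-then-flush principle can be sketched in awk, with an ordinary variable standing in for sed's hold buffer. This is a swapped-in technique, not the answerer's sed script; the function name `join_with_buffer` and the variable `buf` are hypothetical.

```shell
# Hypothetical helper: hold keyword lines in a variable and flush them
# together with the next non-keyword line. Reads stdin, writes stdout.
join_with_buffer() {
    awk '
        /hello|world/ { buf = buf $0 " "; next }   # keyword line: hold it
        { print buf $0; buf = "" }                 # other line: flush buffer + line
        END {
            # Keyword on the last line: flush what is still held,
            # without the trailing space.
            if (buf != "") { sub(/ $/, "", buf); print buf }
        }
    '
}
```

With this version the output always ends in a newline, even when the last input line contains a keyword.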

【Comments】: