【Question title】:Print all lines from a file containing strings from another file with sed [duplicate]
【Posted】:2015-12-06 05:24:58
【Question】:

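I have a file containing a bunch of strings, and another file containing a bunch of words. I want to print every line of the first file that contains one of the first twenty words of the second file. I have been trying to do this with sed, but would grep or awk be a better choice?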

【Comments】:

  • Search for questions about fgrep or grep -Ff. Good luck.

Tags: bash awk sed grep


【Solution 1】:

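The question is about "words", and I thought a lot about what that means while trying to assume as little as possible about the format of file2: it might be another book, a single phrase, or a comma- or tab-separated list.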

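  • We probably want to match whole words, so that "home" in file2 does not match "homely" in file1.
  • Strings containing digits, dashes, plus signs and the like are not English words and should not be considered.
  • Hyphenated words and possessives should be kept.
  • Case should be ignored when matching "words" (this behaviour is easy to reverse).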
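
As a minimal sketch of the first and last requirements above, the snippet below shows whole-word, case-insensitive matching with word-boundary anchors. It assumes a grep that supports the `\<` and `\>` anchors (GNU grep does); the sample lines and the temporary file are made up for illustration.

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'A homely cottage' 'Welcome HOME at last' 'homework is due' > "$tmpdir/file1"

# \< and \> anchor the pattern at word boundaries; -i ignores case.
# "homely" and "homework" are rejected because "home" is not a whole word there.
matches=$(grep -E -i '\<home\>' "$tmpdir/file1")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

Dropping `-i` makes the match case-sensitive again, which is the "easy to reverse" part.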

If we are allowed to place restrictions on the format of file2, see the simplified egrep/sed answers at the end.

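The answer below first processes file2 in a subshell (a command substitution): it handles punctuation and delimiters, identifies the first 20 valid words, and builds a regular expression from that list of valid words. The script then applies the regular expression (the result of the subshell) to filter file1.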

egrep -i $(tr -c "[:alnum:]-'" '\n' < file2 | awk "/^[[:alpha:]]+(-[[:alpha:]]+)?('s|s')?$/ { print; i++ } i==20 { exit 0 }" | sed '1h; 1!H; $!d; g; s/\n/ /g; s/^/\\</; s/ /\\>|\\</g; s/$/\\>/') file1

To explain further, suppose we have the following file2 as our example:

$ cat file2
1The quick brown fox
jumps over- Frank's (empty-headed) lazy dog.

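The tr command in the subshell pipeline replaces the unwanted delimiters with newlines, producing a newline-delimited list of candidate words: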

$ tr -c "[:alnum:]-'" '\n' < file2
1The
quick
brown
fox
jumps
over-
Frank's

empty-headed

lazy
dog

The awk command in the subshell pipeline keeps only the valid words and prints at most 20 of them:

$ tr -c "[:alnum:]-'" '\n' < file2 | awk "/^[[:alpha:]]+(-[[:alpha:]]+)?('s|s')?$/ { print; i++ } i==20 { exit 0 }"
quick
brown
fox
jumps
Frank's
empty-headed
lazy
dog

The last command in the subshell pipeline formats the word list as a regular expression:

$ tr -c "[:alnum:]-'" '\n' < file2 | awk "/^[[:alpha:]]+(-[[:alpha:]]+)?('s|s')?$/ { print; i++ } i==20 { exit 0 }" | sed '1h; 1!H; $!d; g; s/\n/ /g; s/^/\\</; s/ /\\>|\\</g; s/$/\\>/'
\<quick\>|\<brown\>|\<fox\>|\<jumps\>|\<Frank's\>|\<empty-headed\>|\<lazy\>|\<dog\>
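
In the sed script above, `1h` copies the first line into the hold space, `1!H` appends every later line to it, `$!d` discards the pattern space on every line except the last, and `g` on the last line copies the accumulated text back so the substitutions can run on it. The sketch below is an alternative, not the method used in this answer: it wraps each word with sed and joins the lines with `paste`. The three sample words are taken from the example output.

```shell
# Wrap each word in \< ... \> (one word per line), then join all lines with '|'.
regex=$(printf '%s\n' quick brown fox | sed 's/.*/\\<&\\>/' | paste -sd '|' -)
printf '%s\n' "$regex"
```

This builds the same kind of alternation without the hold space, at the cost of one more process in the pipeline.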

If we use egrep with this expression to filter a well-known text:

$ egrep -i "\<quick\>|\<brown\>|\<fox\>|\<jumps\>|\<Frank's\>|\<empty-headed\>|\<lazy\>|\<dog\>" kjv.txt | head -n 5
Ge30:32 I will pass through all thy flock to day, removing from thence all the speckled and spotted cattle, and all the brown cattle among the sheep, and the spotted and speckled among the goats: and of such shall be my hire.
Ge30:33 So shall my righteousness answer for me in time to come, when it shall come for my hire before thy face: every one that is not speckled and spotted among the goats, and brown among the sheep, that shall be counted stolen with me.
Ge30:35 And he removed that day the he goats that were ringstraked and spotted, and all the she goats that were speckled and spotted, and every one that had some white in it, and all the brown among the sheep, and gave them into the hand of his sons.
Ge30:40 And Jacob did separate the lambs, and set the faces of the flocks toward the ringstraked, and all the brown in the flock of Laban; and he put his own flocks by themselves, and put them not unto Laban's cattle.
Exo11:7 But against any of the children of Israel shall not a dog move his tongue, against man or beast: that ye may know how that the LORD doth put a difference between the Egyptians and Israel.

Putting it all together:

egrep -i $(tr -c "[:alnum:]-'" '\n' < file2 | awk "/^[[:alpha:]]+(-[[:alpha:]]+)?('s|s')?$/ { print; i++ } i==20 { exit 0 }" | sed '1h; 1!H; $!d; g; s/\n/ /g; s/^/\\</; s/ /\\>|\\</g; s/$/\\>/') kjv.txt | head -n 5
Ge30:32 I will pass through all thy flock to day, removing from thence all the speckled and spotted cattle, and all the brown cattle among the sheep, and the spotted and speckled among the goats: and of such shall be my hire.
Ge30:33 So shall my righteousness answer for me in time to come, when it shall come for my hire before thy face: every one that is not speckled and spotted among the goats, and brown among the sheep, that shall be counted stolen with me.
Ge30:35 And he removed that day the he goats that were ringstraked and spotted, and all the she goats that were speckled and spotted, and every one that had some white in it, and all the brown among the sheep, and gave them into the hand of his sons.
Ge30:40 And Jacob did separate the lambs, and set the faces of the flocks toward the ringstraked, and all the brown in the flock of Laban; and he put his own flocks by themselves, and put them not unto Laban's cattle.
Exo11:7 But against any of the children of Israel shall not a dog move his tongue, against man or beast: that ye may know how that the LORD doth put a difference between the Egyptians and Israel.

The solution runs quite fast on my year-old laptop:

$ wc -lw kjv.txt 
  31102  820736 kjv.txt
$ time egrep -i $(tr -c "[:alnum:]-'" '\n' < file2 | awk "/^[[:alpha:]]+(-[[:alpha:]]+)?('s|s')?$/ { print; i++ } i==20 { exit 0 }" | sed '1h; 1!H; $!d; g; s/\n/ /g; s/^/\\</; s/ /\\>|\\</g; s/$/\\>/') kjv.txt > /dev/null

real    0m0.021s
user    0m0.016s
sys     0m0.000s

Simplified answers

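The above handles the complicated case where file2 is "noisy". What is the answer if file2 is defined as a newline-delimited list of words, so that we do not have to check for valid words? Then we can replace the first two stages of the subshell pipeline with head: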

egrep -i $(head -n20 file2 | sed '1h; 1!H; $!d; g; s/\n/ /g; s/^/\\</; s/ /\\>|\\</g; s/$/\\>/') file1
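
When file2 is already one word per line, grep can also read the words directly as patterns, which avoids building a regular expression at all. The sketch below is an alternative under that assumption, not part of the original answer; it relies on grep accepting `-f -` to read patterns from standard input (GNU grep does), and the file contents are invented for illustration.

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' home dog > "$tmpdir/file2"
printf '%s\n' 'a homely face' 'The Dog barked' 'no match here' > "$tmpdir/file1"

# -F: treat patterns as fixed strings, -w: match whole words only,
# -i: ignore case, -f -: read the patterns from standard input.
matches=$(head -n 20 "$tmpdir/file2" | grep -i -w -F -f - "$tmpdir/file1")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

Because of `-F`, characters in file2 that are special in regular expressions are matched literally.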

Finally, what is the solution if the constraints are the same as before and the word list in file2 is delimited by single spaces?

egrep -i $(awk 'NF>20{NF=20}1' file2 | sed 's/^/\\</; s/ /\\>|\\</g; s/$/\\>/') file1

    【Solution 2】:

    Solution:

    sed 20q file2 > temp
    grep -f temp file1
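
The same idea can be sketched without the temporary file by piping the first 20 lines straight into grep. This assumes a grep that accepts `-f -` for reading patterns from standard input (GNU grep does); the generated word list and the two sample lines are hypothetical. Note that without `-w` or `-F`, each line of file2 is treated as a regular expression and may match inside longer words.

```shell
# Hypothetical sample data: 25 patterns (word1x .. word25x), two candidate lines.
tmpdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 25; i++) print "word" i "x" }' > "$tmpdir/file2"
printf '%s\n' 'has word20x' 'has word21x' > "$tmpdir/file1"

# sed 20q prints the first 20 lines and quits; grep reads them as patterns.
matches=$(sed 20q "$tmpdir/file2" | grep -f - "$tmpdir/file1")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

Only the line containing the 20th word is printed, since the 21st pattern is never read.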
