Iterative Bash Script Bug: Answers

【Question Title】: Iterative Bash Script Bug
【Posted】: 2018-03-16 06:58:17
【Question】:

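Using a bash script, I'm trying to iterate line by line over a text file of only about 700 words and, for each word, run a case-insensitive grep search against specific files in the current directory. To break it down, I'm trying to write the following to a file: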

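1. Append a newline to the file, then the searched word, then another newline
2. Append the results of the grep command for that search
3. Repeat steps 1 and 2 until every word in the list has been used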

For example, if I have this list.txt:

search1
search2

I want results.txt to be:

search1:
grep result here

search2:
grep result here

I found several answers across Stack Exchange on how to do this and came up with the following implementation:

#!/usr/bin/bash

while IFS = read -r line;
do 
    "\n$line:\n" >> "results.txt";
    grep -i "$line" *.in >> "results.txt";
done < "list.txt"

However, for some reason this (and the many variations I've tried) doesn't work. It seems trivial, but I'm stumped. Any help is appreciated.

【Comments】:

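Is "grep result here" just a list of the file names that contain the pattern searchX? Do you also need the line numbers?
shellcheck.net is a good way to quickly find problems in shell scripts
echo -e interprets \n as a newline :)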
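A small bash sketch of the difference the last comment points at (the function names header_echo and header_printf are hypothetical). Both print a blank line followed by "word:", but echo -e also interprets any backslashes inside the word itself, while printf '%s' prints the word literally. This assumes bash, whose builtin echo supports -e; echo in other shells may behave differently.

```bash
#!/bin/bash
# Hypothetical helpers: each prints a blank line, then "<word>:", then a newline.

header_echo() {
    # echo -e interprets escapes in the WHOLE string, including any inside "$1"
    echo -e "\n$1:"
}

header_printf() {
    # printf interprets escapes only in the format string; "$1" is printed as-is
    printf '\n%s:\n' "$1"
}

header_echo 'search1'      # blank line, then "search1:"
header_printf 'search1'    # identical output
header_echo 'a\tb'         # the \t inside the word becomes a TAB character
header_printf 'a\tb'       # prints a\tb unchanged
```

This is why printf '\n%s:\n' "$line" is the safer choice when the word list might contain backslashes.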

Tags: linux bash shell awk grep

【Solution 1】:

It will work if you change your script to:

while IFS= read -r line; do
    printf '\n%s:\n' "$line"
    grep -i "$line" *.in
done < list.txt > results.txt

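But it will be very slow. See https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for why you should think carefully before writing a shell loop just to manipulate text. The standard UNIX tool for manipulating text is awk: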

awk '
NR==FNR { words2matches[$0]; next }
{
    for (word in words2matches) {
        if ( index(tolower($0),tolower(word)) ) {
            words2matches[word] = words2matches[word] $0 ORS
        }
    }
}
END {
    for (word in words2matches) {
        print word ":" ORS words2matches[word]
    }
}
' list.txt *.in > results.txt

The above is of course untested, since you didn't provide sample input/output we could test it against.
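Since the awk script is untested, here is a minimal bash harness for trying it. The sample files (list.txt, a.in, b.in) and their contents are invented for illustration, and the function name run_awk_demo is hypothetical; the awk program itself is copied unchanged from the answer.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical harness: builds invented sample files in a temp directory,
# runs the answer's awk script unchanged, and prints results.txt.
run_awk_demo() {
    local dir
    dir=$(mktemp -d)
    (
        cd "$dir"
        printf '%s\n' search1 search2 > list.txt
        printf '%s\n' 'abc' 'SEARCH1 here' 'def' > a.in
        printf '%s\n' 'search2 there' 'search1stuff' > b.in

        awk '
        NR==FNR { words2matches[$0]; next }
        {
            for (word in words2matches) {
                if ( index(tolower($0),tolower(word)) ) {
                    words2matches[word] = words2matches[word] $0 ORS
                }
            }
        }
        END {
            for (word in words2matches) {
                print word ":" ORS words2matches[word]
            }
        }
        ' list.txt *.in > results.txt

        cat results.txt
    )
    rm -rf "$dir"
}

run_awk_demo
```

With these inputs, results.txt holds a search1: block listing "SEARCH1 here" and "search1stuff", and a search2: block listing "search2 there". Note that index() matches substrings, so search1stuff counts as a match for search1 (as it would with grep -i), and for (word in words2matches) visits words in an unspecified order, so the blocks may not appear in list.txt order.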

【Comments】:

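One problem I ran into was that the $line variable contained newlines, but I found a way to format them out. Thanks for the detailed solution!
The $line variable can't contain a newline, because the shell loop reads one line at a time into that variable. It could contain a carriage return, e.g. if your input file was generated on a Windows machine; you can remove those by running dos2unix or something similar on the file. Glad it helped!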
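If the word list was saved with Windows line endings, each $line ends in a carriage return, so grep searches for the word followed by \r and usually finds nothing. Besides running dos2unix on the file, the \r can be stripped inside the read loop itself; here is a minimal bash sketch, where strip_cr is a hypothetical helper name:

```bash
#!/bin/bash
# Hypothetical helper: copy stdin to stdout, removing one trailing carriage
# return (\r) from each line, as found in files saved with Windows CRLF endings.
strip_cr() {
    local line
    while IFS= read -r line; do
        line=${line%$'\r'}      # drop a trailing \r if present; otherwise unchanged
        printf '%s\n' "$line"
    done
}

# A word list with CRLF line endings, as a Windows editor might save it.
# od -c makes the bytes visible: no \r remains after strip_cr.
printf 'search1\r\nsearch2\r\n' | strip_cr | od -c
```

In the question's loop, the same line=${line%$'\r'} can go right after the read. Alternatively, tr -d '\r' < list.txt > fixed.txt (fixed.txt being an arbitrary name) cleans the file once, though it removes every carriage return, not only trailing ones.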

【Solution 2】:

Possible problems:

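bash path - use the /bin/bash path instead of /usr/bin/bash
whitespace - remove the ' ' after IFS (write IFS= read, not IFS = read)
echo - use the -e option so escape characters (here: '\n') are processed
semicolons - not needed at the end of a line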
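The whitespace problem, and the missing command in front of "\n$line:\n" in the question, can be reproduced in isolation. A minimal bash sketch (the sample word search1 is arbitrary):

```bash
#!/bin/bash
# Reproduces, in isolation, why the question's loop fails.

line='search1'

# "IFS = read ..." (spaces around =) is not an assignment: bash tries to run a
# command named IFS, which fails with "command not found" (exit status 127).
IFS = read -r line <<< 'x' 2>/dev/null || echo "IFS = read failed with status $?"

# A quoted string on a line by itself is also treated as a command name,
# so "\n$line:\n" is executed instead of being printed.
"\n$line:\n" 2>/dev/null || echo "bare string failed with status $?"

# Working forms: no spaces around =, and an explicit command that prints the header.
IFS= read -r line <<< 'search1'
printf '\n%s:\n' "$line"
```

Because the while condition itself fails, the original loop body never runs at all; fixing only the IFS spacing would still leave the header line failing as a command.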

Try the following script:

#!/bin/bash

while IFS= read -r line; do
    echo -e "\n$line:" >> "results.txt"    # blank line, then "word:"; echo adds the final newline
    grep -i "$line" *.in >> "results.txt"
done < "list.txt"

【Comments】:

【Solution 3】:

You don't even need a bash script for this:

Input files:

$ more file?.in
::::::::::::::
file1.in
::::::::::::::
abc
search1
def
search3
::::::::::::::
file2.in
::::::::::::::
search2
search1
abc
def
::::::::::::::
file3.in
::::::::::::::
abc
search1
search2
def
search3

Pattern file:

$ more patterns 
search1
search2
search3

Command:

$ grep -inf patterns file*.in | sort -t':' -k3 | awk -F':' 'BEGIN{OFS=FS}{if($3==buffer){print $1,$2}else{print $3; print $1,$2}buffer=$3}'

Output:

search1
file1.in:2
file2.in:2
file3.in:2
search2
file2.in:1
file3.in:3
search3
file1.in:4
file3.in:5

Explanation:

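grep -inf patterns file*.in searches every file*.in file for all of the patterns listed in the patterns file (that is what the -f option does); -i makes the search case-insensitive and -n adds line numbers
sort -t':' -k3 sorts the output on the third column so that matches for the same pattern are grouped back together
awk -F':' 'BEGIN{OFS=FS}{if($3==buffer){print $1,$2}else{print $3; print $1,$2}buffer=$3}' then prints the layout you want, using : as both the input and the output field separator; the buffer variable holds the current pattern (the third field), and the pattern is printed whenever it changes, i.e. whenever $3 differs from buffer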

【Comments】:

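The problem is that grep doesn't output the string it's searching for; it outputs the line that matched that string. The two happen to be identical in your sample input, but try changing search1 to search1stuff in one of your input files to see what I mean.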
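The comment's point can be checked directly. A minimal bash sketch, assuming GNU grep: it rebuilds the answer's sample files with search1 changed to search1stuff in file1.in, then runs the answer's sort | awk stage twice, once on grep -in output and once on grep -ion output. The -o flag is not part of the answer; it is one possible tweak that makes grep print only the matched text, so field 3 holds the pattern again. The names group and run_demo are hypothetical.

```bash
#!/bin/bash
set -euo pipefail
export LC_ALL=C   # fixed byte-order collation so the sort output is deterministic

# The answer's sort | awk stage: print the pattern (field 3) whenever it
# changes, then "file:line" for each match in that group.
group() {
    sort -t':' -k3 | awk -F':' 'BEGIN{OFS=FS}{if($3==buffer){print $1,$2}else{print $3; print $1,$2}buffer=$3}'
}

# Hypothetical harness: recreate the answer's sample files, with search1 in
# file1.in changed to search1stuff, then run the pipeline with grep options $1.
run_demo() {
    local dir
    dir=$(mktemp -d)
    (
        cd "$dir"
        printf '%s\n' abc search1stuff def search3 > file1.in
        printf '%s\n' search2 search1 abc def > file2.in
        printf '%s\n' abc search1 search2 def search3 > file3.in
        printf '%s\n' search1 search2 search3 > patterns
        grep "$1" -f patterns file*.in | group
    )
    rm -rf "$dir"
}

echo '--- grep -in: field 3 is the whole matching line ---'
run_demo -in       # search1stuff shows up as a separate group

echo '--- grep -ion: field 3 is only the matched text ---'
run_demo -ion      # file1.in:2 is grouped under search1 again
```

Two caveats with -o: a line containing several patterns is printed once per match, and with -i the matched text keeps the case it has in the file, so a line containing SEARCH1 would still form its own SEARCH1 group.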