Bash: How to count the number of occurrences of a string within a file? (Answers)

【Question Title】: Bash: How to count the number of occurrences of a string within a file?
【Posted】: 2020-05-03 16:27:12
【Question】:

I have a file that looks like this:

dog
cat
dog
dog
fish
cat

I want to write some Bash code that turns the file into the following format:

dog:1
cat:1
dog:2
dog:3
fish:1
cat:2

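Any idea how to do this? The file is quite large (> 30K lines), so the code should be reasonably fast.

I was thinking of some kind of loop...

Something like this: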

while read line; do
     echo "$line" >> temp.txt
     val=$(grep -c $line temp.txt)   # -c prints the number of matching lines
     echo "$val" >> temp2.txt
done < file.txt

Then paste -d ':' file.txt temp2.txt

However, I'm worried this will be slow, since it processes the file line by line. What do others think?

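【Question Comments】:

Please show what you have tried.
Just updated the original question!
Yes, it would be very slow, it would produce incorrect values because of partial matches, and it has other problems: for example, it would corrupt some inputs and would behave differently depending on the input values and the directory it is run from. See why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
Does this answer your question? use bash count every word's occurrence in a file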
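
To make the partial-match point from the comments concrete, here is a minimal sketch. The sample words (dog, hotdog, dogs) and the function names count_substring and count_exact are made up for illustration; -c, -x and -F are standard grep options.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: three lines that all contain the substring "dog".
sample=$(printf '%s\n' dog hotdog dogs)

# Substring match: counts every line that contains the pattern anywhere.
count_substring() { printf '%s\n' "$sample" | grep -c -- "$1"; }

# Whole-line, fixed-string match: -x requires the full line to match,
# -F treats the pattern as a literal string rather than a regex.
count_exact() { printf '%s\n' "$sample" | grep -cxF -- "$1"; }

echo "substring: $(count_substring dog)"   # prints "substring: 3"
echo "exact: $(count_exact dog)"           # prints "exact: 1"
```

Only the second form counts lines that are exactly "dog", which is what a per-line occurrence count needs.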

Tags: string bash awk grep

【Solution 1】:

You can use this simple awk command to do the job:

awk '{print $0 ":" ++freq[$0]}' file

dog:1
cat:1
dog:2
dog:3
fish:1
cat:2
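
The one-liner is compact, so here is a spelled-out sketch of the same idea. freq is an awk associative array keyed by the whole line ($0), and the counter is bumped before it is printed. The function name running_count and the inline sample are assumptions for illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
# running_count (hypothetical helper name): append ":N" to each input line,
# where N is the number of times that exact line has been seen so far.
# Reads the files given as arguments, or stdin when there are none.
running_count() {
    awk '{
        freq[$0]++              # bump the counter for this exact line
        print $0 ":" freq[$0]   # emit the line and its running count
    }' "$@"
}

# Usage with the sample data from the question:
printf '%s\n' dog cat dog dog fish cat | running_count
```

Because the whole line is used as the array key, "dog" and "hotdog" are counted separately, and the file is read only once.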

【Comments】:

【Solution 2】:

Here is what I came up with:

declare -A arr; while read -r line; do ((arr[$line]++)); echo "$line:${arr[$line]}" >> output_file; done < input_file

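First, declare the associative array (hash table) arr. Then read each line in the while loop and increment the value stored in the array under the key of the line just read. Then echo the line followed by its value from the hash table, appending the result to output_file.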
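
For readability, the same steps can be laid out as a function. This is a sketch under a few assumptions: it needs bash 4 or later for associative arrays, the names running_count_bash and seen are made up here, and the key is prefixed with an underscore so that an empty input line is still a valid array subscript.

```shell
#!/usr/bin/env bash
# running_count_bash (hypothetical name): pure-bash running count per line.
# Requires bash 4+ for associative arrays.
running_count_bash() {
    local -A seen=()   # key: "_" + line text, value: times seen so far
    local line key
    # IFS= and -r keep surrounding whitespace and backslashes intact.
    while IFS= read -r line; do
        key="_$line"   # prefix so an empty line still yields a valid key
        seen[$key]=$(( ${seen[$key]:-0} + 1 ))
        printf '%s:%s\n' "$line" "${seen[$key]}"
    done
}

# Usage with the sample data from the question:
printf '%s\n' dog cat dog dog fish cat | running_count_bash
```

When working with files, redirecting once around the whole call (running_count_bash < input_file > output_file) avoids reopening the output file on every iteration.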

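【Comments】:

This will take a very long time to run even for a moderately sized input file (orders of magnitude slower than the equivalent awk script), and it is not a good application for a shell. See why-is-using-a-shell-loop-to-process-text-considered-bad-practice.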
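
One way to check the speed claim on your own machine is a rough timing sketch like the one below. The generated input (30,000 lines cycling through four words) and the temporary file names are assumptions for illustration, and the absolute timings will vary with hardware and bash version. The loop writes its output through a single redirection rather than appending on every iteration.

```shell
#!/usr/bin/env bash
# Rough timing sketch; requires bash 4+ for associative arrays.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Build a hypothetical 30,000-line input by cycling through a few words.
awk 'BEGIN { n = split("dog cat fish bird", w, " ")
             for (i = 0; i < 30000; i++) print w[i % n + 1] }' > "$tmp/input"

echo "awk:"
time awk '{print $0 ":" ++freq[$0]}' "$tmp/input" > "$tmp/out_awk"

echo "bash loop:"
declare -A arr
time while read -r line; do
    (( ++arr[$line] ))            # increment the counter for this line
    echo "$line:${arr[$line]}"
done < "$tmp/input" > "$tmp/out_bash"

# Both approaches should produce identical output.
cmp -s "$tmp/out_awk" "$tmp/out_bash" && echo "outputs match"
```

The timings are printed by the time keyword on stderr; compare the "real" values of the two runs.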

【Solution 3】:

Awk and sed are very powerful, but they are not bash; here is a bash variant:

raw=( $(cat file) ) # read file into an array (split on whitespace)
declare -A index    # init associative array

# 1st loop through raw data to count items
for item in "${raw[@]}"; do ((index[$item]++)); done
# 2nd loop prints each item with its total count (not a running count)
for item in "${raw[@]}"; do echo "$item:${index[$item]}"; done

【Comments】: