bash: calculate a number using the result of another file

【Question title】: bash: calculate number using result of another file
【Posted】: 2015-11-19 02:12:37
【Question description】:

I have two text log files. The first, file1, has this format:

domain=yahoo       user=tom
domain=apple       user=mary
domain=apple       user=tom
domain=facebook    user=kevin
    ...

The second, file2:

name=tony apply=yes
name=tony apply=yes
name=mary apply=yes
name=tony apply=yes
name=tom  apply=yes
...

Now I want to count the lines in the second file that belong to users whose domain is "yahoo". How can I do that?

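【Comments】:

Do you mean the lines in the other file whose name= value equals the user= value of a line whose domain= value is yahoo?
I would use awk for this job.
@tripleee Yes, exactly.
What is your expected output?
@anubhava The output should be a single number (an integer). For my example it should print "1" for "yahoo" and "1" for "apple", because "tom" is not in file2.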

Tags: bash

【Solution 1】:

# put the names that belong to domain=yahoo in a file
grep "domain=yahoo" file1 | cut -d = -f3 > result
# initialize counter
total=0
# loop over the names and add the number of matching lines in file2 for each one
while read -r name; do
    ((total += $(grep -c "$name" file2)))
done < result
# print the final count
echo "$total"

@drvtiny pointed me towards another (shorter) version:

grep "domain=yahoo" file1 |     # search for the pattern
    cut -d = -f3 |              # get the third field (name) from the output
    grep -v '^$' |              # remove any blank lines that might creep in
    sort |                      # sort the result so that duplicates end up adjacent
    uniq > result               # remove duplicates and write the results to a file

grep --count --file=result file2

grep --file=FILE reads the patterns from the named file, one per line.
--count returns the number of matching lines.
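
As a small check of how these two options combine, here is a minimal sketch that runs them against the file2 sample from the question. The scratch directory and the variable names are assumptions made for the demo. The apple pattern list (mary and tom) is written by hand here instead of being produced by the pipeline.

```shell
#!/bin/bash
# Hypothetical scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)

# file2 sample copied from the question.
cat > "$workdir/file2" <<'EOF'
name=tony apply=yes
name=tony apply=yes
name=mary apply=yes
name=tony apply=yes
name=tom  apply=yes
EOF

# Pattern files, one pattern per line (what the pipeline would write to "result").
printf '%s\n' tom > "$workdir/result_yahoo"
printf '%s\n' mary tom > "$workdir/result_apple"

# -c is the short form of --count, -f the short form of --file.
yahoo_count=$(grep -c -f "$workdir/result_yahoo" "$workdir/file2")
apple_count=$(grep -c -f "$workdir/result_apple" "$workdir/file2")

echo "yahoo: $yahoo_count"
echo "apple: $apple_count"

rm -rf "$workdir"
```

Two details matter in practice. An empty line in the pattern file is an empty pattern, which matches every line, so the grep -v '^$' step in the pipeline is doing real work. Patterns also match substrings, so adding -w is one way to stop a name like tom from matching a longer name.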

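【Discussion】:

Thanks, but this does not work; it says "let: not found".
It still doesn't work, but you showed me what to do, thanks.
Great, you got it working! If you are interested, see this answer for different ways to increment a variable. Some of them may not work in every version of Bash.
You can use a simple grep -c -f result instead of the for loop! And you have to run sort and uniq after cut -d = -f3, otherwise you will get a lot of duplicates in the result file.
@drvtiny Thanks for suggesting the improvement! I have updated my answer, what do you think?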
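
For reference, a short sketch of the increment styles touched on in this thread. A message like "let: not found" typically means the script was run by a shell that has no let builtin (dash, for example) rather than by Bash. The variable name total is reused from the answer; the rest is illustrative.

```shell
#!/bin/bash
total=0

# Arithmetic expansion: specified by POSIX, so it also works in plain sh.
total=$((total + 1))

# Arithmetic command: available in Bash, not in every sh implementation.
((total += 2))

# let: a Bash builtin; shells without it fail with "let: not found".
let "total += 3"

echo "$total"
```

When portability matters, the first form is the safest of the three.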

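【Solution 2】:

The irregular input file format complicates things slightly. Usually you have either a fixed number of columns in a specified order, or a series of keyword=value pairs in arbitrary order. Assuming the columns are fixed, we can skip the keyword= part with a simple substr() call.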

awk 'NR==FNR { if ($1 == "domain=yahoo") a[substr($2,6)]=1; next }
    substr($1,6) in a { x++ } END { print x+0 }' file1 file2

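The overall structure of the script is a straightforward application of the common NR==FNR idiom. There are plenty of questions like this on Stack Overflow; see, for example, Comparing fields of two files in awk.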
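
To make the idiom concrete, here is a minimal sketch that wraps it in a shell function and passes the domain in with -v. The function name count_for_domain, the awk variable names, and the scratch directory are assumptions for illustration; the sample data is adapted from the question.

```shell
#!/bin/bash
# Hypothetical scratch directory holding copies of the sample files.
workdir=$(mktemp -d)

cat > "$workdir/file1" <<'EOF'
domain=yahoo       user=tom
domain=apple       user=mary
domain=apple       user=tom
domain=facebook    user=kevin
EOF

cat > "$workdir/file2" <<'EOF'
name=tony apply=yes
name=tony apply=yes
name=mary apply=yes
name=tony apply=yes
name=tom  apply=yes
EOF

# count_for_domain DOMAIN: hypothetical helper, not part of the original answer.
count_for_domain() {
    awk -v want="domain=$1" '
        # First file only (NR == FNR): remember the users of the wanted domain.
        NR == FNR { if ($1 == want) seen[substr($2, 6)] = 1; next }
        # Second file: count lines whose name= value was remembered.
        substr($1, 6) in seen { n++ }
        # n + 0 prints 0 instead of an empty line when nothing matched.
        END { print n + 0 }
    ' "$workdir/file1" "$workdir/file2"
}

yahoo=$(count_for_domain yahoo)
apple=$(count_for_domain apple)
echo "yahoo=$yahoo apple=$apple"

rm -rf "$workdir"
```

NR counts records across all input files while FNR restarts for each file, so NR == FNR holds only while the first file is being read. The unconditional next in that rule keeps lines of file1 from reaching the counting rule.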

【Discussion】:

The output is 1 for yahoo and 2 for apple. I don't see how your sample files would produce 1 for apple?

【Solution 3】:

My /tmp/log is:

domain=yahoo       user=tom
domain=apple       user=mary
domain=apple       user=tom
domain=facebook   user=kevin
domain=apple   user=tony

My /tmp/name is:

name=tony apply=yes
name=tony apply=yes
name=mary apply=yes
name=tony apply=yes
name=tom  apply=yes
name=kevin  apply=yes

script.sh is:

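#!/bin/bash
DOMAIN=$1
LOG_FILE=$2
NAME_FILE=$3
# Collect the users of the given domain and join them into an alternation such as "mary|tom|tony"
REGEXP="$(sed -nr "/domain=$DOMAIN/s%^.*user=([^[:space:]]+).*$%\1%p" "$LOG_FILE" | sort | uniq | tr '\n' '|' | sed 's%|$%%')"
[[ $REGEXP ]] || {
 echo 'No such domain in log file' >&2
 exit 1
}
# Count the lines of the name file whose name= value is one of those users
# (grep -E replaces the deprecated egrep)
grep -Ec "name=(${REGEXP})([[:space:]]|$)" "$NAME_FILE"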

Run the script with arguments:

./script.sh apple /tmp/log /tmp/name

You will get this result:
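
Working through the sample files by hand: the apple users in /tmp/log are mary, tom and tony, and five lines of /tmp/name carry one of those names, so the count should come out as 5. The sketch below reproduces the two steps of script.sh on copies of the sample data. The scratch directory, the variable names, and the use of paste to join the names are assumptions for this demo rather than part of the original script.

```shell
#!/bin/bash
# Hypothetical scratch directory standing in for /tmp/log and /tmp/name.
workdir=$(mktemp -d)

cat > "$workdir/log" <<'EOF'
domain=yahoo       user=tom
domain=apple       user=mary
domain=apple       user=tom
domain=facebook   user=kevin
domain=apple   user=tony
EOF

cat > "$workdir/name" <<'EOF'
name=tony apply=yes
name=tony apply=yes
name=mary apply=yes
name=tony apply=yes
name=tom  apply=yes
name=kevin  apply=yes
EOF

# Step 1: extract the users of the apple domain and join them with "|".
users=$(sed -nE 's/^domain=apple[[:space:]].*user=([^[:space:]]+).*$/\1/p' "$workdir/log" |
    sort -u |
    paste -sd '|' -)

# Step 2: count the lines whose name= value is exactly one of those users.
apple_count=$(grep -Ec "name=(${users})([[:space:]]|\$)" "$workdir/name")

echo "$users"
echo "$apple_count"

rm -rf "$workdir"
```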

【Discussion】: