Find repeats in one column, then subtract values in another column

【Question title】: Find repeat in one column then subtract value in another column
【Posted】: 2020-02-07 14:54:35
【Question】:

My input file has the following columns:

a   Otu1    w   4
b   Otu1    x   1
c   Otu2    y   12424
d   Otu3    z   1756

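I want to find every repeated value in the second column and subtract the corresponding values in the fourth column from one another. My desired output is: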

a   Otu1    w   3
c   Otu2    y   12424
d   Otu3    z   1756

I tried the following awk script on a small file with two columns:

a    3
a    1
b    4

awk '$1 in a{print $1, a[$1]-$2} {a[$1]=$2}' small_input_file

This gives me only the subtracted value:

a    2

How can I modify this script for my four-column input file?

Thanks.

【Comments】:

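What is a-b supposed to represent? Is it just the column values joined by a hyphen? And what happens next: "Otu1"?
I have now simplified the desired output file. Thanks.

Tags: awk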

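【Solution 1】:

A two-pass algorithm does not care how many records there are or whether they are consecutive. (In Bash, file{,} expands to file file, so awk reads the input twice.)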

$ awk 'NR==FNR  {a[$2]=$2 in a?a[$2]-$4:$4; next} 
       !b[$2]++ {print $1,$2,$3,a[$2]}' file{,}

a Otu1 w 3
c Otu2 y 12424
d Otu3 z 1756
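
To illustrate the claim about non-consecutive records, here is a small sketch of the same two-pass idea. The sample data (a third Otu1 record placed after Otu2), the temporary file, and the array name seen are assumptions made for this example, not part of the original answer. The ternary is spelled out as if/else for readability, and the file is named twice explicitly so the command does not rely on brace expansion.

```shell
# Hypothetical sample data: Otu1 appears three times, not all adjacent.
demo=$(mktemp)
cat > "$demo" <<'EOF'
a Otu1 w 4
b Otu1 x 1
c Otu2 y 12424
e Otu1 v 2
d Otu3 z 1756
EOF

# Pass 1 (NR==FNR): per $2, start from the first $4 and subtract later ones.
# Pass 2: print the first record of each $2 with the final value.
result=$(awk '
NR==FNR     { if ($2 in a) a[$2] -= $4; else a[$2] = $4; next }
!seen[$2]++ { print $1, $2, $3, a[$2] }
' "$demo" "$demo")

printf '%s\n' "$result"
rm -f "$demo"
```

With this data the Otu1 line should end in 1 (4 - 1 - 2), and the lines come out in input order because the second pass walks the file again from the top.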

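【Comments】:

【Solution 2】:

Here is a single-pass version; its output comes in awk's default for (i in a) traversal order, which is not the input order: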

$ awk '{
    if($2 in a)                  # current $2 met before
        b[$2]-=$4                # subtract $4
    else {                       # first time meet current $2
        a[$2]=$0                 # store record to a var
        b[$2]=$4                 # and $4 to another, key with $2
    }
}
END {                            # after processing
    for(i in a) {                # iterate all stored records
        sub(/[^ \t]+$/,b[i],a[i]) # replace the last space- or tab-separated field with the result
        print a[i]               # output
    }
}' file

The output appears in no particular order:

d   Otu3    z   1756
a   Otu1    w   3
c   Otu2    y   12424
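
If input order matters in a single pass, one option is to record each key the first time it appears and loop over that list numerically in END. The following is a minimal sketch under that assumption; the array names order, rec and val and the temporary demo file are illustrative, not from the original answer. It rebuilds each line from fields 1 to 3, so the original spacing is not preserved.

```shell
# Sample data copied from the question, written to a temporary file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
a Otu1 w 4
b Otu1 x 1
c Otu2 y 12424
d Otu3 z 1756
EOF

ordered=$(awk '
!($2 in val) {                   # first time this $2 is seen
    order[++n] = $2              # remember the arrival order of keys
    rec[$2] = $1 OFS $2 OFS $3   # keep the first three fields of that record
    val[$2] = $4                 # start from its $4
    next
}
{ val[$2] -= $4 }                # later repeats: subtract their $4
END {
    for (i = 1; i <= n; i++)     # numeric loop keeps input order
        print rec[order[i]], val[order[i]]
}' "$demo")

printf '%s\n' "$ordered"
rm -f "$demo"
```

The extra order array costs one entry per distinct key, which is the price of getting deterministic output without reading the file twice.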

【Comments】: