【Posted】: 2013-09-07 23:57:20
【Question】:
How can I count the unique strings across multiple columns and display their counts, using only awk?
My input file c.txt:
US A one
IN A two
US B one
LK C one
US B two
US A three
IN A three
US B one
LK C two
US B three
US A one
IN A one
US B three
LK C three
US B two
US A two
IN A two
US B two
LK C three
US B two
US A one
IN A two
US B one
LK C one
US B two
US A three
IN A three
US B one
LK C two
US B three
US A one
IN A one
US B three
LK C three
US B two
US A two
IN A two
US B two
LK C three
US B two
US A one
IN A two
US B one
LK C one
US B two
US A three
IN A three
US B one
LK C two
US B three
US A one
IN A one
US B three
LK C three
US B two
US A two
IN A two
US B two
LK C three
US B two
US A one
IN A two
US B one
LK C one
US B two
US A three
IN A three
US B one
LK C two
US B three
US A one
IN A one
US B three
LK C three
US B two
US A two
IN A two
US B two
LK C three
US B two
US A one
IN A two
US B one
LK C one
US B two
US A three
IN A three
US B one
LK C two
US B three
US A one
IN A one
US B three
LK C three
US B two
US A two
IN A two
US B two
LK C three
US B two
I can do this, but only by running three separate commands. Is it possible to get all of the output from a single command?
awk '{a[$1]++}END{for (i in a)print i,a[i]}' c.txt
awk '{a[$1" "$2]++}END{for (i in a)print i,a[i]}' c.txt
awk '{a[$1" "$2" "$3]++}END{for (i in a)print i,a[i]}' c.txt
My desired output is:
IN 20 A 20 one 5
IN 20 A 20 three 5
IN 20 A 20 two 10
LK 20 C 20 one 5
LK 20 C 20 three 10
LK 20 C 20 two 5
US 60 A 20 one 10
US 60 A 20 three 5
US 60 A 20 two 5
US 60 B 40 one 10
US 60 B 40 three 10
US 60 B 40 two 20
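Column 2 is the number of times each unique value of column 1 occurs in the input file.
Column 4 is the number of times each unique combination of columns 1 and 2 occurs in the input file.
Column 6 is the number of times each unique combination of columns 1, 2, and 3 occurs in the input file.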
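One possible way to get all three counts from a single awk invocation is to keep three associative arrays during one pass over the file and join them in the `END` block. The following is a minimal sketch, not a definitive answer. It assumes the fields are whitespace-separated and contain no embedded spaces. The wrapper name `count_cols` is hypothetical, and the trailing `sort` is there only because awk's `for (k in array)` iteration order is unspecified.

```shell
# count_cols is a hypothetical wrapper name used for illustration.
# It prints "col1 n1 col2 n2 col3 n3" once per unique (col1, col2, col3) triple.
count_cols() {
  awk '
    {
      c1[$1]++                  # occurrences of column 1
      c2[$1 " " $2]++           # occurrences of columns 1 and 2 together
      c3[$1 " " $2 " " $3]++    # occurrences of columns 1, 2 and 3 together
    }
    END {
      for (k in c3) {
        split(k, f, " ")        # recover the three fields from the composite key
        print f[1], c1[f[1]], f[2], c2[f[1] " " f[2]], f[3], c3[k]
      }
    }
  ' "$@" | sort                 # sort only to make the output order predictable
}
```

Calling it as `count_cols c.txt` reads the file once. With no file argument it reads standard input instead.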
【Comments】:
-
Please use a smaller data sample for the example. We don't want to scroll down.
-
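Good job providing sample input and output, and posting what you have already tried and what you expected to happen. You may find this formatting guide helpful when posting future questions: stackoverflow.com/help/formatting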
Tags: awk