In awk:
$ awk '{a[$1]=a[$1](a[$1]==""?"":",")$2}END{for(i in a)print i,a[i]}' file
key1 1212,32332
key2 1212,3232,3232
Explained:
awk '{ # use awk for this kind of stuff
a[$1]=a[$1] ( a[$1]=="" ? "" : "," ) $2 # hash on first col and append the second col
}
END { # after everything is hashed
for(i in a) # for each entry in hash a
print i,a[i] # output key and data
}' file # oh yeah the file
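Edit: Instead of letting awk do the buffering (i.e. hashing into a), we can sort the file with sort and then output each key followed by all of its data, comma-separated. The latter part is again done with awk: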
$ sort file | awk '$1!=p{printf "%s%s",(NR>1?ORS:""),$1}{printf "%s%s", ($1==p?",":OFS),$2;p=$1}END{print ""}'
key1 1212,32332
key2 1212,3232,3232
Here sort is not given any fancy parameters, but in the real world some might be needed. The awk part explained:
sort file | # sort the file before feeding it to awk
awk '
$1!=p { # if key is different from previous key
printf "%s%s",(NR>1?ORS:""),$1 # newline and print the key
}
{
printf "%s%s", ($1==p?",":OFS),$2 # print the data comma-separated
p=$1 # store key for comparing on the next round
}
END{
print "" # finish the last line nicely
}'
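As for which sort parameters might be needed: one likely candidate is sorting on the key column only while keeping the values in their input order. A rough sketch, assuming a sort that supports the `-s` (stable) option, which GNU and BSD sort do but POSIX does not require; `group_sorted` and the sample input are made up for this example.

```shell
# Hypothetical helper: stable sort on the first field only, then the
# same awk program as above to join values per key.
group_sorted() {
  sort -s -k1,1 |   # -k1,1: compare the key only; -s: keep input order on ties
  awk '
    $1 != p { printf "%s%s", (NR>1 ? ORS : ""), $1 }       # new key: newline + key
    { printf "%s%s", ($1==p ? "," : OFS), $2; p = $1 }     # append the value
    END { print "" }                                       # terminate the last line
  '
}

# Assumed sample input with values deliberately out of numeric order
printf '%s\n' 'key2 9' 'key1 5' 'key2 1' 'key1 3' | group_sorted
```

This should print `key1 5,3` and `key2 9,1`. With a plain `sort` the whole line is compared, so the values would come out as `3,5` and `1,9` instead.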