awk to add specific fields to a file based on a field match: answers

【Question title】: awk to add specific fields to file based on match in field
【Posted】: 2017-07-06 21:54:28
【Question】:

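I am trying to use awk to add the $4, $5 and $6 fields (and their headers) from the tab-delimited file2 to the lines of file1 whose $3 value matches $2 in file2. I added comments to each line describing my understanding of what is happening. Thank you :).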

file1 (tab-delimited)

ID  Name    Number
0-0 A,A 123456
2-2 B,B 789123
4-4 C,C 456789

file2 (tab-delimited)

ID  Number  Name    Info1   Info2   Info3   Info4
0-0 123456  A,A aaaaa   bbbbb   ccccc   eeeee
1-1 111111  Z,Z aaa bbb ccc eee
2-2 789123  B,B aaaaa   bb,bbb  ccccc   eeeee
3-3 222222  Y,Y aaa bb,bb   cc  e
4-4 456789  C,C aaa bb  ccc eeee

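The tabs in the listings above did not survive copy and paste. To reproduce the problem, you can recreate both files with real tab characters. The following is a sketch that uses `printf`, which reuses its format string until it runs out of arguments, so each format string emits one tab-separated row per group of arguments.

```shell
# Recreate file1: 3 tab-separated columns, 4 rows (header + 3 data rows).
printf '%s\t%s\t%s\n' \
  ID  Name Number \
  0-0 A,A  123456 \
  2-2 B,B  789123 \
  4-4 C,C  456789 > file1

# Recreate file2: 7 tab-separated columns, 6 rows (header + 5 data rows).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  ID  Number Name Info1 Info2  Info3 Info4 \
  0-0 123456 A,A  aaaaa bbbbb  ccccc eeeee \
  1-1 111111 Z,Z  aaa   bbb    ccc   eee \
  2-2 789123 B,B  aaaaa bb,bbb ccccc eeeee \
  3-3 222222 Y,Y  aaa   bb,bb  cc    e \
  4-4 456789 C,C  aaa   bb     ccc   eeee > file2
```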
Desired output (tab-delimited)

ID  Name    Number  Info1   Info2   Info3
0-0 A,A 123456  aaaaa   bbbbb   ccccc
2-2 B,B 789123  aaaaa   bb,bbb  ccccc
4-4 C,C 456789  aaa bb  ccc

awk

awk -F"\t" '$3 in a{  # read $3 value of file1 into array a
 a[$3]=a[$2];   # match $3 array a from file1 with $2 value in file2
  next   # process next line
 }  # close block
  { print $1,$2,a[$2],$4,$5,$6  # print desired output
 }  # close block
    END {  # start block
 for ( i in a) {   # create for loop i to print
     print a[i]  # print for each matching line in i
  }  # close block
}' file1 file2

【Question comments】:

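Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins. You have already had many similar questions answered here (and the archives contain hundreds more), so you should not need to ask this one unless you are missing basics that the book would give you.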
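I am reading that book along with a few others and I am learning, but this is a bit outside my area of expertise. I will keep reading and trying. Thank you all for the help, explanations and patience :) ... It is a steep learning curve, but very valuable and much needed in science. Thank you :).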

Tags: awk

【Solution 1】:

$ awk -v OFS="\t" 'NR==FNR{a[$3]=$0;next}$2 in a{print a[$2],$4,$5,$6}' file1 file2
ID      Name    Number  Info1   Info2   Info3
0-0     A,A     123456  aaaaa   bbbbb   ccccc
2-2     B,B     789123  aaaaa   bb,bbb  ccccc
4-4     C,C     456789  aaa     bb      ccc

Explanation:

$ awk -v OFS="\t" '         # tab as OFS also
NR==FNR{                    # for file1
    a[$3]=$0                # hash $0 to a using $3 as key
    next                    # no further processing for this record
}
$2 in a {                   # if $2 found in a
    print a[$2],$4,$5,$6    # output as requested
}' file1 file2              # mind the file order
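The following sketch runs Solution 1 exactly as posted against freshly created tab-delimited copies of the sample files. You can use it to check that the output matches the desired output. The output file name `out1` is only an illustrative choice.

```shell
# Recreate the tab-delimited sample inputs from the question.
printf '%s\t%s\t%s\n' \
  ID Name Number  0-0 A,A 123456  2-2 B,B 789123  4-4 C,C 456789 > file1
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  ID Number Name Info1 Info2 Info3 Info4 \
  0-0 123456 A,A aaaaa bbbbb ccccc eeeee \
  1-1 111111 Z,Z aaa bbb ccc eee \
  2-2 789123 B,B aaaaa bb,bbb ccccc eeeee \
  3-3 222222 Y,Y aaa bb,bb cc e \
  4-4 456789 C,C aaa bb ccc eeee > file2

# Solution 1 as posted: store each file1 line under its Number ($3), then
# print file2 lines whose Number ($2) was stored, appending Info1..Info3.
awk -v OFS="\t" 'NR==FNR{a[$3]=$0;next}$2 in a{print a[$2],$4,$5,$6}' file1 file2 > out1

cat out1
```

The posted command sets only `OFS`. Input is therefore split with the default `FS`, which splits on runs of blanks (spaces or tabs). That works here only because no field contains a space. Adding `-F'\t'` would make the split strictly tab-based. Be aware of one known limitation of the `NR==FNR` idiom: if file1 is empty, `NR==FNR` also holds while file2 is read, so file2's lines would be loaded into `a` instead of matched.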

【Comments】:

【Solution 2】:

Try this: another approach is to read file2 first and then file1.

awk -F"\t" 'FNR==NR{a[$1,$3,$2]=$4 OFS $5 OFS $6;next} (($1,$2,$3) in a){print $1,$2,$3,a[$1,$2,$3]}' OFS="\t" file2 file1
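The sketch below runs the Solution 2 one-liner unchanged against recreated tab-delimited sample files. Unlike Solution 1, it requires ID, Name and Number to agree, not just Number, and its output follows file1's line order. The output file name `out2` is only illustrative.

```shell
# Recreate the tab-delimited sample inputs from the question.
printf '%s\t%s\t%s\n' \
  ID Name Number  0-0 A,A 123456  2-2 B,B 789123  4-4 C,C 456789 > file1
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  ID Number Name Info1 Info2 Info3 Info4 \
  0-0 123456 A,A aaaaa bbbbb ccccc eeeee \
  1-1 111111 Z,Z aaa bbb ccc eee \
  2-2 789123 B,B aaaaa bb,bbb ccccc eeeee \
  3-3 222222 Y,Y aaa bb,bb cc e \
  4-4 456789 C,C aaa bb ccc eeee > file2

# Solution 2 as posted. OFS="\t" is a command-line assignment placed before
# file2, so it already takes effect while file2's Info fields are joined.
awk -F"\t" 'FNR==NR{a[$1,$3,$2]=$4 OFS $5 OFS $6;next} (($1,$2,$3) in a){print $1,$2,$3,a[$1,$2,$3]}' OFS="\t" file2 file1 > out2

cat out2
```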

Edit: added the solution in non-one-liner form, along with an explanation.

awk -F"\t" 'FNR==NR{                              ####FNR==NR is true only while the first file (file2) is being read: NR counts lines across all Input_files, whereas FNR is reset to 1 at the start of each new Input_file.
                a[$1,$3,$2]=$4 OFS $5 OFS $6;     ####Creating an array named a whose index is $1,$3,$2 and whose value is $4,$5,$6 joined by OFS, the output field separator (set to a tab here by OFS="\t" on the command line; its default is a single space).
                next                              ####next is an awk built-in statement: it stops processing the current line, skips all remaining rules, and moves on to the next input line.
            }
     (($1,$2,$3) in a){                           ####This condition is only reached while the 2nd Input_file (file1) is being read. It checks whether the combination $1,$2,$3 is present in array a; if so, do the following.
                        print $1,$2,$3,a[$1,$2,$3]####print the value of $1, $2,$3 and array a value whose index is $1,$2 and $3.
                      }
    ' OFS="\t" file2 file1                        ####Mentioning the Input_files here.
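Two mechanisms carry Solution 2: the `FNR==NR` first-file test, and the multi-part subscript `a[$1,$3,$2]`. The sketch below shows each one in isolation. The file names `demo_a`, `demo_b`, `counters` and `keys` are hypothetical and used only for this illustration. A comma inside a subscript joins the parts with the built-in `SUBSEP` separator into a single key. That is why the `(... ) in a` test matches only when all three values agree.

```shell
# Two tiny hypothetical files, just to watch the record counters.
printf 'x\ny\n' > demo_a
printf 'p\nq\nr\n' > demo_b

# NR keeps counting across files; FNR restarts at 1 for each file,
# so NR == FNR holds only while the first file is being read.
awk '{
  which = (NR == FNR) ? "first" : "second"
  print FILENAME, NR, FNR, which
}' demo_a demo_b > counters

# A comma in a subscript builds one key joined by SUBSEP; the same
# comma list inside (...) in a tests for exactly that combined key.
awk 'BEGIN {
  a["0-0", "A,A", "123456"] = "hit"
  if (("0-0", "A,A", "123456") in a) print "composite key found"
  for (k in a) {
    n = split(k, parts, SUBSEP)   # recover the individual key parts
    print n, parts[2]
  }
}' > keys

cat counters keys
```

The third line of `counters` should read `demo_b 3 1 second`. This is the first line of the second file, where NR and FNR diverge.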

【Comments】: