Answers: Explain awk command about merging csv files

【Question title】: Explain awk command about merging csv files
【Posted】: 2014-12-01 17:27:40
【Question description】:

I found a useful awk command from someone who had the same problem as me.

awk -F, 'NR==FNR{a[$2]=$0;next}$2 in a{ print a[$2],$4, $5 }' OFS=, file1.csv file2.csv

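I am trying to modify it to fit our csv format, but I am having a hard time understanding what it does. Unfortunately I have to get this working on short notice, so I hope you can help me.

Thanks!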

【Question discussion】:

Tags: csv awk

【Solution 1】:

-F,          # Set the field separator to a comma

NR==FNR      # Test if we are looking at the first file
             # NR is incremented for every line read across all input files
             # FNR is incremented for every line read in the current file and restarts at 1 for each new file
             # The only time NR==FNR is when we are looking at the first file

a[$2]=$0     # Create a lookup for the line based on the value in the 2nd column

next         # Short circuit the script and get the next input line

$2 in a      # If we are here we are looking at the second file 
             # Check if we have seen the second field in the first file

a[$2],$4,$5  # Print the whole matching line from the first file
             # with the 4th & 5th fields from the second

OFS=,        # Separate the output with a comma
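
To see the breakdown above in action, here is a minimal sketch that runs the command from the question on two small files. The file contents are invented for illustration; only the awk program itself comes from the question.

```shell
# Hypothetical sample data, made up for this illustration.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1.csv" <<'EOF'
1,alice,NY
2,bob,LA
3,carol,SF
EOF
cat > "$tmpdir/file2.csv" <<'EOF'
x,bob,y,41,blue
x,dave,y,35,red
x,alice,y,29,green
EOF

# The command from the question, unchanged apart from the file paths.
merged=$(awk -F, 'NR==FNR{a[$2]=$0;next}$2 in a{ print a[$2],$4, $5 }' OFS=, \
    "$tmpdir/file1.csv" "$tmpdir/file2.csv")
printf '%s\n' "$merged"
# 2,bob,LA,41,blue
# 1,alice,NY,29,green

rm -rf "$tmpdir"
```

The output follows the line order of the second file. "dave" never appears in the first file, so his line is dropped, and "carol" has no match in the second file, so her line is never printed.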

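【Discussion】:

wrt "The only time NR==FNR is when we are looking at the first file": or when we are looking at the second file and the first file is empty. It occasionally trips people up, so it is good to keep in mind.
@EdMorton Better to use ARGIND==1 and lose two characters
@Jidder That is GNU-specific, but it is a good solution for gawk. For non-gawk awks you could add FNR==1{ARGIND++}, but that again fails for empty files. You could also use FILENAME==ARGV[1], but that fails if you set variables in the file name area (one more reason not to do that!). There really is no ideal solution, so NR==FNR is usually fine; just remember that it will fail if your first file is empty.
@EdMorton What do you mean by setting variables in the file name area?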
@Jidder: awk '...' OFS=, file vs awk -v OFS=, '...' file。
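
The empty-first-file pitfall from these comments can be reproduced with a small sketch. The file names and the "first:"/"second:" labels are made up for illustration, and the second command uses the FILENAME==ARGV[1] test mentioned above, which assumes no variable assignments are placed among the file operands.

```shell
# Hypothetical files: an empty first file and a two-line second file.
tmpdir=$(mktemp -d)
: > "$tmpdir/empty.csv"
printf 'k1,v1\nk2,v2\n' > "$tmpdir/data.csv"

# NR==FNR: no line is ever read from the empty file, so NR and FNR
# stay equal while reading the second file and it is mistaken for the first.
by_nr=$(awk 'NR==FNR{print "first: " $0; next} {print "second: " $0}' \
    "$tmpdir/empty.csv" "$tmpdir/data.csv")

# FILENAME==ARGV[1]: compares the current file name with the first operand,
# so the second file is classified correctly even when the first is empty.
by_name=$(awk 'FILENAME==ARGV[1]{print "first: " $0; next} {print "second: " $0}' \
    "$tmpdir/empty.csv" "$tmpdir/data.csv")

printf '%s\n' "$by_nr"     # both lines are labelled "first:"
printf '%s\n' "$by_name"   # both lines are labelled "second:"

rm -rf "$tmpdir"
```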

【Solution 2】:

-F,

Sets FS to , for field splitting.

NR==FNR{a[$2]=$0;next}

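When the overall line number (NR) equals the line number within the current file (FNR), i.e. while processing the first non-empty file: store the input line in the array a under the key given by the line's second field ($2), then skip to the next line (next).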

$2 in a{ print a[$2],$4, $5 }

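When the second field ($2) of the current line is a key in the array a: print the array entry stored under that key (a[$2]), followed by OFS (a comma), followed by the fourth field of the current line ($4), followed by OFS, followed by the fifth field of the current line ($5).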

OFS=,

Sets OFS to , before the input files are processed.
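
One detail worth checking when adapting the command: an assignment written among the file operands takes effect only when awk reaches that operand, so it is not yet visible in a BEGIN block, whereas -v assigns before the program starts. A minimal sketch, with a made-up input file:

```shell
# Hypothetical one-line input file.
tmpdir=$(mktemp -d)
printf 'a,b\n' > "$tmpdir/in.csv"

# Operand assignment: BEGIN runs before any operand is processed,
# so OFS still has its default value (a single space) here.
begin_operand=$(awk 'BEGIN{print "[" OFS "]"}' OFS=, "$tmpdir/in.csv")

# -v assignment: already in effect inside BEGIN.
begin_v=$(awk -v OFS=, 'BEGIN{print "[" OFS "]"}')

# Operand assignment placed before the file: in effect for that file's lines.
main_operand=$(awk -F, '{print $1,$2}' OFS=';' "$tmpdir/in.csv")

printf '%s\n' "$begin_operand" "$begin_v" "$main_operand"
# [ ]
# [,]
# a;b

rm -rf "$tmpdir"
```

The command in the question has no BEGIN block, so either form of assignment gives the same result there.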

tl;dr: Append the fourth and fifth columns from file2.csv to the matching lines from file1.csv (matched on field two).

【Discussion】:
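
Since the question is about adapting the command to a different csv layout, here is one possible sketch that passes the key columns in as variables. The variable names k1 and k2, the file names, and the sample data are all assumptions made for illustration; the columns in the print statement would need to be changed to match the real files.

```shell
# Hypothetical layout: key is column 1 in the first file, column 3 in the second.
tmpdir=$(mktemp -d)
printf 'alice,NY\nbob,LA\n' > "$tmpdir/people.csv"
printf '41,blue,bob\n35,red,dave\n29,green,alice\n' > "$tmpdir/extra.csv"

adapted=$(awk -F, -v OFS=, -v k1=1 -v k2=3 '
    NR==FNR  { a[$k1] = $0; next }        # first file: index each line by its key column
    $k2 in a { print a[$k2], $1, $2 }     # second file: append columns 1 and 2 on a key match
' "$tmpdir/people.csv" "$tmpdir/extra.csv")
printf '%s\n' "$adapted"
# bob,LA,41,blue
# alice,NY,29,green

rm -rf "$tmpdir"
```

Using -v for OFS and the key columns keeps all assignments out of the file operand area, which sidesteps the FILENAME==ARGV[1] caveat raised in the discussion of Solution 1.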