Matching columns against a separate file and appending the matches to the file (answers)

【Question title】: matching columns one separate files and appending matches to file
【Posted】: 2020-06-12 20:56:11
【Question】:

I'm trying to use awk to merge two files by matching on a single column. What I then want to do is append the relevant column from file 2 to file 1.

It's easier to explain with a dummy example.

File 1:

name   fruit   animal
bob    apple   dog
jim    orange  cat
gary   mango   snake
daisy  peach   mouse

File 2:

 animal number  shape
 cat    eight   square
 dog    nine    circle
 mouse  eleven  sphere

Desired output:

 name   fruit   animal  shape   
 bob    apple   dog     circle
 jim    orange  cat     square
 gary   mango   snake   NA
 daisy  peach   mouse   sphere

Step 1: I need to filter on column 3 of file1 and column 1 of file2

awk -F'\t' 'NR==FNR{c[$3]++;next};c[$1] > 0' file1 file2

This gives me the output:

cat    eight   square
dog    nine    circle
mouse eleven   sphere

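This gets me part of the way, but I can't simply cut the third column (shape) from the output above and append it to file1, because file2 has no "snake" entry. I need to append column 3 of the output to file1 where there is a match, and put "NA" where there isn't. All rows of file1 must be kept, so I can't just drop the unmatched ones. This is where I'm stuck!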

Any help would be much appreciated... E

【Question comments】:

Tags: awk

【Solution 1】:

Could you please try the following, written and tested with the shown samples in GNU awk.

awk '
BEGIN{
  OFS="\t"
}
FNR==NR{
  a[$1]=$NF
  next
}
{
  print $0,($3 in a?a[$3]:"NA")
}'  Input_file2   Input_file1

Explanation: adding a detailed explanation of the above.

awk '                               ##Starting awk program from here.
BEGIN{                              ##Starting BEGIN section from here.
  OFS="\t"                          ##Setting TAB as output field separator here.
}
FNR==NR{                            ##FNR==NR is TRUE only while the first input file (file2) is being read.
  a[$1]=$NF                         ##Creating array a with index $1 and value is $NF for current line.
  next                              ##next will skip all further statements from here.
}
{
  print $0,($3 in a?a[$3]:"NA")     ##Printing current line and checking if 3rd field is present in array a then print its value OR print NA.
}'  file2  file1                    ##Mentioning Input_file names here.
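
To check the answer against the question's data, here is a minimal sketch that recreates the two sample files in a scratch directory and runs the same program on them. The scratch directory and the `result` variable are only for illustration, and the files are assumed to be whitespace-separated exactly as shown in the question.

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
name   fruit   animal
bob    apple   dog
jim    orange  cat
gary   mango   snake
daisy  peach   mouse
EOF
cat > "$tmpdir/file2" <<'EOF'
animal number  shape
cat    eight   square
dog    nine    circle
mouse  eleven  sphere
EOF

# Same logic as the answer: file2 is listed first so the lookup table
# is complete before any line of file1 is printed.
result=$(awk '
BEGIN { OFS = "\t" }
FNR == NR { a[$1] = $NF; next }          # file2: map animal -> shape
{ print $0, ($3 in a ? a[$3] : "NA") }   # file1: append shape, or NA if no match
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The header line needs no special handling here: file2's header row stores "shape" under the key "animal", which is exactly what file1's header row looks up. Since no field is assigned, `$0` is printed with its original spacing and only the appended column is joined with a tab.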

【Comments】:

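You're a hero! Thank you! I should now be able to figure out how to do what I need for 100 columns and 30,000 rows :-D
@lecb, you're welcome, happy learning on this great site :)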
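I've applied the same approach to much larger files, and it seems to be hit and miss. I get some "NA"s where there definitely is a match (and I've cross-checked that what I'm matching on is present in both files and in the right columns). No idea what's going on there! Unfortunately I can't share the data. Also, in the example above, what if I wanted to print both number and shape from file2 when there's a match?!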
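Regarding the above, I've solved that problem. The awk script only works if the columns before the one you're matching on (in file1) are not empty. I managed to work around it with: awk 'BEGIN { FS = OFS = "\t" } { for(i=1; i<=NF; i++) if($i ~ /^ *$/) $i = "NA" }; 1' file >new_File However, I'm still trying to figure out how to output the second column of file2 in addition to the third. I thought maybe: awk ' BEGIN{ OFS="\t" } FNR==NR{ a[$1]=$NF next } { print $0,$2,$3 in a?a[$3]:"NA" }' Input_file2 Input_file1 That didn't work!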
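@lecb If your input is tab-separated, you should state that in your question. If some fields can be empty, you should include that in the example in your question. We can only help you with the problem you tell us you have. wrt the 2 scripts in your comment - you discovered in the first script that you have to set FS to \t and then forgot to do so in the second one, and you're reading file2 in the NR==FNR block, so save $2 there along with $NF.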
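
The advice in the last comment can be sketched roughly as follows. This assumes tab-separated input in which fields may be empty; the sample rows (including jim's empty fruit field), the scratch directory, and the array names `num` and `shape` are made up for illustration and are not the poster's real data. With the default whitespace splitting, the row with the empty field would put "cat" in `$2` rather than `$3`, so the lookup would fail and print NA, which matches the symptom described in the comments.

```shell
tmpdir=$(mktemp -d)
# Hypothetical tab-separated variants of the sample files.
# jim's fruit field is deliberately left empty.
printf 'name\tfruit\tanimal\n'    >  "$tmpdir/file1"
printf 'bob\tapple\tdog\n'        >> "$tmpdir/file1"
printf 'jim\t\tcat\n'             >> "$tmpdir/file1"
printf 'gary\tmango\tsnake\n'     >> "$tmpdir/file1"
printf 'animal\tnumber\tshape\n'  >  "$tmpdir/file2"
printf 'cat\teight\tsquare\n'     >> "$tmpdir/file2"
printf 'dog\tnine\tcircle\n'      >> "$tmpdir/file2"

result=$(awk '
BEGIN { FS = OFS = "\t" }            # split on tabs only, so empty fields keep their position
FNR == NR {                          # file2: remember both number and shape per animal
  num[$1] = $2
  shape[$1] = $NF
  next
}
{                                    # file1: append both columns, or NA twice if no match
  if ($3 in shape) print $0, num[$3], shape[$3]
  else             print $0, "NA", "NA"
}
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Setting `FS` to a tab in the joining script itself makes the separate pre-processing pass that fills empty fields with "NA" unnecessary for the join, although it may still be wanted for other reasons.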