【Question title】: awk match / compare 2 columns in each file (for 2 files)
【Posted】: 2023-03-11 10:20:01
【Question】:

After searching the forum many times, I'm still stuck on getting the output I want from 2 files.

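I want to match file1 with file2 and combine them into one file based on the first and second columns of each file (the lines in both files are unsorted). Column 5 in file1 and column 3 in file2 are not the matching key (but they can be used as part of the key if that is mandatory).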
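
A quick check on the sample data suggests why column 5 / column 3 effectively has to be part of the key: many rows share the same column 1 + column 2 pair. The sketch below uses a hypothetical helper `key_counts`, run on the first four rows of file1, to count rows per (column 1, column 2) pair; any count above 1 means those two columns alone cannot pair rows one-to-one.

```shell
#!/bin/sh
# Hypothetical helper: count rows per (column 1, column 2) pair.
# With the default FS, fields never contain blanks, so a space is a safe joiner.
key_counts() {
  awk '{ n[$1 " " $2]++ } END { for (k in n) print k, n[k] }' "$1"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# First four rows of file1 from the question
printf '%s\n' \
  'SITEA 222 dummy dummy x8a7sdf dummyvalues dummyvalues' \
  'SITEA 357 dummy dummy x11x683 dummyvalues dummyvalues' \
  'SITEA 357 dummy dummy x11x69b dummyvalues dummyvalues' \
  'SITEA 357 dummy dummy x11x69d dummyvalues dummyvalues' > "$tmp/file1"

# "for (k in n)" visits keys in unspecified order, so sort the result
key_counts "$tmp/file1" | sort
```

Here SITEA 357 already occurs three times in four rows, so a lookup table keyed only on columns 1 and 2 would keep overwriting the same entry.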

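I'm not sure whether the best approach is an awk loop over 2 different folders: one folder holds files named SITEA, SITEB, SITEC, etc. (named after the site in file1), and the second folder holds identically named files with the corresponding data from file2.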
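
If the data really is split into per-site files, one way to drive the folder loop is from the shell rather than from awk: iterate over one folder and join each file with its same-named partner in the other. This is a sketch under assumptions: the folder names `dir1`, `dir2` and `out` and the functions `join_pair` and `join_dirs` are hypothetical, and the per-pair join reuses the awk program from Solution 1 (file1-style rows keyed on columns 1, 2, 5; file2-style rows keyed on columns 1, 2, 3).

```shell
#!/bin/sh
# Hypothetical layout: dir1/SITEA, dir1/SITEB, ... hold file1-style rows,
#                      dir2/SITEA, dir2/SITEB, ... hold file2-style rows.
join_pair() {
  awk 'NR==FNR {a[$1,$2,$5]=$0; next}
       {if(($1,$2,$3) in a) {print a[$1,$2,$3],$0; delete a[$1,$2,$3]} else print "EMPTY",$0}
       END {for(k in a) print a[k], "EMPTY"}' "$1" "$2"
}

# Join every file in dir1 with its same-named partner in dir2, writing out/NAME.
join_dirs() {
  dir1=$1 dir2=$2 out=$3
  mkdir -p "$out"
  for f in "$dir1"/*; do
    [ -f "$f" ] || continue            # skip the literal pattern if dir1 is empty
    name=$(basename "$f")
    if [ -f "$dir2/$name" ]; then
      join_pair "$f" "$dir2/$name" > "$out/$name"
    else
      echo "no partner for $name in $dir2" >&2
    fi
  done
}

# Usage with one made-up pair of per-site files
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/dir1" "$tmp/dir2"
printf '%s\n' 'SITEC 200 dummy dummy x11xdc1 dummyvalues dummyvalues' > "$tmp/dir1/SITEC"
printf '%s\n' 'SITEC 200 x11xdc1 dummyvalues dummyvalues dummyvalues' > "$tmp/dir2/SITEC"
join_dirs "$tmp/dir1" "$tmp/dir2" "$tmp/out"
cat "$tmp/out/SITEC"
```

A single awk call over both folders would also work; the shell loop simply keeps each pair's arrays separate and makes missing partners easy to report.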

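I'm also not sure whether, for content that has no match in the other file, it's possible to print the word EMPTY in its place and add that to the desired output file.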

file1

SITEA 222 dummy dummy x8a7sdf dummyvalues dummyvalues
SITEA 357 dummy dummy x11x683 dummyvalues dummyvalues
SITEA 357 dummy dummy x11x69b dummyvalues dummyvalues
SITEA 357 dummy dummy x11x69d dummyvalues dummyvalues
SITEC 200 dummy dummy x11xdc1 dummyvalues dummyvalues
SITEA 357 dummy dummy x11x6bc dummyvalues dummyvalues
SITEA 200 dummy dummy x11x305 dummyvalues dummyvalues
SITEA 200 dummy dummy x11x323 dummyvalues dummyvalues
SITEA 357 dummy dummy x11x693 dummyvalues dummyvalues
SITEA 200 dummy dummy x11x300 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x680 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x688 dummyvalues dummyvalues
SITEA 151 dummy dummy x87f777 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x68c dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33b dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf37 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x68e dummyvalues dummyvalues
SITEB 357 dummy dummy x11x694 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x6a5 dummyvalues dummyvalues
SITED 200 dummy dummy x11xdc0 dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf36 dummyvalues dummyvalues
SITEB 200 dummy dummy x11xffd dummyvalues dummyvalues
SITEA 200 dummy dummy x11x306 dummyvalues dummyvalues
SITEA 200 dummy dummy x11x307 dummyvalues dummyvalues
SITEA 200 dummy dummy x11x325 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x686 dummyvalues dummyvalues
SITEA 357 dummy dummy x11x680 dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33c dummyvalues dummyvalues
SITEA 357 dummy dummy x11x6be dummyvalues dummyvalues
SITEA 357 dummy dummy x11x6ba dummyvalues dummyvalues
SITEB 200 dummy dummy x11xffe dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33e dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf00 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x696 dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf1c dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf1e dummyvalues dummyvalues
SITEB 357 dummy dummy x11x69a dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf34 dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf35 dummyvalues dummyvalues
SITEB 200 dummy dummy x11xfff dummyvalues dummyvalues
SITEA 357 dummy dummy x11x681 dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33d dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33d dummyvalues dummyvalues
SITEA 100 dummy dummy x11x33d dummyvalues dummyvalues

file2

SITEA 357 x11x683 dummyvalues dummyvalues dummyvalues
SITEA 200 x11x33b dummyvalues dummyvalues dummyvalues
SITEA 357 x11x693 dummyvalues dummyvalues dummyvalues
SITEA 357 x11x680 dummyvalues dummyvalues dummyvalues
SITEA 357 x11x69b dummyvalues dummyvalues dummyvalues
SITEB 357 x11x686 dummyvalues dummyvalues dummyvalues
SITEB 357 x11x6a5 dummyvalues dummyvalues dummyvalues
SITEA 357 x11x69d dummyvalues dummyvalues dummyvalues
SITEB 200 x11xffd dummyvalues dummyvalues dummyvalues
SITEA 357 x11x6ba dummyvalues dummyvalues dummyvalues
SITEB 357 x11x680 dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf1c dummyvalues dummyvalues dummyvalues
SITEB 357 x11x68e dummyvalues dummyvalues dummyvalues
SITEB 357 x11x69a dummyvalues dummyvalues dummyvalues
SITEA 357 x11x681 dummyvalues dummyvalues dummyvalues
SITEA 200 x11x33c dummyvalues dummyvalues dummyvalues
SITEB 357 x11x694 dummyvalues dummyvalues dummyvalues
SITEB 357 x11x696 dummyvalues dummyvalues dummyvalues
SITEC 200 x11xdc1 dummyvalues dummyvalues dummyvalues
SITEA 357 x11x6bc dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf37 dummyvalues dummyvalues dummyvalues
SITEA 200 x11x325 dummyvalues dummyvalues dummyvalues
SITED 200 x11xdc0 dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf00 dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf36 dummyvalues dummyvalues dummyvalues
SITEA 357 x11x6be dummyvalues dummyvalues dummyvalues
SITEA 200 x11x33d dummyvalues dummyvalues dummyvalues
SITEA 200 x11x305 dummyvalues dummyvalues dummyvalues
SITEB 357 x11x688 dummyvalues dummyvalues dummyvalues
SITEA 200 x11x33e dummyvalues dummyvalues dummyvalues
SITEB 200 x11xffe dummyvalues dummyvalues dummyvalues
SITEA 200 x11x300 dummyvalues dummyvalues dummyvalues
SITEB 200 x11xfff dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf1e dummyvalues dummyvalues dummyvalues
SITEA 200 x11x306 dummyvalues dummyvalues dummyvalues
SITEA 200 x11x307 dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf35 dummyvalues dummyvalues dummyvalues
SITEB 357 x11x68c dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf34 dummyvalues dummyvalues dummyvalues
SITEA 200 x11x323 dummyvalues dummyvalues dummyvalues
SITEB 45 a8d7f99 dummyvalues dummyvalues dummyvalues
SITEB 008 8sd7f77 dummyvalues dummyvalues dummyvalues

Desired output:

SITEA 357 dummy dummy x11x683 dummyvalues dummyvalues dummyvalues SITEA 357 x11x683 dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x69b dummyvalues dummyvalues dummyvalues SITEA 357 x11x69b dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x69d dummyvalues dummyvalues dummyvalues SITEA 357 x11x69d dummyvalues dummyvalues dummyvalues
SITEC 200 dummy dummy x11xdc1 dummyvalues dummyvalues dummyvalues SITEC 200 x11xdc1 dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x6bc dummyvalues dummyvalues dummyvalues SITEA 357 x11x6bc dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x305 dummyvalues dummyvalues dummyvalues SITEA 200 x11x305 dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x323 dummyvalues dummyvalues dummyvalues SITEA 200 x11x323 dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x693 dummyvalues dummyvalues dummyvalues SITEA 357 x11x693 dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x300 dummyvalues dummyvalues dummyvalues SITEA 200 x11x300 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x680 dummyvalues dummyvalues dummyvalues SITEB 357 x11x680 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x688 dummyvalues dummyvalues dummyvalues SITEB 357 x11x688 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x68c dummyvalues dummyvalues dummyvalues SITEB 357 x11x68c dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33b dummyvalues dummyvalues dummyvalues SITEA 200 x11x33b dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf37 dummyvalues dummyvalues dummyvalues SITEB 200 x11xf37 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x68e dummyvalues dummyvalues dummyvalues SITEB 357 x11x68e dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x694 dummyvalues dummyvalues dummyvalues SITEB 357 x11x694 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x6a5 dummyvalues dummyvalues dummyvalues SITEB 357 x11x6a5 dummyvalues dummyvalues dummyvalues
SITED 200 dummy dummy x11xdc0 dummyvalues dummyvalues dummyvalues SITED 200 x11xdc0 dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf36 dummyvalues dummyvalues dummyvalues SITEB 200 x11xf36 dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xffd dummyvalues dummyvalues dummyvalues SITEB 200 x11xffd dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x306 dummyvalues dummyvalues dummyvalues SITEA 200 x11x306 dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x307 dummyvalues dummyvalues dummyvalues SITEA 200 x11x307 dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x325 dummyvalues dummyvalues dummyvalues SITEA 200 x11x325 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x686 dummyvalues dummyvalues dummyvalues SITEB 357 x11x686 dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x680 dummyvalues dummyvalues dummyvalues SITEA 357 x11x680 dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33c dummyvalues dummyvalues dummyvalues SITEA 200 x11x33c dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x6be dummyvalues dummyvalues dummyvalues SITEA 357 x11x6be dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x6ba dummyvalues dummyvalues dummyvalues SITEA 357 x11x6ba dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xffe dummyvalues dummyvalues dummyvalues SITEB 200 x11xffe dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33e dummyvalues dummyvalues dummyvalues SITEA 200 x11x33e dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf00 dummyvalues dummyvalues dummyvalues SITEB 200 x11xf00 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x696 dummyvalues dummyvalues dummyvalues SITEB 357 x11x696 dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf1c dummyvalues dummyvalues dummyvalues SITEB 200 x11xf1c dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf1e dummyvalues dummyvalues dummyvalues SITEB 200 x11xf1e dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x69a dummyvalues dummyvalues dummyvalues SITEB 357 x11x69a dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf34 dummyvalues dummyvalues dummyvalues SITEB 200 x11xf34 dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf35 dummyvalues dummyvalues dummyvalues SITEB 200 x11xf35 dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xfff dummyvalues dummyvalues dummyvalues SITEB 200 x11xfff dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x681 dummyvalues dummyvalues dummyvalues SITEA 357 x11x681 dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33d dummyvalues dummyvalues dummyvalues SITEA 200 x11x33d dummyvalues dummyvalues dummyvalues
EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY SITEB 45 a8d7f99 dummyvalues dummyvalues dummyvalues
EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY SITEB 008 8sd7f77 dummyvalues dummyvalues dummyvalues
SITEA 151 dummy dummy x87f777 dummyvalues dummyvalues EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
SITEA 222 dummy dummy x8a7sdf dummyvalues dummyvalues EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
SITEA 200 dummy dummy x11x33d dummyvalues dummyvalues EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
SITEA 100 dummy dummy x11x33d dummyvalues dummyvalues EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY

Thanks

I only got this code partially working, and only after swapping columns in file1 so that column 5 becomes column 3:

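awk 'NR==FNR{                  # first file (column 5 already moved to column 3)
  if(FNR==1){print}            # prints line 1 of the first file unchanged
  a[$1 $2 $3]=$0               # key = columns 1-3 glued together, no separator
  next
}
a[$1 $2 $3]!=$0 && a[$1 $2 $3]!=""{   # second file: key exists and stored line differs
  print a[$1 $2 $3],$0
}' file1 file2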

But it also doesn't show the unmatched lines.
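
The script above only prints when the key from the second file already exists in a[], and it has no else branch or END loop, so rows without a partner are never printed. Testing a[key]!="" also silently creates an empty element for every missing key, whereas (key) in a does not. A separate pitfall is the key itself: a[$1 $2 $3] glues the fields together with no separator, so different rows can produce the same key, while awk's comma form a[$1,$2,$3] joins them with SUBSEP. A small sketch of the collision (the function name `collisions` and the two rows are made up for illustration):

```shell
#!/bin/sh
# Count distinct keys built two ways from rows whose fields differ
# but concatenate to the same string ("SITEA" "1" "23" vs "SITEA" "12" "3").
collisions() {
  printf '%s\n' 'SITEA 1 23' 'SITEA 12 3' | awk '
    { glued[$1 $2 $3]++; joined[$1, $2, $3]++ }
    END {
      ng = 0; for (k in glued) ng++
      nj = 0; for (k in joined) nj++
      print "concatenated keys:", ng
      print "SUBSEP keys:", nj
    }'
}
collisions
```

Both rows collapse to the single concatenated key "SITEA123", while the SUBSEP form keeps two distinct keys.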

【Discussion】:

    Tags: join awk compare match multiple-columns


    【Solution 1】:

    Something like this?

    awk 'NR==FNR {a[$1,$2,$5]=$0; next}                 # file1: key = $1,$2,$5
                 {if(($1,$2,$3) in a)                   # file2: key = $1,$2,$3
                    {print a[$1,$2,$3],$0; delete a[$1,$2,$3]}
                  else print "EMPTY",$0}                # row only in file2
         END     {for(k in a) print a[k], "EMPTY"}' file1 file2   # rows only in file1
    

    The first file has $5 as part of its key, so use that field when storing the values.

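    If you know the number of fields in both files, add the correct number of "EMPTY" fillers; otherwise you can compute that in code as well. I left it out here for simplicity.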
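
One way to compute the fillers instead of hard-coding them is to track the widest row seen in each file and emit that many EMPTY words for the missing side. A sketch, assuming the field count is what should be padded; the names `join_padded` and `pad` are made up, and the keys are the same as in the answer:

```shell
#!/bin/sh
# Pad the missing side with as many EMPTY words as that file has fields
# (the widest row seen in that file), instead of a fixed count.
join_padded() {
  awk '
    function pad(n,   s, i) { s = "EMPTY"; for (i = 2; i <= n; i++) s = s " EMPTY"; return s }
    NR==FNR { a[$1,$2,$5]=$0; if (NF > nf1) nf1 = NF; next }
    {
      if (NF > nf2) nf2 = NF
      if (($1,$2,$3) in a) {
        print a[$1,$2,$3], $0; delete a[$1,$2,$3]
      } else
        print pad(nf1), $0               # row only in file2
    }
    END { for (k in a) print a[k], pad(nf2) }' "$1" "$2"   # rows only in file1
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' \
  'SITEA 357 dummy dummy x11x683 dummyvalues dummyvalues' \
  'SITEA 151 dummy dummy x87f777 dummyvalues dummyvalues' > "$tmp/file1"
printf '%s\n' \
  'SITEA 357 x11x683 dummyvalues dummyvalues dummyvalues' \
  'SITEB 45 a8d7f99 dummyvalues dummyvalues dummyvalues' > "$tmp/file2"
join_padded "$tmp/file1" "$tmp/file2"
```

With the question's layout this emits 7 EMPTY words in front of a file2-only row and 6 after a file1-only row. The END loop still visits leftovers in unspecified order.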

    file2 dictates the order of the output. Entries that are in file1 but missing from file2 won't keep their original order; that needs extra logic if required.
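
If file1's order matters for the leftovers, one option is to remember each file1 row's position and print the unmatched ones in that order at the end. The sketch below does that and also lets duplicate file1 rows each claim at most one file2 row (the answer's a[] keeps only one row per key), which matches what the desired output shows for the repeated SITEA 200 ... x11x33d line. Assumptions: the name `join_ordered` is hypothetical, and the padding stays at a single EMPTY per side as in the answer.

```shell
#!/bin/sh
# Keep unmatched file1 rows in file1 order; duplicate file1 rows each
# consume at most one matching file2 row.
join_ordered() {
  awk '
    NR==FNR {
      line[++n] = $0                 # remember file1 rows in order
      k = $1 SUBSEP $2 SUBSEP $5
      idx[k, ++cnt[k]] = n           # queue of file1 rows sharing this key
      next
    }
    {
      k = $1 SUBSEP $2 SUBSEP $3
      if (used[k] < cnt[k]) {        # an unused file1 row with this key exists
        i = idx[k, ++used[k]]
        matched[i] = 1
        print line[i], $0
      } else
        print "EMPTY", $0            # row only in file2
    }
    END {
      for (i = 1; i <= n; i++)       # leftovers in file1 order
        if (!(i in matched)) print line[i], "EMPTY"
    }' "$1" "$2"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' \
  'SITEA 200 dummy dummy x11x33d dummyvalues dummyvalues' \
  'SITEA 222 dummy dummy x8a7sdf dummyvalues dummyvalues' \
  'SITEA 200 dummy dummy x11x33d dummyvalues dummyvalues' > "$tmp/file1"
printf '%s\n' \
  'SITEA 200 x11x33d dummyvalues dummyvalues dummyvalues' \
  'SITEB 45 a8d7f99 dummyvalues dummyvalues dummyvalues' > "$tmp/file2"
join_ordered "$tmp/file1" "$tmp/file2"
```

Matched and file2-only rows still follow file2's order; only the file1 leftovers are printed in file1's order, and the second copy of the duplicated row ends up as a leftover.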

    【Comments】:

    • Hi @karakfa, after a week's delay I managed it like this: SITEA 222 x8a7sdf dummy dummy dummyvalues dummyvalues SITEA 200 x11x323 dummy dummy dummyvalues dummyvalues SITEA 200 x11x323 dummyvalues dummyvalues dummyvalues awk 'NR==FNR{if(FNR==1){print} a[$1 $2 $3]=$0 ;next} a[$1 $2 $3]!=$0 && a[$1 $2 $3]~!""{print a[$1 $2 $3],$0}' filea fileb, but this part of the output is missing: SITEA 151 dummy dummy x87f777 dummyvalues dummyvalues EMPTY EMPTY EMPTY. I'll try reusing parts of your code and some other sources and post the result here.
    • My bad @karakfa, after re-running it works like a charm; yours works better, +1. I added as many EMPTY blocks as needed and it works like a charm, e.g. awk 'NR==FNR {a[$1,$2,$5]=$0; next} {if(($1,$2,$3) in a) {print a[$1,$2,$3],$0; delete a[$1,$2,$3]} else print "EMPTY","EMPTY", $0} END {for(k in a) print a[k], "EMPTY","EMPTY","EMPTY"}' filea fileb (with the 5th-column part of yours ignored)