【Question title】: How to compare two files in Unix and merge them, keeping both matching and non-matching rows
【Posted】: 2017-05-24 14:52:01
【Question】:

Please tell me how to compare two files and merge both the matching and the non-matching rows. I have checked all the answers provided previously, but none of them meets my requirement. A sample data set follows.

Contents of file1.csv:

J2D     TEXAS        43988
J2D     AUSTIN       21305
J2D     CUPERTINO    378563
J2D     BELGIUM      569632
J2D     UK            0
J2D     US            8
J2D     INDIA         75321

Contents of file2.csv:

J2D     TEXAS        25463
J2D     AUSTIN       5986
J2D     CUPERTINO    0234
J2D     BELGIUM      123468
J2D     UK            5874
J2D     US            2365
J2D     IRAQ          8982

I tried the following commands, but neither works for my scenario:

awk 'NR==FNR{a[$2]=$3;next;}{print $0 "    " ($2 in a ? a[$2] : "NA")}' file2.csv file1.csv

Output:

J2D     TEXAS        43988    25463
J2D     AUSTIN       21305    5986
J2D     CUPERTINO    378563    0234
J2D     BELGIUM      569632    123468
J2D     UK            0        5874
J2D     US            8        2365
J2D     INDIA         75321    NA

In the result above, the "IRAQ" row from file2.csv is missing.

awk 'NR==FNR{a[$2]=$3;next;}{print $0 "    " ($2 in a ? a[$2] : "NA")}' file1.csv file2.csv

Output:

J2D     TEXAS        25463    43988
J2D     AUSTIN       5986    21305
J2D     CUPERTINO    0234    378563
J2D     BELGIUM      123468    569632
J2D     UK            5874    0
J2D     US            2365    8
J2D     IRAQ          8982    NA

In the result above, the "INDIA" row from file1.csv is missing.

The expected output is shown below. Please help me produce it.

Expected output:

J2D     TEXAS      43988      25463
J2D     AUSTIN     21305      5986
J2D     CUPERTINO  378563     0234
J2D     BELGIUM    569632     123468
J2D     UK          0         5874
J2D     US          8         2365
J2D     INDIA       75321     NA
J2D     IRAQ        NA        8982

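【Question discussion】:

  • First, format your question so that it is readable.
  • Hi Roman, I have formatted the question. Please help me with this.
  • Would it be acceptable if the output were sorted by the second field?
  • Yes, sorted output is fine. One more thing: the first two columns form a composite key in my scenario.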
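Given the two points in the last comment (sorted output is acceptable, and the first two columns form a composite key), one possible approach is a full outer join built from sort and join. The sketch below is only an illustration and is not taken from the answers: the function name merge_by_join is hypothetical, and it assumes that the character | never appears in the data and that mktemp is available.

```shell
# merge_by_join FILE1 FILE2   (hypothetical helper name)
# Assumes whitespace-separated rows of the form: key1 key2 value
merge_by_join() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1

    # join matches on a single field, so fuse columns 1 and 2 into one key.
    # Both inputs must be sorted on that key, in the same locale join uses.
    awk '{ print $1 "|" $2, $3 }' "$1" | LC_ALL=C sort -k1,1 > "$tmp1"
    awk '{ print $1 "|" $2, $3 }' "$2" | LC_ALL=C sort -k1,1 > "$tmp2"

    # -a 1 -a 2 : also print rows that have no partner in the other file
    # -e NA     : text to use for a missing value
    # -o ...    : output the key, then the value from each file
    LC_ALL=C join -a 1 -a 2 -e NA -o 0,1.2,2.2 "$tmp1" "$tmp2" | tr '|' ' '

    rm -f "$tmp1" "$tmp2"
}

# Usage: merge_by_join file1.csv file2.csv | column -t
```

Called this way, each key should appear once, in sorted key order, with NA wherever a key exists in only one of the files.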

Tags: linux shell unix awk


【Solution 1】:

paste + awk solution:

paste file1.csv file2.csv | awk '{ if($2==$5) { print $1,$2,$3,$6 }
      else { print $1,$2,$3,"NA","\n",$4,$5,"NA",$6 }}' | column -tx

Output:

J2D  TEXAS      43988   25463
J2D  AUSTIN     21305   5986
J2D  CUPERTINO  378563  0234
J2D  BELGIUM    569632  123468
J2D  UK         0       5874
J2D  US         8       2365
J2D  INDIA      75321   NA
J2D  IRAQ       NA      8982

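Details

  • paste file1.csv file2.csv - joins the two files line by line, so each line of file1.csv is followed by the corresponding line of file2.csv

  • if($2==$5) { print $1,$2,$3,$6 } - if the second column is the same in both files ($5 is the field that was originally the second column of file2.csv), print one merged row

  • print $1,$2,$3,"NA","\n",$4,$5,"NA",$6 - otherwise print the two unmatched rows on separate lines, with NA in the position of the missing value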

http://man7.org/linux/man-pages/man1/paste.1.html

【Discussion】:

【Solution 2】:

awk to the rescue!

    $ awk     '{k=$1 FS $2}
       NR==FNR {a[k]=$3; next}
               {print $0, (k in a)?a[k]:"NA"; delete a[k]}
           END {for(k in a) print k,"NA",a[k]}' file2.csv file1.csv | column -t

    J2D  TEXAS      43988   25463
    J2D  AUSTIN     21305   5986
    J2D  CUPERTINO  378563  0234
    J2D  BELGIUM    569632  123468
    J2D  UK         0       5874
    J2D  US         8       2365
    J2D  INDIA      75321   NA
    J2D  IRAQ       NA      8982
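For readers who want to see what each clause of the awk command above does, here is the same logic as a commented sketch. The wrapper name full_outer_merge is hypothetical. It assumes whitespace-separated fields, a key made of the first two columns, and a non-empty second file (the NR==FNR test cannot tell the two inputs apart if the first file awk reads is empty).

```shell
# full_outer_merge FILE1 FILE2   (hypothetical helper name)
full_outer_merge() {
    awk '
        { k = $1 FS $2 }                 # composite key: columns 1 and 2
        NR == FNR { a[k] = $3; next }    # first input (FILE2): remember its value
        {                                # second input (FILE1)
            print $1, $2, $3, ((k in a) ? a[k] : "NA")
            delete a[k]                  # mark this key as already printed
        }
        END {                            # keys still stored exist only in FILE2
            for (k in a) print k, "NA", a[k]
        }
    ' "$2" "$1"                          # FILE2 is read first, then FILE1
}

# Usage: full_outer_merge file1.csv file2.csv | column -t
```

The order in which the END loop visits leftover keys is unspecified in awk, so pipe the result through sort if a stable order matters.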

【Discussion】:
