【Question title】: bash - if two columns match then append column
【Posted】: 2017-10-09 06:29:21
【Question】:

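I am trying to add a column when two fields of a line have already been seen together elsewhere in the file.

I have a comma-separated file with a large number of entries, and I need to find all lines that match on two columns (the 2nd and the 7th). If both values are found together on more than one line, an eighth column saying "shared" should be added.

File contents: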

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314

Desired output:

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared

I searched and found this link, Awk - matching on 2 columns for differents lines, but it doesn't quite do what I need: it only matches the following lines.

I can do it like this:

while IFS=',' read -r host device blk poolnum porta portb serial
do
    # Count the lines that contain both this device and this serial.
    ldev_count=$(grep -iw "$device" outputtest.txt | grep -iw "$serial" | wc -l)
    # -gt compares numerically; > inside [[ ]] compares strings.
    if [[ $ldev_count -gt 1 ]] ; then
        echo "$host, $device, $blk, $poolnum, $porta, $portb, $serial, SHARED" >> semifinal.txt
    else
        echo "$host, $device, $blk, $poolnum, $porta, $portb, $serial" >> semifinal.txt
    fi
done < outputtest.txt

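But it is very slow, because it scans the whole file again for every line. I am hoping to find a better solution.

Thanks for your help.

Edited for formatting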

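【Comments】:

  • Could you please highlight columns 2 and 7 here? It is a little confusing, because I cannot see where those two columns are the same in your question.
  • Edited the formatting to improve readability.
  • OK, so you are saying that if columns 2 & 7 are shared between any two rows (e.g. 1808 and 66314), you want to append "shared" to the end of both lines that share them?
  • Exactly, thank you!

Tags: bash awk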


【Solution 1】:

You may need this:

awk -F\, 'NR==FNR{a[$2]++;b[$7]++;next}
          a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1' OFS=',' file file

Result:

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared

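Explanation

We read the file twice.

First pass: NR==FNR{a[$2]++;b[$7]++;next}

We count how many times each value occurs in column 2 and in column 7, and store the counts in the arrays a and b.

Second pass: a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1

This selects the lines whose values are repeated: the count must be greater than 1 for both columns before the new last column is appended with $(NF+1)="shared"

Note: the trailing 1 is an always-true pattern with the default action, so it prints every line without an explicit print statement.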
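
One caveat worth checking against your own data: the command above counts column 2 and column 7 separately, so a line can be tagged when each value repeats somewhere in the file even though the pair itself occurs only once. For the sample in the question both approaches give the same result. The following is a minimal sketch with made-up rows (hostA, hostB and hostC are hypothetical) that compares per-column counting with counting the pair:

```shell
#!/usr/bin/env bash
# Hypothetical sample: 1808 and 66314 each occur twice,
# but the pair (1808, 66314) occurs on one line only.
sample=$(mktemp)
printf '%s\n' \
  'hostA,1808,100,10,1D,2D,66314' \
  'hostB,1808,100,10,1D,2D,77777' \
  'hostC,1999,100,10,1D,2D,66314' > "$sample"

# Per-column counts, as in the answer above: hostA gets tagged.
per_column=$(awk -F, 'NR==FNR{a[$2]++;b[$7]++;next}
     a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1' OFS=, "$sample" "$sample")

# One count per (column 2, column 7) pair: no line gets tagged.
per_pair=$(awk -F, 'NR==FNR{c[$2,$7]++;next}
     c[$2,$7]>1{$(NF+1)="shared"}1' OFS=, "$sample" "$sample")

printf 'per column:\n%s\n\nper pair:\n%s\n' "$per_column" "$per_pair"
rm -f "$sample"
```

If "shared" should mean that another line has the same device and the same serial, the pair-based count is probably the safer choice.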

【Comments】:

  • This is perfect! Thanks a million!
【Solution 2】:

Could you please try the following and let me know if it helps.

awk -F, 'FNR==NR{a[$2,$7]++;next}  a[$2,$7]>1{print $0",shared"}'  Input_file  Input_file

The output is as follows.

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared
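
For readers unfamiliar with the a[$2,$7] subscript used above: awk has no true multi-dimensional arrays, so it joins the comma-separated subscripts into a single string key using the built-in SUBSEP variable. A small sketch with illustrative values (the array name seen is an assumption, not part of the answer):

```shell
out=$(awk 'BEGIN {
  seen["1808", "66314"]++            # stored under the key "1808" SUBSEP "66314"
  for (key in seen) {
    split(key, part, SUBSEP)         # recover the two components of the key
    print part[1], part[2], seen[key]
  }
  if (("1808", "66314") in seen)     # membership test for a pair
    print "pair present"
}')
printf '%s\n' "$out"
```

Because the pair is a single key, the count is incremented only when both fields match on the same line.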

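EDIT: If you want the matching lines printed with the string "shared" appended and the non-matching lines printed unchanged, then the following may help.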

awk -F, '           ## Set the field separator to a comma.
FNR==NR{            ## TRUE only while the first copy of Input_file is being read.
  a[$2,$7]++;       ## Count how often each (2nd field, 7th field) pair occurs.
  next              ## Skip the remaining rules for this line.
}
a[$2,$7]>1{         ## Reached only on the second read: the pair occurs more than once.
  print $0",shared" ## Print the current line with ",shared" appended.
  next;
}
1                   ## Print every other line unchanged.
' Input_file Input_file ## Input_file is named twice so that it is read twice.
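
Reading the file twice requires a real file. If the data arrives on a pipe, one possible alternative is to buffer the lines in memory and decide in an END block. This is a sketch only, not part of the original answer: the function name tag_shared and the sample rows are hypothetical, and it assumes the input fits in memory.

```shell
# tag_shared: read CSV lines from stdin or from the files given as arguments
# and append ",shared" to every line whose (field 2, field 7) pair occurs
# more than once. Lines are kept in memory and printed in input order.
tag_shared() {
  awk -F, '
    {
      line[NR] = $0                  # remember the line as read
      key[NR]  = $2 SUBSEP $7        # composite key for this line
      count[key[NR]]++               # how often the pair has been seen
    }
    END {
      for (i = 1; i <= NR; i++)
        print line[i] (count[key[i]] > 1 ? ",shared" : "")
    }' "$@"
}

# Usage with made-up rows: the first two share 1808 and 66314.
printf '%s\n' \
  'hostA,1808,100,10,3G,4G,66314' \
  'hostB,1808,100,10,1D,2D,66314' \
  'hostC,1809,100,10,1D,2D,66314' | tag_shared
```

The two-pass version keeps only the counts in memory, so it is likely the better fit for very large files; the buffered version trades memory for being usable in a pipeline.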

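【Comments】:

  • This is exactly what I asked for. Is there a way to also print the lines that do not match?
  • @LukeFowler, could you please check the edit to my solution and let me know if it helps.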