【Question title】:Find match pattern and delete the first occurrence
【Posted】:2020-12-05 14:52:43
【Question description】:

I have a file, file1:

NR2SKRD12BWP210H6P51CNODSVT-(.A1(n7),.A2(),.ZN(n8)); |2
MUX2D2BWP210H6P51CNODSVT-(.I0(n8),.I1(),.S(),.Z(n9)); |4
CKLHQD16BWP210H6P51CNODSVT-(.CPN(#),.E(1),.TE(n9),.Q(n10)); |5
LHCSNQD1BWP210H6P51CNODSVT-(.CDN(n10),.D(),.E(1),.SDN(),.Q(n11)); |6
OAI21D8BWP210H6P51CNODSVT-(.A1(n11),.A2(),.B(),.ZN(n12)); |9
DCCKND16BWP210H6P51CNODSVT-(.I(n12),.ZN(n13)); |10
INVSKFD14BWP210H6P51CNODSVT-(.I(n13),.ZN(n14)); |11
NR2SKRD12BWP210H6P51CNODSVT-(.A1(n7),.A2(n1),.ZN(n8)); |2
MUX2D2BWP210H6P51CNODSVT-(.I0(n8),.I1(n2),.S(),.Z(n9)); |4

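I need to find lines in file1 whose first field matches, where fields are separated by -. If a match is found, delete the first matching line.

I want the output to be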

CKLHQD16BWP210H6P51CNODSVT-(.CPN(#),.E(1),.TE(n9),.Q(n10)); |5
LHCSNQD1BWP210H6P51CNODSVT-(.CDN(n10),.D(),.E(1),.SDN(),.Q(n11)); |6
OAI21D8BWP210H6P51CNODSVT-(.A1(n11),.A2(),.B(),.ZN(n12)); |9
DCCKND16BWP210H6P51CNODSVT-(.I(n12),.ZN(n13)); |10
INVSKFD14BWP210H6P51CNODSVT-(.I(n13),.ZN(n14)); |11
NR2SKRD12BWP210H6P51CNODSVT-(.A1(n7),.A2(n1),.ZN(n8)); |2
MUX2D2BWP210H6P51CNODSVT-(.I0(n8),.I1(n2),.S(),.Z(n9)); |4

Here NR2SKRD12BWP210H6P51CNODSVT and MUX2D2BWP210H6P51CNODSVT each occur as $1 on more than one line, so the first matching line of each should be deleted.

I tried this code

awk -F'-' 'FNR==NR{a[$1];next} !(($1) in a)' file1

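But this code finds matches and deletes lines across two files. How can I find matches and delete lines within a single file? Delete only the first matching line; keep the second, third, fourth, etc. duplicates.

【Question comments】:

  • When processing a single file, what is the logic behind FNR==NR?
  • Delete the first occurrence only. If there are third or fourth duplicates, keep them. @rowboat

Tags: awk sed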


【Solution 1】:

Could you please try the following, written and tested with the shown samples in GNU awk.

awk '
BEGIN{ FS="-" }
FNR==NR{
  arr[$1]++
  next
}
arr[$1]>1 && ++arrAgain[$1]==1{ next }
1
' Input_file Input_file

Explanation: a detailed explanation of the above.

awk '                             ##Starting awk program from here.
BEGIN{ FS="-" }                   ##Setting field separator as dash here.
FNR==NR{                          ##Checking FNR==NR condition which will be TRUE when 1st time Input_file is being read.
  arr[$1]++                       ##Creating array arr with 1st field index and keep increasing its value with 1 on each of its occurrence.
  next                            ##next will skip all further statements from here.
}
arr[$1]>1 && ++arrAgain[$1]==1{   ##Checking if arr value with 1st field index is greater than 1 and its first time occurring in arrAgain then skip that line.
  next                            ##next will skip all further statements from here.
}
1                                 ##1 will print current line.
' Input_file Input_file           ##Mentioning Input_file names here.
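
A quick way to check the case raised in the question comments (a key that repeats more than twice) is to run the same program on a tiny made-up input. The sample keys A and B and the temporary file are assumptions for illustration, not data from the question.

```shell
# Hypothetical sample: key A occurs three times, key B once.
tmp=$(mktemp)
printf '%s\n' 'A-1' 'B-2' 'A-3' 'A-4' > "$tmp"

# Same two-pass program as above, reading the file twice.
out=$(awk '
BEGIN{ FS="-" }
FNR==NR{ arr[$1]++; next }                # pass 1: count each key
arr[$1]>1 && ++arrAgain[$1]==1{ next }    # pass 2: skip only the first line of a repeated key
1                                         # print everything else
' "$tmp" "$tmp")

printf '%s\n' "$out"    # prints B-2, A-3, A-4 on separate lines
rm -f "$tmp"
```

Only the first A line is dropped; the second and third are kept.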

【Comments】:

    【Solution 2】:

    This might work for you (GNU sed):

    sed -E 'H;x;s/^(\n[^-]*-)[^\n]*(.*\1)/\2/;x;$!d;x;s/.//' file
    

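    Append a copy of the current line to the hold space.

    If the key of the line at the front of the hold space occurs again later in the hold space, delete that front line.

    At the end of the file, swap to the hold space, remove the leading newline introduced by the append, and print the result.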
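
Reading the substitution closely, the `^` anchors the test to the line at the front of the hold space, so the command appears to remove a line only when that front line's key recurs later. The sketch below, on made-up keys A and B, illustrates both cases under GNU sed; treat it as an observation to verify on your own data rather than a definitive statement about the answer.

```shell
# Same sed script as above, stored in a variable for reuse.
cmd='H;x;s/^(\n[^-]*-)[^\n]*(.*\1)/\2/;x;$!d;x;s/.//'

# Repeated key sits on the first line: the first A line is removed.
front=$(printf '%s\n' 'A-1' 'B-2' 'A-3' | sed -E "$cmd")
printf '%s\n' "$front"    # prints B-2, A-3

# Repeated key is not on the first line: nothing is removed.
later=$(printf '%s\n' 'B-2' 'A-1' 'A-3' | sed -E "$cmd")
printf '%s\n' "$later"    # prints B-2, A-1, A-3
```

The sample in the question fits the first case: the NR2SKRD12 line is at the front, and once it is removed the MUX2D2 line takes its place.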

    【Comments】:

      【Solution 3】:

      Another awk

      $ awk -F- 'NR==FNR{a[$1]++; next} !(--a[$1])' file{,}
      
      CKLHQD16BWP210H6P51CNODSVT-(.CPN(#),.E(1),.TE(n9),.Q(n10)); |5
      LHCSNQD1BWP210H6P51CNODSVT-(.CDN(n10),.D(),.E(1),.SDN(),.Q(n11)); |6
      OAI21D8BWP210H6P51CNODSVT-(.A1(n11),.A2(),.B(),.ZN(n12)); |9
      DCCKND16BWP210H6P51CNODSVT-(.I(n12),.ZN(n13)); |10
      INVSKFD14BWP210H6P51CNODSVT-(.I(n13),.ZN(n14)); |11
      NR2SKRD12BWP210H6P51CNODSVT-(.A1(n7),.A2(n1),.ZN(n8)); |2
      MUX2D2BWP210H6P51CNODSVT-(.I0(n8),.I1(n2),.S(),.Z(n9)); |4
      

      Scan the file twice: the first pass counts the occurrences of each key, and the second pass prints only the last occurrence.
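
Because only the last occurrence is printed, the result differs from "drop the first occurrence" once a key appears three or more times. A small run on made-up keys shows this. Note that `file{,}` is bash brace expansion for `file file`, so this sketch passes the (hypothetical) temporary file name twice explicitly.

```shell
# Hypothetical sample: key A occurs three times, key B once.
tmp=$(mktemp)
printf '%s\n' 'A-1' 'B-2' 'A-3' 'A-4' > "$tmp"

# Pass 1 counts each key; pass 2 prints a line only when its counter reaches 0,
# which happens on the last occurrence of that key.
out=$(awk -F- 'NR==FNR{a[$1]++; next} !(--a[$1])' "$tmp" "$tmp")

printf '%s\n' "$out"    # prints B-2, A-4 (A-3 is dropped as well)
rm -f "$tmp"
```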

      【Comments】:

        【Solution 4】:

        Delete the first of the duplicates:

        awk -F- 'NR==FNR {++a[$1]; next} a[$1]==1; {a[$1]=1}' file file
        

        Read the same file twice. Count $1 on the first read, then decide what to do with each line on the second read.
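
If reading the input twice is not possible (for example when it arrives on a pipe), one alternative, not taken from the answers above, is to buffer the lines in memory and decide in an END block. This is a sketch that assumes the whole input fits in memory; the array names line, key, count and dropped are made up here.

```shell
# Hypothetical sample: key A occurs three times, key B once.
out=$(printf '%s\n' 'A-1' 'B-2' 'A-3' 'A-4' | awk -F- '
{ line[NR] = $0; key[NR] = $1; count[$1]++ }       # buffer each line and count its key
END {
  for (i = 1; i <= NR; i++) {
    k = key[i]
    if (count[k] > 1 && !dropped[k]++) continue    # skip only the first line of a repeated key
    print line[i]
  }
}')

printf '%s\n' "$out"    # prints B-2, A-3, A-4
```

The trade-off is memory for a single read: the two-pass versions keep only one counter per key, while this one holds every line until the end.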

        【Comments】: