【Question title】: awk command to keep only rows whose column value occurs three or more times
【Posted】: 2021-02-02 09:32:31
【Question】:

I have a tab-delimited dataset like this:

A  B  C  D
1  aaa 1 2
1  aaa 3 4
1  aaa 5 6
1  bbb 7 8
1  ccc 9 1
1  ccc 2 3
1  ddd 4 5
1  ddd 6 7
1  ddd 8 9
1  ddd 1 2

Desired output:

A  B  C  D
1  aaa 1 2
1  aaa 3 4
1  aaa 5 6
1  ddd 4 5
1  ddd 6 7
1  ddd 8 9
1  ddd 1 2

I tried this:

awk '++a[$2]>3' test.tsv test.tsv > test-2.tsv

Unwanted output:

1   ddd 1   2
1   aaa 1   2
1   aaa 3   4
1   aaa 5   6
1   ccc 2   3
1   ddd 4   5
1   ddd 6   7
1   ddd 8   9
1   ddd 1   2

【Comments】:

    Tags: awk


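    【Solution 1】:

    You can try this two-pass awk (`test.tsv{,}` is bash brace expansion for `test.tsv test.tsv`, so the file is read twice):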

    awk -F '\t' 'FNR==NR {freq[$2]++; next} freq[$2] >= 3' test.tsv{,}
    
    1  aaa 1 2
    1  aaa 3 4
    1  aaa 5 6
    1  ddd 4 5
    1  ddd 6 7
    1  ddd 8 9
    1  ddd 1 2
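
The command above drops the header line, because `B` occurs only once in column 2, while the desired output in the question keeps it. A minimal sketch of a variant that also prints the header, assuming the header is always the first line of the file; the temporary file and the `out` variable are only there to make the example self-contained:

```shell
# Build the sample from the question (tab-separated) in a temporary file.
tmp=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  A B C D \
  1 aaa 1 2 \
  1 aaa 3 4 \
  1 aaa 5 6 \
  1 bbb 7 8 \
  1 ccc 9 1 \
  1 ccc 2 3 \
  1 ddd 4 5 \
  1 ddd 6 7 \
  1 ddd 8 9 \
  1 ddd 1 2 > "$tmp"

# Pass 1 (FNR==NR): count each value of column 2.
# Pass 2: print the header (FNR==1) plus rows whose key occurs >= 3 times.
out=$(awk -F '\t' 'FNR==NR {freq[$2]++; next} FNR==1 || freq[$2] >= 3' "$tmp" "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Naming the file twice instead of writing `test.tsv{,}` keeps the command usable in shells without brace expansion.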
    

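    【Comments】:

      【Solution 2】:

      With the samples you showed, could you try the following, which reads the input file in a single pass? Written and tested with GNU awk.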

      awk '
      BEGIN{ FS=OFS="\t" }
      FNR==1{
        print
        next
      }
      {
        count[$2]++
        line[$2]=(line[$2]?line[$2] ORS:"")$0
      }
      END{
        for(i in count){
          if(count[i]>=3){
             print line[i]
          }
        }
      }' Input_file
      

      Explanation: the same program with detailed comments.

      awk '                   ## Start of the awk program.
      BEGIN{ FS=OFS="\t" }    ## BEGIN block: set the input and output field separators to a tab.
      FNR==1{                 ## For the first line (the header):
        print                 ## print it as is,
        next                  ## then skip the remaining rules for this line.
      }
      {
        count[$2]++           ## Count the occurrences of each value of the 2nd field.
        line[$2]=(line[$2]?line[$2] ORS:"")$0
                              ## Append the current line to the lines stored for this 2nd-field value, separated by ORS (a newline).
      }
      END{                    ## END block: runs after all input has been read.
        for(i in count){      ## Loop over every key in count (iteration order is unspecified).
          if(count[i]>=3){    ## If the key occurred three or more times,
             print line[i]    ## print all lines stored for it.
          }
        }
      }' Input_file           ## Name of the input file.
      

      【Comments】:
