【Question title】: AWK for sorting a field based on same multiple fields
【Posted】: 2018-03-16 14:27:13
【Question】:

I have a file like this:

scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 1.025 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD

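Among lines whose other fields (1-5 and 7) are all identical, I want to print only the line with the highest value in the 6th field.

Expected output: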

scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD

Is there a smart way to do this in awk?

【Question comments】:

    Tags: awk


    【Solution 1】:

    The sensible approach is to use sort + awk:

    $ sort -k6,6nr file | awk '!seen[$1,$2,$3,$4,$5,$7]++'
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD
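
Note that the sort reorders the lines by value, so the output above is not in input order. If the original order matters, one possible workaround is a decorate-sort-undecorate pipeline: prefix each line with its line number, pick the maximum per group as before, then sort back on the line number and strip it. The following is only a sketch under assumptions: the file names `dsu_sample.txt` and `dsu_result.txt` are made up for the demo, fields are separated by single spaces, and `LC_ALL=C` is set so that `.` is read as the decimal point.

```shell
# Demo input: the sample data from the question (hypothetical file name)
cat > dsu_sample.txt <<'EOF'
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 1.025 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD
EOF

# 1. decorate: prefix each line with its line number (the value moves to field 7)
# 2. sort on the value, highest first
# 3. keep the first line seen for each combination of the other fields
# 4. restore input order by line number, then strip the number again
awk '{ print NR, $0 }' dsu_sample.txt |
    LC_ALL=C sort -k7,7nr |
    awk '!seen[$2,$3,$4,$5,$6,$8]++' |
    LC_ALL=C sort -k1,1n |
    cut -d' ' -f2- > dsu_result.txt

cat dsu_result.txt
```

For the sample data this should print the four lines of the expected output in their original order. When several lines tie for the maximum, which of them survives depends on how sort breaks the tie; here the tied lines are identical, so it makes no difference.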
    

    But if you want to use awk alone, you can do this:

    $ awk '
        { orig=$0; $6=""; key=$0; $0=orig }
        NR==FNR{ if ( !(key in max) || $6 > max[key] ) { max[key]=$6; nr[key]=NR } next }
        nr[key]==FNR
    ' file file
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD
    

    【Comments】:

    【Solution 2】:

    If you don't need the fields to stay in their original order in the output,

    awk '{if(uniqueSet[$1" "$2" "$3" "$4" "$5" "$7] < $6) { uniqueSet[$1" "$2" "$3" "$4" "$5" "$7] = $6} }END{for(i in uniqueSet){print i" "uniqueSet[i]} }' <input_file_name>
    

    This gives:

    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L SETUP 0.867568
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L HOLD 0.850877
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L HOLD 0.85125
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L SETUP 2.3
    

    If you want to keep the fields in their original order,

    awk '{if(uniqueSet[$1" "$2" "$3" "$4" "$5" "$7] < $6) { uniqueSet[$1" "$2" "$3" "$4" "$5" "$7] = $6} }END{for(i in uniqueSet){ split(i, ar, " "); print ar[1]" "ar[2]" "ar[3]" "ar[4]" "ar[5]" "uniqueSet[i]" "ar[6]} }' <input_file_name>
    

    This gives:

    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
    

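    【Comments】:

    • awesome :) Would you mind explaining it?
    • Iterating over each line, the script puts the line into a map whose key is the fields to group by (in this case, every field except field no. 6). The value stored for that key is the maximum of the 6th field among the lines in that group. Once the iteration is finished, it simply prints each [key value] pair of the map.
    • Note that this only works if the values being compared are all positive.
    • I believe this is based on the associative-array concept. Any clue how this could be done for negative numbers as well?
    • You could make it negative-friendly by adding a test for an empty uniqueSet entry, couldn't you?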
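
Following up on the last comment, here is a minimal sketch of a negative-friendly variant, assuming the intended fix is an explicit `key in uniqueSet` test so that the first value of each group is always stored instead of being compared against an empty (zero) entry. The file names `neg_sample.txt` and `neg_result.txt` and the sample rows are invented for illustration; `$6 + 0` is used to force a numeric comparison.

```shell
# Hypothetical sample data with only negative values in field 6
cat > neg_sample.txt <<'EOF'
chk A CLK LOW H2L -0.5 SETUP
chk A CLK LOW H2L -0.2 SETUP
chk A CLK HIGH H2L -3 HOLD
chk A CLK HIGH H2L -1.5 HOLD
EOF

awk '{
    key = $1 " " $2 " " $3 " " $4 " " $5 " " $7
    # Store the value if the group is new, or if it beats the stored maximum
    if (!(key in uniqueSet) || $6 + 0 > uniqueSet[key] + 0)
        uniqueSet[key] = $6
}
END {
    # Traversal order of "for (key in ...)" is unspecified in awk
    for (key in uniqueSet) {
        split(key, ar, " ")
        print ar[1], ar[2], ar[3], ar[4], ar[5], uniqueSet[key], ar[6]
    }
}' neg_sample.txt > neg_result.txt

cat neg_result.txt
```

Without the `in` test, an unset entry compares as 0, so no negative value would ever be stored. As in the original answer, the order of the output lines is not guaranteed.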
    【Solution 3】:

    In GNU awk:

    $ gawk ' {
        t=$6                                 # put $6 to temp
        $6="MARK"                            # replace it with a marker, use $0 as key
        if(!($0 in v) || t>v[$0]) {         # if $0 not in value hash or t>previous value
            a[$0]=NR                         # in a goes the record number for ordering
            v[$0]=t
        }
    }
    END {                                    # in the end
        PROCINFO["sorted_in"]="@val_num_asc" # traverse a in growing order of NRs stored
        for(i in a) {
            sub(/MARK/,v[i],i)               # replace mark with value
            print i                          # and output
        }
    }' file
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
    scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD
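
`PROCINFO["sorted_in"]` is specific to GNU awk. For other awks, a rough equivalent can be sketched by recording the order in which each group first appears in a separate array. This is a variant under stated assumptions rather than the answer's method: it orders groups by first appearance instead of by the record number of the maximum, and the array names `order`, `best`, and `line` as well as the file names are made up here.

```shell
# Demo input: the sample data from the question (hypothetical file name)
cat > posix_sample.txt <<'EOF'
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 1.025 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD
EOF

awk '{
    # Group key: every field except the 6th
    key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4 SUBSEP $5 SUBSEP $7
    if (!(key in best)) {
        order[++n] = key            # remember the order in which groups first appear
        best[key] = $6
        line[key] = $0
    } else if ($6 + 0 > best[key] + 0) {
        best[key] = $6              # new maximum for this group
        line[key] = $0              # keep the whole line unchanged
    }
}
END {
    for (i = 1; i <= n; i++)
        print line[order[i]]
}' posix_sample.txt > posix_result.txt

cat posix_result.txt
```

Because the whole line is stored, no field is rewritten and the original spacing is kept. The explicit `in` test also means negative values should be handled.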
    

    【Comments】:

      【Solution 4】:

      A short alternative using the GNU datamash + cut tools:

      datamash -Wf -g1,2,3,4,5,7 max 6 <file | cut -f1-7 --output-delimiter=' '
      

      Output:

      scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.867568 SETUP
      scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 2.3 SETUP
      scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] HIGH H2L 0.85125 HOLD
      scale_check BANK0_F2_WRDAT_P0[0] MCLK[0] LOW H2L 0.850877 HOLD
      

      【Comments】:
