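【Question title】: awk to print Consecutive Sequential Numbers - Start Sequence and End Sequence
【Posted】: 2014-07-14 21:35:08
【Question】:

I want to print runs of consecutive sequence numbers from the first field as a start sequence and an end sequence, together with the combination of fields $2, substr($3,1,9), substr($4,4,6), $6, $8 and $10. The input file is not sorted on the first column.

Input.txt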

11,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
12,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
13,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
14,abc,30-JUN-12.01:06:49,30-JUN-12.01:06:49,19-Apr-16,1,INR,RO0412,RC03,L7,,29
28,abc,30-JUN-12.01:06:49,30-JUN-12.01:06:49,19-Apr-16,1,INR,RO0412,RC03,L7,,29
32,def,29-MAY-13.12:05:11,29-MAY-13.12:05:11,15-Feb-17,1350,INR,RO0213,CD,K1,,30
33,def,29-MAY-13.12:05:11,29-MAY-13.12:05:11,15-Feb-17,1350,INR,RO0213,CD,K1,,30
41,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
50,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
51,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
52,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28

I tried this command on the first field only and got partial output:

cat Input.txt | sort -k1 -t,| awk -F, 'NR==1 {a=$1;b=$1;next} ($1 != b+1){print a,"-",b; a=$1} {b=$1} END{print a,"-",b}'

11 - 14
28 - 28
32 - 33
41 - 41
50 - 52
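
A note on the sort step: `sort -k1 -t,` uses a key that runs from field 1 to the end of the line and compares it as text. That happens to work for this sample because every number has two digits. The following is only a sketch on a hypothetical three-line sample (not the asker's data), showing a key limited to field 1 and compared numerically:

```shell
# Hypothetical sample with mixed-width numbers in field 1 (not the asker's data).
sample='100,x
9,x
10,x'

# -t, sets the field delimiter; -k1,1n limits the key to field 1
# and compares it numerically instead of as text.
sorted=$(printf '%s\n' "$sample" | sort -t, -k1,1n)
printf '%s\n' "$sorted"
```

With a plain text comparison, `100,x` would sort before `9,x`.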

Desired output:

$2,$3,$4,$6,$8,$10,Start_No,End_No

abc,22-JUN-12,JUN-12,1,RO0412,L7,11,13
abc,30-JUN-12,JUN-12,1,RO0412,L7,14,14
abc,30-JUN-12,JUN-12,1,RO0412,L7,28,28
def,29-MAY-13,MAY-13,1350,RO0213,K1,32,33
abc,20-FEB-14,FEB-14,650,EN1113,S317,41,41
abc,20-FEB-14,FEB-14,650,EN1113,S317,50,52

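Edit: I updated the sample Input.txt so that it is no longer in sorted order. Ed Morton, you are right: my actual input file is not sorted. I would like to know how to handle the sample below.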

13,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
14,abc,30-JUN-12.01:06:49,30-JUN-12.01:06:49,19-Apr-16,1,INR,RO0412,RC03,L7,,29
11,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
12,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
28,abc,30-JUN-12.01:06:49,30-JUN-12.01:06:49,19-Apr-16,1,INR,RO0412,RC03,L7,,29
32,def,29-MAY-13.12:05:11,29-MAY-13.12:05:11,15-Feb-17,1350,INR,RO0213,CD,K1,,30
33,def,29-MAY-13.12:05:11,29-MAY-13.12:05:11,15-Feb-17,1350,INR,RO0213,CD,K1,,30
41,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
50,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
52,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
51,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28

【Discussion】:

    Tags: awk


    【Solution 1】:

    A script like this can do it. I hope someone doesn't just post a condensed version of it:

    #!/usr/bin/awk -f
    BEGIN {
        FS = OFS = ","
    }
    {
        # sub(/[.].*$/, "", $3)  ## Uncomment if you don't want to include the time.
        # sub(/[.].*$/, "", $4)  ## Uncomment if you don't want to include the time.
        key = $2 "," $3 "," $4 "," $6 "," $8 "," $10
        if (!(key in s)) {
            s[key] = e[key] = $1
            keys[k++] = key
        } else if ($1 < s[key]) {
            s[key] = $1
        } else if ($1 > e[key]) {
            e[key] = $1
        }
    }
    END {
        for (k = 0; k in keys; ++k) {
            key = keys[k]
            print key, s[key], e[key]
        }
    }
    

    Or perhaps something like:

    #!/usr/bin/awk -f
    BEGIN {
        FS = OFS = ","
    }
    {
        # sub(/[.].*$/, "", $3)
        # sub(/[.].*$/, "", $4)
        key = $2 "," $3 "," $4 "," $6 "," $8 "," $10
    }
    !(key in s) {  ## Membership test; !s[key] would misfire if $1 were 0.
        s[key] = e[key] = $1
        keys[k++] = key
        next
    }
    $1 < s[key] {
        s[key] = $1
        next  ## Optional.
    }
    $1 > e[key] {
        e[key] = $1
    }
    END {
        for (k = 0; k in keys; ++k) {
            key = keys[k]
            print key, s[key], e[key]
        }
    }
    

    awk -f script.awk file
    

    Output:

    abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,1,RO0412,L7,11,13
    abc,30-JUN-12.01:06:49,30-JUN-12.01:06:49,1,RO0412,L7,14,28
    def,29-MAY-13.12:05:11,29-MAY-13.12:05:11,1350,RO0213,K1,32,33
    abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,650,EN1113,S317,41,52
    

    Output without the time (uncomment the sub() lines):

    abc,22-JUN-12,22-JUN-12,1,RO0412,L7,11,13
    abc,30-JUN-12,30-JUN-12,1,RO0412,L7,14,28
    def,29-MAY-13,29-MAY-13,1350,RO0213,K1,32,33
    abc,20-FEB-14,20-FEB-14,650,EN1113,S317,41,52
    

    【Comments】:

    • @AVN Sorry, I did not read carefully what you wanted, but no worries, you're welcome :)
    【Solution 2】:
    $ cat tst.awk
    BEGIN{ FS=OFS="," }
    {
        seq = $1
        key = $2 FS substr($3,1,9) FS substr($4,4,6) FS $6 FS $8 FS $10
    
        if ( (seq != (prevSeq+1)) || (key != prevKey) ) {
            if (startSeq != "")
                print prevKey, startSeq, prevSeq
            startSeq = seq
        }
    
        prevSeq = seq
        prevKey = key
    }
    END {
        print key, startSeq, prevSeq
    }
    $
    $ awk -f tst.awk file
    abc,22-JUN-12,JUN-12,1,RO0412,L7,11,13
    abc,30-JUN-12,JUN-12,1,RO0412,L7,14,14
    abc,30-JUN-12,JUN-12,1,RO0412,L7,28,28
    def,29-MAY-13,MAY-13,1350,RO0213,K1,32,33
    abc,20-FEB-14,FEB-14,650,EN1113,S317,41,41
    abc,20-FEB-14,FEB-14,650,EN1113,S317,50,52
    

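    【Comments】:

    • Ed Morton, since Input.txt is not sorted, I tried this command: sort -k1 -t, file.txt | awk -f tst.awk file.txt, but it is not working
    • I also tried: awk -f tst.awk (sort -k1 -t, file.txt), which does not work either. Please suggest.
    • The input does not need to be sorted, it only needs to be grouped by the key fields ($2,substr($3,1,9),substr($4,4,6),$6,$8,$10), as in the sample input you posted. Just stating "it is not working" gives nobody any information to help you debug it, but the most likely problem is that your real input does not follow the same key-field pattern as the sample input you posted. If so, edit your question to show some input that is more representative of your real input.
    • Ed Morton, you are right. Please see the edited sample file, which is not in sorted order. I had pasted sorted input earlier to keep the problem simple. Please advise!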
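
    On the two commands tried in the comments above: in `sort ... | awk -f tst.awk file.txt`, awk is given a file operand, so it reads the unsorted file.txt and never looks at the pipe. The second form, `awk -f tst.awk (sort ...)`, is not valid shell syntax; bash users could write `<(sort ...)` instead. A minimal sketch, assuming a numeric sort on field 1 suits the real data, using the first five rows of the edited sample and the tst.awk logic inlined so the block is self-contained:

```shell
# First five rows of the asker's edited (unsorted) sample.
input='13,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
14,abc,30-JUN-12.01:06:49,30-JUN-12.01:06:49,19-Apr-16,1,INR,RO0412,RC03,L7,,29
11,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
12,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
28,abc,30-JUN-12.01:06:49,30-JUN-12.01:06:49,19-Apr-16,1,INR,RO0412,RC03,L7,,29'

# Same logic as tst.awk in Solution 2, held in a shell variable.
prog='BEGIN { FS = OFS = "," }
{
    seq = $1
    key = $2 FS substr($3,1,9) FS substr($4,4,6) FS $6 FS $8 FS $10
    # A new run starts when the number is not previous+1 or the key changes.
    if ( (seq != (prevSeq+1)) || (key != prevKey) ) {
        if (startSeq != "")
            print prevKey, startSeq, prevSeq
        startSeq = seq
    }
    prevSeq = seq
    prevKey = key
}
END { print key, startSeq, prevSeq }'

# No file operand after the program, so awk reads what sort writes to the pipe.
result=$(printf '%s\n' "$input" | sort -t, -k1,1n | awk "$prog")
printf '%s\n' "$result"
```

    For these five rows the sketch prints the first three lines of the desired output in the question (11-13, 14-14 and 28-28).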
    【Solution 3】:

    I believe this produces the output you want (the same as what you showed).

    sort -k1 -t, Input.txt |
    awk '
      function prn() {print f2,d1,substr(f4,1,6),f6,f8,f10,n1,n2}
      function sav() {n1=$1;d1=d;f2=$2;f4=$4;f6=$6;f8=$8;f10=$10}
      BEGIN {FS=OFS=","}
      {d = substr($3,1,9)}
      NR == 1 {sav(); n2=n1; d2=d1; next}
      $1 != n2 + 1 || d1 != d {prn(); sav()}
      {n2=$1; d2=d}
      END {prn()}
    '
    

    I assumed you actually want the first 6 characters of field 4 (day and month), not the last 6 (month and year).
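
    To make the difference between those substr() choices concrete, here is a small sketch that applies them to one timestamp value from the sample input. The variable names `ts`, `s` and `result` are only illustrative:

```shell
# One timestamp value as it appears in fields 3 and 4 of the sample input.
ts='22-JUN-12.08:06:03'

# awk's substr(s, start, length) counts characters from 1.
result=$(awk -v s="$ts" 'BEGIN {
    print substr(s, 1, 9)   # the date without the time
    print substr(s, 1, 6)   # first six characters: day and month
    print substr(s, 4, 6)   # six characters from position 4: month and year
}')
printf '%s\n' "$result"
```

    This prints `22-JUN-12`, `22-JUN` and `JUN-12`. The desired output in the question shows `JUN-12` in the third column, which corresponds to `substr($4,4,6)`.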

    【Comments】:
