Sorting Lines Within Blocks in Awk

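【Question title】: Sorting Lines Within Blocks in Awk
【Posted】: 2018-10-29 04:03:39
【Question】:

I have a very long file that contains a list of dependencies, their versions, and the service each dependency belongs to. The file is sorted and separated into blocks.

Here is a snippet of the file I am referring to: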

foo.bar.baz:json:jar:2.2.2:compile service: ServiceTwo
foo.bar.baz:json:jar:2.2.10:compile service: ServiceThree
@
asm:asm:jar:3.3.1:compile service: ServiceOne
asm:asm:jar:3.3.1:compile service: ServiceTwo
asm:asm:jar:3.3.0:compile service: ServiceThree
@
hi.bye:beatles:jar:1.6:compile service: ServiceOne
hi.bye:beatles:jar:1.5:compile service: ServiceTwo
hi.bye:beatles:jar:1.15:compile service: ServiceThree
@

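As you may notice, the version numbers in each dependency block are only roughly sorted from highest to lowest. I am trying to write an awk script that sorts the lines within each block from the highest version number to the lowest. The output should look like this: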

foo.bar.baz:json:jar:2.2.10:compile service: ServiceThree
foo.bar.baz:json:jar:2.2.2:compile service: ServiceTwo
@
asm:asm:jar:3.3.1:compile service: ServiceOne
asm:asm:jar:3.3.1:compile service: ServiceTwo
asm:asm:jar:3.3.0:compile service: ServiceThree
@
hi.bye:beatles:jar:1.15:compile service: ServiceThree
hi.bye:beatles:jar:1.6:compile service: ServiceOne
hi.bye:beatles:jar:1.5:compile service: ServiceTwo
@

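Note: the service names in the output do not need to be in any particular order, as long as the versions are sorted from highest to lowest.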

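Logically, I think I should set RS="@" and build an array holding each line of the block, then sort these arrays by version number and print them. The problem is that I don't know how to sort them by version number. Here is what I have in my awk script so far: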

BEGIN {
    RS = "@";
}
{
    split($0, lines, "\n");

    # sort the array by the version number from highest to lowest
    # <--- I need help here

    for(key in lines) { print lines[key]; }
    delete lines;
}
END {
}

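If this is completely off the mark, I am open to trying a new approach. Any help with this problem would be greatly appreciated!

【Question comments】:

Tags: arrays sorting awk

【Solution 1】:

Using GNU awk: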

$ awk '
BEGIN {
    FS=":"
    PROCINFO["sorted_in"]="@ind_num_desc"  # traverse arrays by numeric index, descending
}
$0=="@" {                                  # at the end of a block
    for(i in a)                            # walk each version level in sorted order
        for(j in a[i])
            for(k in a[i][j])
                for(l in a[i][j][k])
                    print a[i][j][k][l]    # output
    print "@"                              # block separator
    delete a                               # clear the array for the next block
    next                                   # skip to next block
}
{
    split($4,b,".")                        # split the version into its components
    a[b[1]][b[2]][b[3]][--c]=$0            # store the line; --c keeps input order for equal versions
}' file
foo.bar.baz:json:jar:2.2.10:compile service: ServiceThree
foo.bar.baz:json:jar:2.2.2:compile service: ServiceTwo
@
asm:asm:jar:3.3.1:compile service: ServiceOne
asm:asm:jar:3.3.1:compile service: ServiceTwo
asm:asm:jar:3.3.0:compile service: ServiceThree
@
hi.bye:beatles:jar:1.15:compile service: ServiceThree
hi.bye:beatles:jar:1.6:compile service: ServiceOne
hi.bye:beatles:jar:1.5:compile service: ServiceTwo
@

What was supposed to be a quick and beautiful walk in the park turned into a nasty hack.
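
The answer above depends on gawk-only features (arrays of arrays and PROCINFO["sorted_in"]). As a rough alternative, here is a sketch for plain POSIX awk that swaps in a different technique: a stable insertion sort driven by a component-wise version comparison. The names sort_blocks and ver_lt are hypothetical, and the sketch assumes that versions are purely numeric, dot-separated, and sit in the fourth colon-separated field.

```shell
#!/bin/sh
# sort_blocks (hypothetical name): reads "@"-terminated blocks on stdin and
# prints each block with its lines ordered by version, highest first.
sort_blocks() {
    awk -F: '
    # ver_lt (hypothetical helper): returns 1 if version x is strictly lower
    # than version y, comparing dot-separated components as numbers.
    # Missing components count as 0, so "1.5" and "1.5.0" compare equal.
    function ver_lt(x, y,    xa, ya, nx, ny, m, i) {
        nx = split(x, xa, ".")
        ny = split(y, ya, ".")
        m = (nx > ny) ? nx : ny
        for (i = 1; i <= m; i++) {
            if (xa[i] + 0 < ya[i] + 0) return 1
            if (xa[i] + 0 > ya[i] + 0) return 0
        }
        return 0
    }
    $0 == "@" {                       # end of block: flush it, then the separator
        for (i = 1; i <= cnt; i++) print line[i]
        print
        cnt = 0
        next
    }
    {
        # Stable insertion: move the new line up past entries whose version
        # is strictly lower, so equal versions keep their input order.
        pos = ++cnt
        while (pos > 1 && ver_lt(ver[pos - 1], $4)) {
            line[pos] = line[pos - 1]
            ver[pos]  = ver[pos - 1]
            pos--
        }
        line[pos] = $0
        ver[pos]  = $4
    }
    END {                             # lines after the last "@", if any
        for (i = 1; i <= cnt; i++) print line[i]
    }'
}
```

It would be called as `sort_blocks < file`. Insertion sort is quadratic per block, which is probably acceptable when blocks hold only a handful of lines; qualifiers such as "-SNAPSHOT" are not handled.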

【Comments】:

From the desired output, it looks like the OP wants the versions in descending order.
Touché.
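I especially dislike the --c in there. Hackety-hack.
When I read "It's going to need more code." I immediately thought of the line from Jaws, "You're gonna need a bigger boat" :) Hack or not, I like this answer a lot.
@jas, Jaws, nice analogy. :D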

【Solution 2】:

Using GNU sort for version sorting:

$ awk -F':' -v OFS='\t' 'NF==1{c++} {print c+1, $4, $0}' file  | sort -k1n -k2rV | cut -f3-
foo.bar.baz:json:jar:2.2.10:compile service: ServiceThree
foo.bar.baz:json:jar:2.2.2:compile service: ServiceTwo
@
asm:asm:jar:3.3.1:compile service: ServiceTwo
asm:asm:jar:3.3.1:compile service: ServiceOne
asm:asm:jar:3.3.0:compile service: ServiceThree
@
hi.bye:beatles:jar:1.15:compile service: ServiceThree
hi.bye:beatles:jar:1.6:compile service: ServiceOne
hi.bye:beatles:jar:1.5:compile service: ServiceTwo
@
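
To make the pipeline above easier to follow, this sketch isolates its first stage so the decorated stream can be inspected before sort and cut see it. The function name decorate is hypothetical; the awk program is the one from the answer, only spread over several lines and commented.

```shell
#!/bin/sh
# decorate (hypothetical name): prefix every input line with a block number
# and its version, tab-separated, so a later sort can use them as keys.
decorate() {
    awk -F':' -v OFS='\t' '
    NF == 1 { c++ }            # a lone "@" has a single field: bump the block counter
    { print c + 1, $4, $0 }    # block number, version (empty for "@"), original line
    '
}
```

Running `decorate < file` shows that every "@" separator is tagged with the number of the block that follows it and an empty version column. Piping that stream into `sort -k1n -k2rV | cut -f3-`, as the answer does, orders lines by block number and then by version in descending order, and finally strips the two helper columns.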

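【Comments】:

I will clarify in the original question that the service names do not need to be in any particular order. Thanks for such a quick and concise reply!

【Solution 3】:

Here is another awk approach: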

$ awk '/^@/{close(cmd); print; next} 
           {cmd="sort -rV"; print | cmd}' file

foo.bar.baz:json:jar:2.2.10:compile service: ServiceThree
foo.bar.baz:json:jar:2.2.2:compile service: ServiceTwo
@
asm:asm:jar:3.3.1:compile service: ServiceTwo
asm:asm:jar:3.3.1:compile service: ServiceOne
asm:asm:jar:3.3.0:compile service: ServiceThree
@
hi.bye:beatles:jar:1.15:compile service: ServiceThree
hi.bye:beatles:jar:1.6:compile service: ServiceOne
hi.bye:beatles:jar:1.5:compile service: ServiceTwo
@

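【Comments】:

There is no simple way to additionally sort by the service name; here the names end up reverse-sorted in lexical order. I assume they are placeholders that will be replaced by real service names...
I should clarify that the service names do not need to be in order. I will add that to the question. Your answer is excellent. Thank you!
I actually prefer this to my own answer, but I would move the definition of cmd to the BEGIN section instead of executing it for every input line, and I would change its definition to make sure it sorts only by the version number: awk 'BEGIN{cmd="sort -t: -k4rV"} /^@/{close(cmd); print; next} {print | cmd}'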
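
The suggestion in the last comment can be wrapped in a small function, sketched below with three changes that are assumptions of this sketch and not part of the comment. The key is narrowed to `-k4,4` so that only the version field is compared, `-s` asks sort to keep equal versions in input order, and `fflush()` is called after each separator as a precaution so that awk's own output is not held back in a buffer while sort writes to the same stream. The function name sort_blocks_piped is hypothetical, and a sort that supports `-V` (such as GNU sort) is assumed.

```shell
#!/bin/sh
# sort_blocks_piped (hypothetical name): hand each "@"-terminated block to an
# external sort process, then print the separator.
sort_blocks_piped() {
    awk '
    BEGIN { cmd = "sort -s -t: -k4,4rV" }        # version key only, descending, stable
    /^@/  { close(cmd); print; fflush(); next }  # finish the sort, then emit "@" right away
          { print | cmd }                        # feed the current block to sort
    END   { close(cmd) }                         # flush a trailing block without "@"
    '
}
```

It would be called as `sort_blocks_piped < file`. Because `close(cmd)` waits for sort to exit, each block's sorted lines are written before its separator.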