【Question Title】:Sort lines of a file based on a column that contains text and numbers
【Posted】:2016-03-17 07:31:53
【Question Description】:

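I have a file with a large number of lines. Each line contains a token in which text is followed by a number (such as {few_21} below). I want the lines sorted in ascending order of that number.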

Example:

I have {few_1} lines here like this and so on
I have {few_101} lines here like this and so on
I have {few_21} lines here like this and so on
I have {few_11} lines here like this and so on
I have {few_31} lines here like this and so on
I have {few_41} lines here like this and so on
I have {few_51} lines here like this and so on

I need the file to look like this:

I have {few_1} lines here like this and so on
I have {few_11} lines here like this and so on
I have {few_21} lines here like this and so on
I have {few_31} lines here like this and so on
I have {few_41} lines here like this and so on
I have {few_51} lines here like this and so on
I have {few_101} lines here like this and so on

I tried the following, but it did not work as expected:

sort -k7,7 -n filename

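Any help is greatly appreciated.

【Question Comments】:

  • Note that for code or data, you can select the block to format and then use the {} tool at the top left of the edit box. Good luck.
  • Why did you change the data? Do you still need help?
  • Yes, please... I tried -k3.2n with this type of data.
  • Use sort -k3.7n. I expected -k3.6n to work too, but when it didn't, I increased the number. Good luck.

Tags: sorting awk sed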


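【Solution 1】:

You can tell sort to skip characters within a field by adding a .n suffix after the key's field number.

I expected -k7.5n to be the right key, since the number appears to start at character 5 of the field. It seems sort also counts the blank that separates the fields: without -b, the blanks preceding a field are part of it, so the number actually starts at character 6.

This also assumes that your data is as regular as your sample, and that field 7 always has a 4-character word before the number part. If that changes, you will have to preprocess your file. That would be a separate question on S.O.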

sort -k7.6n file

Output

I have few lines here like this1 and so on
I have few lines here like this11 and so on
I have few lines here like this21 and so on
I have few lines here like this31 and so on
I have few lines here like this41 and so on
I have few lines here like this51 and so on
I have few lines here like this101 and so on
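
To make the off-by-one less mysterious, here is a minimal sketch on a few made-up lines in the same shape as the output above (not the asker's real file). It assumes a sort that follows the usual POSIX/GNU key rules, and LC_ALL=C is set only to keep the run reproducible. The point is that the same character is addressed as 7.6 without b and as 7.5 with b.

```shell
# Sample lines in the shape used by this answer (field 7 is "this<number>").
sample=$(printf '%s\n' \
  'I have few lines here like this101 and so on' \
  'I have few lines here like this21 and so on' \
  'I have few lines here like this1 and so on')

# Without b, field 7 is " this101" (the leading blank counts),
# so the digits start at character 6.
without_b=$(printf '%s\n' "$sample" | LC_ALL=C sort -k7.6n)

# With b, leading blanks are skipped before counting,
# so the digits start at character 5.
with_b=$(printf '%s\n' "$sample" | LC_ALL=C sort -k7.5bn)

printf '%s\n' "$without_b"
```

Both variables should hold the same numerically ordered lines, so either spelling can be used as long as you are consistent about b.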

IHTH

【Comments】:

    【Solution 2】:

    Another approach:

    sort -nk2 -t_ file
    

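    This splits each line at the underscores and sorts numerically on the second field. The key runs to the end of the line, but -n only reads the number at its start.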
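
A quick sketch of where this approach holds and where it breaks. The lines are made up for illustration, and the "my_file" line in particular is hypothetical: it is not in the question's data. It shows what happens when an underscore appears before the braces.

```shell
# Lines shaped like the question's data: exactly one underscore per line,
# so field 2 starts with the number.
clean=$(printf '%s\n' \
  'I have {few_101} lines here like this and so on' \
  'I have {few_21} lines here like this and so on' \
  'I have {few_1} lines here like this and so on' |
  LC_ALL=C sort -nk2 -t_)

# Hypothetical line with an earlier underscore: its field 2 now starts
# with "file has {few", which -n reads as 0, so it sorts first even
# though its number (5) is larger than 1.
shifted=$(printf '%s\n' \
  'I have {few_1} lines here like this and so on' \
  'my_file has {few_5} lines here like this and so on' |
  LC_ALL=C sort -nk2 -t_)

printf '%s\n' "$clean"
printf '%s\n' "$shifted"
```

So the one-liner is fine as long as the first underscore on every line is the one inside the braces.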

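    【Comments】:

    • Glad I could help! Please upvote and/or accept the answers that solved your question.
    • @rajaswisaka Can you guarantee that there will never be a _ in the text before the {few_1} part? If so, this will work; if not, edit your sample input/output to include that case.
    【Solution 3】:

    To do this robustly, regardless of what other text appears on each line:

    1) Prepend to each line the number you want to sort on, isolated from the string {<non-close-brace>_<number>}: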

    $ sed -r 's/.*\{[^}]+_([0-9]+)\}.*/\1\t&/' file
    1       I have {few_1} lines here like this and so on
    101     I have {few_101} lines here like this and so on
    21      I have {few_21} lines here like this and so on
    11      I have {few_11} lines here like this and so on
    31      I have {few_31} lines here like this and so on
    41      I have {few_41} lines here like this and so on
    51      I have {few_51} lines here like this and so on
    

    2) Sort:

    $ sed -r 's/.*\{[^}]+_([0-9]+)\}.*/\1\t&/' file | sort -n
    1       I have {few_1} lines here like this and so on
    11      I have {few_11} lines here like this and so on
    21      I have {few_21} lines here like this and so on
    31      I have {few_31} lines here like this and so on
    41      I have {few_41} lines here like this and so on
    51      I have {few_51} lines here like this and so on
    101     I have {few_101} lines here like this and so on
    

    3) Remove the number you added in step 1:

    $ sed -r 's/.*\{[^}]+_([0-9]+)\}.*/\1\t&/' file | sort -n | cut -f2-
    I have {few_1} lines here like this and so on
    I have {few_11} lines here like this and so on
    I have {few_21} lines here like this and so on
    I have {few_31} lines here like this and so on
    I have {few_41} lines here like this and so on
    I have {few_51} lines here like this and so on
    I have {few_101} lines here like this and so on
    

    This is a very common idiom for all kinds of sorting problems, often called decorate-sort-undecorate.
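
The sed command above relies on GNU extensions (-r and \t in the replacement). If those are not available, the same three steps can be sketched with awk doing the decoration. This is only an illustration under assumptions: sort_by_braced_number is a hypothetical helper name, and lines with no {<text>_<number>} token get an empty key, which sorts as 0 and therefore comes first.

```shell
# Hypothetical helper: reads lines on stdin and sorts them by the number
# inside the first {<text>_<number>} token.
sort_by_braced_number() {
  tab=$(printf '\t')
  # Step 1: prepend the extracted number and a tab to every line.
  awk '{
    key = ""
    if (match($0, /\{[^}]+_[0-9]+\}/)) {
      key = substr($0, RSTART, RLENGTH)  # e.g. "{few_101}"
      sub(/^.*_/, "", key)               # drop everything through the last "_"
      sub(/\}$/, "", key)                # drop the closing brace
    }
    print key "\t" $0
  }' |
    # Step 2: sort numerically on the prepended column only.
    LC_ALL=C sort -t "$tab" -k1,1n |
    # Step 3: strip the prepended column again.
    cut -f2-
}

printf '%s\n' \
  'I have {few_101} lines here like this and so on' \
  'I have {few_21} lines here like this and so on' \
  'I have {few_1} lines here like this and so on' |
  sort_by_braced_number
```

Restricting the sort key to the first tab-separated column keeps the rest of the line from influencing the numeric comparison.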

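    【Comments】:

      【Solution 4】:

      Why didn't this work for you? When you index into a field by character position, you need to set the -b option so that leading blanks are ignored. This sorts starting from that key position, which is probably what you want.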

      $ sort -k3.6bn file
      
      I have {few_1} lines here like this and so on
      I have {few_11} lines here like this and so on
      I have {few_21} lines here like this and so on
      I have {few_31} lines here like this and so on
      I have {few_41} lines here like this and so on
      I have {few_51} lines here like this and so on
      I have {few_101} lines here like this and so on
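
To see what the b changes, here is a small sketch on three of the question's lines. It assumes the usual POSIX/GNU key rules, and LC_ALL=C is used only so the fallback ordering is predictable.

```shell
sample=$(printf '%s\n' \
  'I have {few_1} lines here like this and so on' \
  'I have {few_101} lines here like this and so on' \
  'I have {few_21} lines here like this and so on')

# With b, counting starts at "{", so character 6 of "{few_101}"
# is the first digit.
with_b=$(printf '%s\n' "$sample" | LC_ALL=C sort -k3.6bn)

# Without b, the field is " {few_101}" and character 6 is "_".
# Every key then reads as 0, and sort falls back to comparing whole
# lines byte by byte, which is not numeric order.
no_b=$(printf '%s\n' "$sample" | LC_ALL=C sort -k3.6n)

printf '%s\n' "$with_b"
printf '%s\n' "$no_b"
```

The first listing is in numeric order; the second is not, which matches the experience reported in the question comments.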
      

      【Comments】:
