【Question title】: How to sort a csv by a specific column
【Posted】: 2017-11-28 09:53:01
【Question】:

I tried to sort a csv containing temperatures by the 4th column.

sort -n -k4 temperature.csv

As a result I got this:

2017-06-24 11:20,23.57,19.0,16.7,0.087,3.615
2017-06-24 11:25,23.51,19.0,16.7,0.087,3.689
2017-06-24 12:45,22.03,19.0,17.1,0.096,4.152
2017-06-24 13:00,21.92,19.0,17.1,0.096,4.229
2017-06-24 14:00,22.22,19.0,17.4,0.197,4.639
2017-06-24 14:25,22.21,19.0,17.5,0.197,4.774
2017-06-24 15:10,22.30,19.0,17.1,0.134,5.472
2017-06-24 16:00,22.42,19.0,17.3,0.134,5.93
2017-06-24 17:45,22.07,21.0,17.0,0.144,6.472
2017-06-24 18:25,21.90,21.0,16.9,0.15,6.814
2017-06-24 19:40,23.01,21.0,16.9,0.318,8.503

As you can see, the 4th column is not sorted correctly. I expected the first row to be 17.5 and the last row to be 16.7.

I also tried this:

sort -n -t. -k4,1n temperature.csv

The result is exactly the same as in the previous example. Can anyone give me a hint?

【Comments】:

    Tags: bash shell csv sorting


    【Solution 1】:

    Use the following sort command:

    sort -t, -k4,4 -nr temperature.csv
    

    Output:

    2017-06-24 14:25,22.21,19.0,17.5,0.197,4.774
    2017-06-24 14:00,22.22,19.0,17.4,0.197,4.639
    2017-06-24 16:00,22.42,19.0,17.3,0.134,5.93
    2017-06-24 15:10,22.30,19.0,17.1,0.134,5.472
    2017-06-24 13:00,21.92,19.0,17.1,0.096,4.229
    2017-06-24 12:45,22.03,19.0,17.1,0.096,4.152
    2017-06-24 17:45,22.07,21.0,17.0,0.144,6.472
    2017-06-24 19:40,23.01,21.0,16.9,0.318,8.503
    2017-06-24 18:25,21.90,21.0,16.9,0.15,6.814
    2017-06-24 11:25,23.51,19.0,16.7,0.087,3.689
    2017-06-24 11:20,23.57,19.0,16.7,0.087,3.615
    

    • -t, - use a comma as the field separator

    • -k4,4 - sort on the 4th field only

    • -nr - sort numerically, in reverse (descending) order
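
    To see why the command from the question left the rows in their original order, here is a small sketch that runs both commands side by side. Without -t, sort splits fields on blanks, so each line has only two fields (date and the rest), the 4th key is empty on every line, and sort falls back to comparing whole lines. The three-row sample and the variable names are only for illustration, and LC_ALL=C is set on the assumption that "." should be treated as the decimal point.

```shell
#!/bin/sh
# Fix the locale so "." is the decimal point and ordering is byte-wise.
LC_ALL=C
export LC_ALL

# Hypothetical sample: three rows copied from the question's data.
tmp=$(mktemp)
printf '%s\n' \
  '2017-06-24 11:20,23.57,19.0,16.7,0.087,3.615' \
  '2017-06-24 14:25,22.21,19.0,17.5,0.197,4.774' \
  '2017-06-24 16:00,22.42,19.0,17.3,0.134,5.93' > "$tmp"

# No -t: fields are split on blanks, so field 4 is empty on every line
# and the lines are ordered by whole-line comparison instead.
no_sep=$(sort -n -k4 "$tmp" | cut -d, -f4)

# With -t, the 4th comma-separated field becomes the sort key.
with_sep=$(sort -t, -k4,4 -nr "$tmp" | cut -d, -f4)

# Print the 4th column of each result on one line.
echo "without -t, :" $no_sep
echo "with -t,    :" $with_sep
rm -f "$tmp"
```

    The first line of output keeps the chronological order of the input, while the second is ordered by temperature, highest first.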

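    【Discussion】:

    • A correct, useful answer (+1), but you could mention that sort may fail if the data file contains "quoted strings, that may contain commas (or even newlines)". A more general solution would involve parsing the data.
    • @gboffi, for ideal and complex input there would be another, more complex solution. For the current input, a simple sort will do the job.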
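
    The caveat raised in the first comment can be reproduced with a small sketch. The data below is made up for illustration (it is not from the question): the first column is a quoted string that sometimes contains a comma, and the temperature is the 4th CSV column. Because sort -t, knows nothing about quoting, it counts the comma inside the quotes as a separator and picks the wrong field on those lines.

```shell
#!/bin/sh
# Fix the locale so "." is the decimal point and ordering is byte-wise.
LC_ALL=C
export LC_ALL

# Hypothetical data: name (possibly quoted, with a comma), two filler
# columns, then the temperature as the 4th CSV column.
tmp=$(mktemp)
printf '%s\n' \
  '"Berlin, DE",1,2,30.0' \
  'Paris,1,2,10.0' \
  '"Rome, IT",1,2,20.0' > "$tmp"

# Ascending numeric sort on what sort believes is field 4.
# On the quoted lines that field is "2", not the temperature.
sorted=$(sort -t, -k4,4 -n "$tmp")

# Show the temperature (last comma-separated item) of each sorted line.
temps=$(printf '%s\n' "$sorted" | awk -F, '{print $NF}')
echo "temperatures after sort:" $temps
rm -f "$tmp"
```

    The temperatures do not come out in ascending order, which is the failure the comment describes. For input like this, a tool that actually parses CSV quoting is needed; for the unquoted file in the question, plain sort is enough.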