Bash: Filter file by last word

【Question Title】: Bash: Filter file by last word
【Posted】: 2018-10-15 19:12:29
【Question】:

I have a log file that looks like this:

Sun Oct 14 03:38:28 2018 [pid 5922] command: Client "0.0.0.0", "USER macly"
Sun Oct 14 03:38:58 2018 [pid 5940] command: Client "0.0.0.0", "USER tredred"
Sun Oct 14 03:40:41 2018 [pid 6870] command: Client "0.0.0.0", "USER sweet"
Sun Oct 14 03:40:47 2018 [pid 7037] command: Client "0.0.0.0", "USER sweet"

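I am trying to edit the file so that it keeps the first occurrence of each "USER" and removes the rest. So the block above should end up looking like this: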

Sun Oct 14 03:38:28 2018 [pid 5922] command: Client "0.0.0.0", "USER macly"
Sun Oct 14 03:38:58 2018 [pid 5940] command: Client "0.0.0.0", "USER tredred"
Sun Oct 14 03:40:41 2018 [pid 6870] command: Client "0.0.0.0", "USER sweet"

Because the timestamps differ, the lines are not truly "unique". My idea was to use awk and then uniq: awk '{print $NF}' /home/user_logs | uniq

But this gives me only the last word of each line, not the whole line. What do I need to add to the command to keep the whole line?

【Question Comments】:

Tags: bash awk uniq

【Solution 1】:

You don't need uniq:

$ awk -F, '!a[$NF]++' file

Sun Oct 14 03:38:28 2018 [pid 5922] command: Client "0.0.0.0", "USER macly"
Sun Oct 14 03:38:58 2018 [pid 5940] command: Client "0.0.0.0", "USER tredred"
Sun Oct 14 03:40:41 2018 [pid 6870] command: Client "0.0.0.0", "USER sweet"

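Explanation

a[$NF]++ post-increments the count of occurrences of the last field's value: the expression evaluates to zero the first time a value is seen and to a nonzero number on every later occurrence. Negating it with ! (the value is treated as a logical one, 0~false; 1~true) therefore yields true only for the first instance of each value. The default action is {print $0}, so it is not written out explicitly.

This is a standard awk idiom for printing unique values; it does not require the file to be sorted.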
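
To make the idiom easier to follow, here is a minimal sketch of the same filter written out in long form. The array name seen and the temporary file are illustrative choices, not part of the original answer; the sample lines are the ones from the question.

```shell
# Sample log lines copied from the question; the temp file name is arbitrary.
log=$(mktemp)
cat > "$log" <<'EOF'
Sun Oct 14 03:38:28 2018 [pid 5922] command: Client "0.0.0.0", "USER macly"
Sun Oct 14 03:38:58 2018 [pid 5940] command: Client "0.0.0.0", "USER tredred"
Sun Oct 14 03:40:41 2018 [pid 6870] command: Client "0.0.0.0", "USER sweet"
Sun Oct 14 03:40:47 2018 [pid 7037] command: Client "0.0.0.0", "USER sweet"
EOF

# Long form of '!a[$NF]++': print a line only the first time its last
# comma-separated field is seen, then increment that field's counter.
result=$(awk -F, '{
    if (seen[$NF] == 0)   # an unset array entry compares equal to 0
        print $0
    seen[$NF]++
}' "$log")

printf '%s\n' "$result"
rm -f "$log"
```

With -F, the last field is everything after the final comma (for example ` "USER sweet"`), so the second "sweet" line is the only one dropped.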

【Comments】:

Wow. Hmm, could you explain what exactly is happening in that command?

【Solution 2】:

If the data is fixed-width, you can use uniq:

$ uniq -s 63 file
Sun Oct 14 03:38:28 2018 [pid 5922] command: Client "0.0.0.0", "USER macly"
Sun Oct 14 03:38:58 2018 [pid 5940] command: Client "0.0.0.0", "USER tredred"
Sun Oct 14 03:40:41 2018 [pid 6870] command: Client "0.0.0.0", "USER sweet"
└──────────────────────────────63─────────────────────────────┘

【Comments】:

You need to sort first!
@karakfa True.
... so it would be something like: sort -k1.64 file | uniq -s 63
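
The point about sorting comes from uniq comparing each line only with the line directly before it. A small sketch of that limitation follows; it reuses the question's lines in a hypothetical, reordered arrangement so that the two "sweet" entries are not adjacent.

```shell
# Hypothetical reordering of the question's lines: the two "sweet" entries
# are separated by another user, so they are not adjacent.
log=$(mktemp)
cat > "$log" <<'EOF'
Sun Oct 14 03:40:41 2018 [pid 6870] command: Client "0.0.0.0", "USER sweet"
Sun Oct 14 03:38:28 2018 [pid 5922] command: Client "0.0.0.0", "USER macly"
Sun Oct 14 03:40:47 2018 [pid 7037] command: Client "0.0.0.0", "USER sweet"
EOF

# uniq skips the first 63 characters, but it only compares neighbouring
# lines, so both "sweet" lines survive here.
uniq_out=$(uniq -s 63 "$log")

# The awk idiom remembers every key seen so far, so the later "sweet"
# line is dropped even though it is not adjacent to the first one.
awk_out=$(awk -F, '!a[$NF]++' "$log")

printf 'uniq -s 63 kept %s lines\n' "$(printf '%s\n' "$uniq_out" | grep -c '')"
printf 'awk kept %s lines\n' "$(printf '%s\n' "$awk_out" | grep -c '')"
rm -f "$log"
```

Sorting first makes duplicates adjacent, but it also changes the order of the lines, which may matter if the log has to stay chronological.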