【Question title】: Perform uniq only on matching lines while also ignoring some columns
【Posted】: 2016-06-04 03:28:17
【Question】:

Suppose I have an input file that looks like this:

2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:15 > user1 has connected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.

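I can remove all consecutive duplicate lines, ignoring the first two columns, with uniq -f2 file.txt, but I'm looking for a way to remove only the duplicates that contain has connected., so that the output looks like this: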

2016-06-03 21:00:14 > user1 has connected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:22 > foobar disconnected.
2016-06-03 21:00:29 > user2 has connected.
2016-06-03 21:00:29 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.
2016-06-03 21:00:30 > user2 has disconnected.

I imagine this could be done by matching a fixed string ("has connected."), but I'm also interested in a command that can use a regular expression.

I looked at the answers to this question, but couldn't modify the commands so that they work with my input.

【Comments】:

    Tags: regex perl awk uniq


    【Solution 1】:
    $ awk -F'>' '!(/has connected/ && seen[$2]++)' file
    2016-06-03 21:00:14 > user1 has connected.
    2016-06-03 21:00:22 > foobar disconnected.
    2016-06-03 21:00:22 > foobar disconnected.
    2016-06-03 21:00:29 > user2 has connected.
    2016-06-03 21:00:29 > user2 has disconnected.
    2016-06-03 21:00:30 > user2 has disconnected.
    2016-06-03 21:00:30 > user2 has disconnected.
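
To spell out how the one-liner above works, here is an expanded sketch of the same condition. With `-F'>'`, `$2` is everything after the `>`, so the timestamp is ignored. `seen[$2]++` returns the old count, which is 0 (false) the first time a given text appears, so only the first matching line is printed. The function name `dedupe_connected` and the `pat` variable are hypothetical, added here so the match can be any regular expression.

```shell
# Hypothetical wrapper: print every line except repeats of lines matching REGEX.
# Usage: dedupe_connected REGEX FILE
dedupe_connected() {
    awk -F'>' -v pat="$1" '
        # Lines that do not match the pattern are always printed.
        $0 !~ pat { print; next }
        # Matching lines: seen[$2]++ is 0 the first time a given text
        # after ">" appears, so only that first occurrence is printed.
        !seen[$2]++
    ' "$2"
}
```

For the input in the question, `dedupe_connected 'has connected' file.txt` should behave like the one-liner. Note that awk processes backslash escapes in `-v` assignments, so a pattern that contains backslashes needs them doubled.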
    

    【Comments】:

      【Solution 2】:

      A one-line Perl solution

      perl -nE 'print unless /has connected/ && @s{/>\s+(.+)/}++' myfile.log
      

      Output

      2016-06-03 21:00:14 > user1 has connected.
      2016-06-03 21:00:22 > foobar disconnected.
      2016-06-03 21:00:22 > foobar disconnected.
      2016-06-03 21:00:29 > user2 has connected.
      2016-06-03 21:00:29 > user2 has disconnected.
      2016-06-03 21:00:30 > user2 has disconnected.
      2016-06-03 21:00:30 > user2 has disconnected.
      

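      Note that the use of the hash slice @s{/>\s+(.+)/}++ is deliberate. That would normally be a mistake, but here it is used to put the regular expression in list context. In list context the match returns the captured text, which is then used as the hash key.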



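      If you want something cute like what Chris Charley wrote, which reports has connected only if the user was previously disconnected, then that isn't really possible in a one-liner. This script will do it for you.

      If you're not familiar with Perl, then to run it on a file you should change <DATA> to <> and run the program like this: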

      $ perl filter.pl myfile.log
      
      use strict;
      use warnings;
      
      my %online;
      
      while ( <DATA> ) {
      
          next unless my ($name, $op) = />\s+(.+)\s+(disconnected|has connected)\./;
      
          if ( $op eq 'disconnected' ) {
              delete $online{$name};
              print;
          }
          else {
              print unless $online{$name}++;
          }
      }
      
      __DATA__
      2016-06-03 21:00:14 > user1 has connected.
      2016-06-03 21:00:14 > user1 has connected.
      2016-06-03 21:00:15 > user1 has connected.
      2016-06-03 21:00:22 > foobar disconnected.
      2016-06-03 21:00:22 > foobar disconnected.
      2016-06-03 21:00:15 > user1 disconnected.
      2016-06-03 21:00:29 > user2 has connected.
      2016-06-03 21:00:29 > user2 has connected.
      2016-06-03 21:00:29 > user2 has disconnected.
      2016-06-03 21:00:14 > user1 has connected.
      2016-06-03 21:00:30 > user2 has disconnected.
      2016-06-03 21:00:30 > user2 has disconnected.
      

      Output

      2016-06-03 21:00:14 > user1 has connected.
      2016-06-03 21:00:22 > foobar disconnected.
      2016-06-03 21:00:22 > foobar disconnected.
      2016-06-03 21:00:15 > user1 disconnected.
      2016-06-03 21:00:29 > user2 has connected.
      2016-06-03 21:00:29 > user2 has disconnected.
      2016-06-03 21:00:14 > user1 has connected.
      2016-06-03 21:00:30 > user2 has disconnected.
      2016-06-03 21:00:30 > user2 has disconnected.
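
The same online/offline bookkeeping can also be sketched in awk. This is only an illustration under stated assumptions, not a translation of the Perl script: the function name `track_online` is hypothetical, and the user name is assumed to be the fourth whitespace-separated field (so names must not contain spaces). Unlike the Perl regex, it therefore treats "user2 has disconnected." as a disconnect of user2. Lines that match neither pattern are dropped, as in the script above.

```shell
# Hypothetical helper: report "has connected" only for users not already online.
# Assumes the line layout "DATE TIME > NAME ...", so NAME is field 4.
# Usage: track_online FILE
track_online() {
    awk '
        # Disconnect lines ("NAME disconnected." or "NAME has disconnected."):
        # always print them and mark the user as offline again.
        / disconnected\.$/ { delete online[$4]; print; next }
        # Connect lines: print only if the user was not already online.
        / has connected\.$/ { if (!online[$4]++) print }
    ' "$1"
}
```

Run it as `track_online myfile.log`.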
      

      【Comments】:

        【Solution 3】:

        I think this Perl solution may be what you want. I added a few more lines to the data.

        #!/usr/bin/perl
        use strict;
        use warnings;
        
        my %seen;
        while (<DATA>) {
            if (/ > (.+? connected)/) {
                print unless $seen{$1}++;
            }
            else {
                %seen = ();
                print;  
            }   
        }
        
        __DATA__
        2016-06-03 21:00:14 > user1 has connected.
        2016-06-03 21:00:14 > user1 has connected.
        2016-06-03 21:00:15 > user1 has connected.
        2016-06-03 21:00:22 > foobar disconnected.
        2016-06-03 21:00:22 > foobar disconnected.
        2016-06-03 21:00:29 > user2 has connected.
        2016-06-03 21:00:29 > user2 has connected.
        2016-06-03 21:00:29 > user2 has disconnected.
        2016-06-03 21:00:30 > user2 has disconnected.
        2016-06-03 21:00:30 > user2 has disconnected.
        2016-06-03 21:00:31 > user1 has connected.
        2016-06-03 21:00:31 > user1 has connected.
        2016-06-03 21:00:34 > user1 has connected.
        2016-06-03 21:00:50 > user2 has connected.
        2016-06-03 21:00:51 > user2 has connected.
        

        This prints

        2016-06-03 21:00:14 > user1 has connected.
        2016-06-03 21:00:22 > foobar disconnected.
        2016-06-03 21:00:22 > foobar disconnected.
        2016-06-03 21:00:29 > user2 has connected.
        2016-06-03 21:00:29 > user2 has disconnected.
        2016-06-03 21:00:30 > user2 has disconnected.
        2016-06-03 21:00:30 > user2 has disconnected.
        2016-06-03 21:00:31 > user1 has connected.
        2016-06-03 21:00:50 > user2 has connected.
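
For comparison, the reset-on-any-other-line behaviour of the script above can be sketched in awk as well. The function name `dedupe_runs` is hypothetical. `split("", seen)` is used as a portable way to empty the array, mirroring `%seen = ()` in the Perl code.

```shell
# Hypothetical helper: drop repeated "has connected" lines, but forget what
# has been seen as soon as any other kind of line appears.
# Usage: dedupe_runs FILE
dedupe_runs() {
    awk -F'>' '
        # Connect lines: print only the first one since the last reset.
        / has connected/ { if (!seen[$2]++) print; next }
        # Any other line: clear the seen array and print the line.
        { split("", seen); print }
    ' "$1"
}
```

Because the array is cleared on every non-matching line, a user who connects again later in the log is reported again, which is what the extra lines in the sample data demonstrate.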
        

        【Comments】:

          【Solution 4】:

          Using awk:

          awk -F">" '!($2 in a) || $2 ~ /disconnected/ {a[$2]=$2; print}' < file.txt
          

          The condition checks whether the value is not yet in the array, and bypasses that check if the string contains "disconnected":

          !($2 in a) || $2 ~ /disconnected/ 
          

          Output

          2016-06-03 21:00:14 > user1 has connected.
          2016-06-03 21:00:22 > foobar disconnected.
          2016-06-03 21:00:22 > foobar disconnected.
          2016-06-03 21:00:29 > user2 has connected.
          2016-06-03 21:00:29 > user2 has disconnected.
          2016-06-03 21:00:30 > user2 has disconnected.
          2016-06-03 21:00:30 > user2 has disconnected.
          

          【Comments】:
