【Question title】: Print lines after uniq with column condition
【Posted】: 2014-02-27 09:04:44
【Question】:

I have a file with the following contents:

192.168.168.23 pg.something
181.135.56.13 pg.nothing
15.123.96.12 l.everything
15.151.15.3 f.something
15.151.15.3 pg.something
64.196.12.34 pg.nothing
15.123.96.12 l.everything
181.168.56.13 pg.nothing
192.168.168.23 pg.something
192.168.168.23 l.everything
192.12.56.152 l.everything
181.135.56.13 pg.nothing
64.196.12.34 pg.nothing
64.196.12.34 pg.something
181.135.56.13 pg.nothing
64.196.12.34 l.everything

I am trying to find the number of hits per IP for each user, sorted by IP.

I tried this:

for i in `cat test_file |awk '{print $1}'|sort |uniq -c |sort -rn |awk '{print $2}'`; do grep $i test_file;done |uniq -c |awk '{print $2,$3,$1}'

and got:

64.196.12.34 pg.nothing 2
64.196.12.34 pg.something 1
64.196.12.34 l.everything 1
192.168.168.23 pg.something 2
192.168.168.23 l.everything 1
181.135.56.13 pg.nothing 3
15.151.15.3 f.something 1
15.151.15.3 pg.something 1
15.123.96.12 l.everything 2
192.12.56.152 l.everything 1
181.168.56.13 pg.nothing 1

This output is fine, but I would like to know whether there is a way to make it look like this:

64.196.12.34 pg.nothing 2
             pg.something 1
             l.everything 1
192.168.168.23 pg.something 2
               l.everything 1
181.135.56.13 pg.nothing 3
15.151.15.3 f.something 1
            pg.something 1
15.123.96.12 l.everything 2
192.12.56.152 l.everything 1
181.168.56.13 pg.nothing 1

That is, only the repeated IPs are removed.

Thanks in advance.

【Question comments】:

    Tags: linux bash sorting awk uniq


    【Solution 1】:

    You can modify your final awk command:

    awk '{if ($2!=a) {print $2"\t"$3"\t"$1} else {print "\t\t"$3"\t"$1}}{a=$2}'
    

    This gives:

    64.196.12.34    pg.nothing      2
                    pg.something    1
                    l.everything    1
    192.168.168.23  pg.something    2
                    l.everything    1
    181.135.56.13   pg.nothing      3
    15.151.15.3     f.something     1
                    pg.something    1
    15.123.96.12    l.everything    2
    192.12.56.152   l.everything    1
    181.168.56.13   pg.nothing      1
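
A small variation on the command above, offered as a sketch rather than as part of the original answer: the output requested in the question pads continuation lines with as many spaces as the IP is wide, not with tabs. The function name `blank_repeated_ip` is hypothetical, and the input is assumed to be the `count ip name` lines that `uniq -c` prints.

```shell
# Hypothetical helper: reads "count ip name" lines (the shape `uniq -c` prints)
# and writes "ip name count", blanking an IP equal to the previous line's IP.
blank_repeated_ip() {
    awk '{
        if (NR > 1 && $2 == prev) {
            # Same IP as the line before: pad with spaces as wide as the IP.
            print sprintf("%" length($2) "s", ""), $3, $1
        } else {
            print $2, $3, $1
        }
        prev = $2
    }' "$@"
}

# Demo on three hand-written lines instead of real `uniq -c` output.
printf '%s\n' \
    '2 64.196.12.34 pg.nothing' \
    '1 64.196.12.34 pg.something' \
    '3 181.135.56.13 pg.nothing' | blank_repeated_ip
```

The padding is built with `sprintf` and a width taken from `length($2)`, so it follows IPs of any length.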
    

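    【Comments】:

    • Thanks, it works. Exactly what I needed. I tweaked it slightly: awk '{if ($2!=a) {print $2"\t"$3"\t"$1} else {print "\t\t"$3"\t"$1}}{a=$2}'
    • No problem! Just accept the answer! (Thanks for the "=", I will edit the answer to use your version.)
    【Solution 2】:

    This one computes the counts from scratch: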

    awk '
         {a[$1,$2]++; b[$1]; c[$2]}
         END{for (i in b) {for (j in c) if (a[i,j]) print i,j,a[i,j]}}
        ' file | awk '
                      $1==prev {print FS $2 FS $3; next} {prev=$1; print}
                     '
    

    The first part does the counting:

    $ awk '{a[$1,$2]++; b[$1]; c[$2]} END{for (i in b) {for (j in c) if (a[i,j]) print i,j,a[i,j]}}' a 
    192.168.168.23 pg.something 2
    192.168.168.23 l.everything 1
    192.12.56.152 l.everything 1
    64.196.12.34 pg.nothing 2
    64.196.12.34 pg.something 1
    64.196.12.34 l.everything 1
    15.151.15.3 f.something 1
    15.151.15.3 pg.something 1
    15.123.96.12 l.everything 2
    181.135.56.13 pg.nothing 3
    181.168.56.13 pg.nothing 1
    

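    Explanation

    • {a[$1,$2]++; b[$1]; c[$2]} keeps track of every combination seen: a counts each (field 1, field 2) pair, b records the distinct values of field 1, and c those of field 2.
    • END{for (i in b) {for (j in c) if (a[i,j]) print i,j,a[i,j]}} loops over every combination of first and second fields and prints only the pairs that actually occurred, together with their counts.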
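
The counting step can also be done with a single array, as in this minimal sketch (the function name `count_pairs` is hypothetical and not from the answer). It relies on awk storing `a[$1,$2]` under the key `$1 SUBSEP $2`, so the key can be split back into its two fields. The order in which `for (key in ...)` visits keys is unspecified in awk, which is why the output is piped through `sort` here.

```shell
# Hypothetical helper: counts each "ip name" pair with one associative array.
count_pairs() {
    awk '
        { count[$1, $2]++ }
        END {
            for (key in count) {
                # Recover the two fields from the combined key.
                split(key, parts, SUBSEP)
                print parts[1], parts[2], count[key]
            }
        }
    ' "$@" | LC_ALL=C sort   # byte-wise sort gives a reproducible order
}

# Demo on a few lines taken from the sample file in the question.
printf '%s\n' \
    '15.151.15.3 f.something' \
    '64.196.12.34 pg.nothing' \
    '15.151.15.3 pg.something' \
    '64.196.12.34 pg.nothing' | count_pairs
```

Unlike the nested loops over b and c, this visits only pairs that were actually seen, so no `if (a[i,j])` check is needed.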

    Then it does the grouping:

    $ awk '{a[$1,$2]++; b[$1]; c[$2]} END{for (i in b) {for (j in c) if (a[i,j]) print i,j,a[i,j]}}' a | awk '$1==prev {print FS $2 FS $3; next} {prev=$1; print}'
    192.168.168.23 pg.something 2
     l.everything 1
    192.12.56.152 l.everything 1
    64.196.12.34 pg.nothing 2
     pg.something 1
     l.everything 1
    15.151.15.3 f.something 1
     pg.something 1
    15.123.96.12 l.everything 2
    181.135.56.13 pg.nothing 3
    181.168.56.13 pg.nothing 1
    

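    Explanation

    • $1==prev {print FS $2 FS $3; next} If the previous line had the same first field, print only the second and third fields, each preceded by the field separator.
    • {prev=$1; print} Otherwise, remember the first field and print the line unchanged.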
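
Putting counting and grouping together while keeping the ordering used in the question (IPs with the most hits first) might look like the sketch below. The function name `hits_grouped` and the choice of sort keys are assumptions made for illustration, not part of the answer.

```shell
# Hypothetical helper: counts "ip name" pairs, orders IPs by total hits
# (highest first), then blanks repeated IPs with spaces of the same width.
hits_grouped() {
    awk '
        { pair[$1, $2]++; total[$1]++ }
        END {
            for (key in pair) {
                split(key, p, SUBSEP)
                # Emit: total-for-ip  ip  name  count-for-pair
                print total[p[1]], p[1], p[2], pair[key]
            }
        }
    ' "$@" |
    LC_ALL=C sort -k1,1nr -k2,2 -k4,4nr -k3,3 |
    awk '{
        same = (NR > 1 && $2 == prev)
        ip = same ? sprintf("%" length($2) "s", "") : $2
        print ip, $3, $4
        prev = $2
    }'
}

# Demo on a few lines taken from the sample file in the question.
printf '%s\n' \
    '192.168.168.23 pg.something' \
    '181.135.56.13 pg.nothing' \
    '192.168.168.23 pg.something' \
    '192.168.168.23 l.everything' | hits_grouped
```

Sorting on the per-IP total first keeps all lines of one IP adjacent, which the final awk step needs in order to detect repeats.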

    【Comments】:

      【Solution 3】:

      Here is a Perl version of the solution:

      #!/usr/bin/perl
      
      use warnings;
      use strict;
      
      my %data;
      
      while (<DATA>) {
          chomp;
          my ($ip, $dom) = split;
          $data{$ip}->{$dom}++;
      }
      
      while(my ($ip, $doms) = each %data) {
          print "$ip\t";
          my ($dom, $cnt) = each %$doms;
          print "$dom $cnt\n";
          while (($dom, $cnt) = each %$doms) {
              print "\t\t$dom $cnt\n";
          }
          print "\n";
      }
      
      __DATA__
      192.168.168.23 pg.something
      181.135.56.13 pg.nothing
      15.123.96.12 l.everything
      15.151.15.3 f.something
      15.151.15.3 pg.something
      64.196.12.34 pg.nothing
      15.123.96.12 l.everything
      181.168.56.13 pg.nothing
      192.168.168.23 pg.something
      192.168.168.23 l.everything
      192.12.56.152 l.everything
      181.135.56.13 pg.nothing
      64.196.12.34 pg.nothing
      64.196.12.34 pg.something
      181.135.56.13 pg.nothing
      64.196.12.34 l.everything
      

      And its result:

      192.12.56.152   l.everything 1
      
      15.151.15.3     pg.something 1
                      f.something 1
      
      64.196.12.34    pg.nothing 2
                      pg.something 1
                      l.everything 1
      
      181.168.56.13   pg.nothing 1
      
      15.123.96.12    l.everything 2
      
      192.168.168.23  pg.something 2
                      l.everything 1
      
      181.135.56.13   pg.nothing 3
      

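      The result is not nicely aligned, but it should be easy to adjust it to give exactly the same alignment as in the question.

      Here is the adapted version: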

      while(my ($ip, $doms) = each %data) {
          print "$ip ";
          my ($dom, $cnt) = each %$doms;
          print "$dom $cnt\n";
          my $prefix = ' ' x (length $ip);
          while (($dom, $cnt) = each %$doms) {
              print "$prefix $dom $cnt\n";
          }
      }
      

      【Comments】:
