【Question title】: Print only the lines that exist in all the input files
【Posted】: 2018-06-11 21:32:11
【Question description】:

Print only the lines that are present in all four of the given input files. In the input files shown below, only /dev/dev_sg2 and /dev/dev_sg3 are present in every file.

$ cat file1
/dev/dev_sg1
/dev/dev_sg2
/dev/dev_sg3
/dev/dev_sg4

$ cat file2
/dev/dev_sg8
/dev/dev_sg2
/dev/dev_sg3
/dev/dev_sg6

$ cat file3
/dev/dev_sg5
/dev/dev_sg2
/dev/dev_sg3
/dev/dev_sg6

$ cat file4
/dev/dev_sg2
/dev/dev_sg3
/dev/dev_sg1
/dev/dev_sg4

What I tried:

cat file* | sort |uniq -c

      1 /dev/dev_sg1
      4 /dev/dev_sg2
      4 /dev/dev_sg3
      1 /dev/dev_sg4
      1 /dev/dev_sg5
      2 /dev/dev_sg6
      1 /dev/dev_sg8

【Comments】:

Tags: sorting awk sed grep uniq


【Solution 1】:

Using a comm pipeline:

comm -12 <(sort file1) <(sort file2) | comm -12 - <(sort file3) | comm -12 - <(sort file4)
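  • -12: suppresses columns 1 and 2 (the lines unique to the first input and the lines unique to the second), so only the lines common to both are printed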

Output:

/dev/dev_sg2
/dev/dev_sg3
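
The pipeline above is written for exactly four files and relies on bash process substitution. A minimal sketch of the same idea for any number of files, in plain sh with temporary files, might look like the following. The function name common_lines and the variable names are hypothetical, and it assumes mktemp is available.

```shell
#!/bin/sh
# common_lines is a hypothetical helper name, not part of the original answer.
# It prints the lines shared by every file passed as an argument.
common_lines() {
    acc=$(mktemp) || return 1   # running intersection so far
    nxt=$(mktemp) || return 1   # scratch file for the next intersection
    sort -u "$1" > "$acc"       # start with the sorted first file
    shift
    for f in "$@"; do
        # comm -12 keeps only the lines present in both sorted inputs;
        # "-" makes comm read the second input from the pipe.
        sort -u "$f" | comm -12 "$acc" - > "$nxt"
        mv "$nxt" "$acc"
    done
    cat "$acc"
    rm -f "$acc" "$nxt"
}
```

It would be called as: common_lines file1 file2 file3 file4. Like the original pipeline, the output comes out sorted rather than in the order of any input file.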

【Discussion】:

    【Solution 2】:

    The following awk code may help you.

    awk 'FNR==NR{a[$0];next} ($0 in a){++c[$0]} END{for(i in c){if(c[i]==3){print i,c[i]+1}}}' Input_file1 Input_file2 Input_file3 Input_file4
    

    The output is as follows.

    /dev/dev_sg2 4
    /dev/dev_sg3 4
    

    EDIT: If you don't want the count and only want to print the lines that appear in all 4 Input_files, the following will do the trick:

    awk 'FNR==NR{a[$0];next} ($0 in a){++c[$0]} END{for(i in c){if(c[i]==3){print i}}}'  Input_file1 Input_file2 Input_file3 Input_file4
    

    EDIT2: Adding an explanation of the code as well.

    awk '
    FNR==NR{ ##FNR==NR is TRUE only while the first file, Input_file1, is being read.
     a[$0];  ##Create an entry in array a whose index is the current line $0.
     next    ##next is a built-in awk statement: skip the remaining rules and move on to the next input line.
    }
    ($0 in a){ ##Reached only for lines of Input_file2 onward; checks whether the current line is an index of array a.
     ++c[$0]   ##If so, increment the element of array c whose index is the current line.
    }
    END{       ##The END block runs after all input has been read.
    for(i in c){##Loop over every index of array c.
     if(c[i]==3){ ##A count of 3 means the line matched in each of the other 3 files, so it is in all 4 (assuming no line repeats within a file).
       print i    ##Print i, which holds a line that appears in all 4 Input_file(s).
    }
    }}
    ' Input_file1 Input_file2 Input_file3 Input_file4 ##All 4 Input_file(s) are listed here.
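
The code above hard-codes the count 3 and can miscount if a line is repeated inside one file. A possible variant that avoids both is sketched below. The function name in_all_files and the array names seen and count are hypothetical, and it assumes each file is passed only once and that no var=value operands are mixed in with the file names.

```shell
#!/bin/sh
# in_all_files is a hypothetical helper name used only for illustration.
in_all_files() {
    awk '
    # Count each distinct line at most once per file.
    !seen[FILENAME, $0]++ { count[$0]++ }
    END {
        # ARGC - 1 is the number of file operands given to awk.
        for (line in count)
            if (count[line] == ARGC - 1)
                print line
    }' "$@" | sort   # for-in order is unspecified, so sort for stable output
}
```

It would be called as: in_all_files Input_file1 Input_file2 Input_file3 Input_file4.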
    

    【Discussion】:

      【Solution 3】:

      If you know in advance that there will be no more than 4 input files, you can simply append grep to your existing solution, like this:

      cat file* | sort | uniq -c | grep -E '^ *4 '
      

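      This shows only the lines whose count at the start of the line is the maximum (4).

      If you need it to work for any number of files, a better solution is needed.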
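
One way to extend the same sort and uniq -c idea to any number of files is sketched below. The function name count_in_all is hypothetical. Each file is de-duplicated first so that a line repeated inside a single file cannot reach the target count on its own.

```shell
#!/bin/sh
# count_in_all is a hypothetical helper name used only for illustration.
count_in_all() {
    n=$#                       # number of input files
    for f in "$@"; do
        sort -u "$f"           # drop duplicates inside one file
    done | sort | uniq -c |
    # Keep lines whose count equals the number of files, then strip the
    # count prefix that uniq -c adds (optional spaces, digits, one space).
    awk -v n="$n" '$1 == n { sub(/^ *[0-9]+ /, ""); print }'
}
```

It would be called as: count_in_all file1 file2 file3 file4, or count_in_all file* to pick up however many files match.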

      【Discussion】:

        【Solution 4】:

        If the order does not need to be preserved:

        $ j() { join <(sort "$1") <(sort "$2"); }; j <(j file1 file2) <(j file3 file4)
        
        /dev/dev_sg2
        /dev/dev_sg3
        

        【Discussion】:
