【Question title】: Compare the contents of two files
【Posted】: 2014-10-24 16:14:02
【Question】:

I have two files, test1.txt and test2.txt.

test1.txt contains:

abc.cde.ccd.eed.12345.5678.txt
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
aabc.cdve.cncd.ened.19945.2345.txt

and test2.txt contains:

12345.5678.txt
29345.1678.txt
18145.2678.txt
10111.2222.txt

I want to compare these two files in bash and get output like this:

In both:

abc.cde.ccd.eed.12345.5678.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt

Only in test1.txt:

abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
aabc.cdve.cncd.ened.19945.2345.txt

Only in test2.txt:

10111.2222.txt

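【Comments】:

  • comm won't work for this comparison, because each line of test2.txt contains only part of a line of test1.txt.
  • No, diff can't do the comparison either, because test2 contains only part of test1.
  • @user3845185 comm can be used after some preprocessing.

Tags: bash awk grep compare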


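【Solution 1】:

In both (-F makes grep treat the lines of test2.txt as fixed strings rather than regular expressions):

grep -F -f test2.txt test1.txt

Output: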

abc.cde.ccd.eed.12345.5678.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt


Only in test1.txt:
grep -v -F -f test2.txt test1.txt

Output:

abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
aabc.cdve.cncd.ened.19945.2345.txt


Only in test2.txt:
grep -v -F -f <(grep -Eo '[0-9]+\.[0-9]+\.txt$' test1.txt) test2.txt

Output:

10111.2222.txt
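
The three commands above can be combined into a single script. What follows is a minimal sketch rather than part of the original answer. It recreates the sample files from the question in a temporary directory, writes the extracted keys to a hypothetical helper file keys.txt instead of using process substitution (so it also runs under plain sh), and assumes that the key of every test1.txt line is its trailing "digits.digits.txt". The variable names both, only1, and only2 are illustrative.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Sample data copied from the question.
cat > test1.txt <<'EOF'
abc.cde.ccd.eed.12345.5678.txt
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
aabc.cdve.cncd.ened.19945.2345.txt
EOF
cat > test2.txt <<'EOF'
12345.5678.txt
29345.1678.txt
18145.2678.txt
10111.2222.txt
EOF

# Lines of test1.txt that contain some line of test2.txt as a fixed string.
both=$(grep -F -f test2.txt test1.txt)

# Lines of test1.txt that contain no line of test2.txt.
only1=$(grep -v -F -f test2.txt test1.txt)

# Assumed key format: trailing "digits.digits.txt" of each test1.txt line.
grep -Eo '[0-9]+\.[0-9]+\.txt$' test1.txt > keys.txt

# Lines of test2.txt that are not exactly equal (-x) to any extracted key.
only2=$(grep -v -x -F -f keys.txt test2.txt)

printf 'In both:\n%s\n\n' "$both"
printf 'Only in test1.txt:\n%s\n\n' "$only1"
printf 'Only in test2.txt:\n%s\n' "$only2"
```

Note that grep exits with a non-zero status when it selects no lines, so a caller that checks exit codes should treat an empty section as a normal result.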


    【Solution 2】:
    File1:
    abc.cde.ccd.eed.12345.5678.txt
    abcd.cdde.ccdd.eaed.12346.5688.txt
    aabc.cade.cacd.eaed.13345.5078.txt
    abzc.cdae.ccda.eaed.29345.1678.txt
    abac.cdae.cacd.eead.18145.2678.txt
    aabc.cdve.cncd.ened.19945.2345.txt
    
    
    File2:
    12345.5678.txt
    29345.1678.txt
    18145.2678.txt
    10111.2222.txt
    
    
    
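    #!/bin/bash
    
    # Remove output from a previous run; -f ignores files that do not exist.
    rm -f Both.txt File1.txt File2.txt
    # Create Both.txt up front so the sort below works even with no matches.
    : > Both.txt
    
    # For every line of File2, look for lines of File1 that contain it.
    while IFS= read -r f2line
    do
      found=0
      while IFS= read -r f1line
      do
        # -F: treat the File2 line as a fixed string, not as a regex.
        if printf '%s\n' "$f1line" | grep -qF -- "$f2line"
        then
          found=1
          printf '%s\n' "$f1line" >> Both.txt
        fi
      done < File1
      if [ "$found" -eq 0 ]
      then
        printf '%s\n' "$f2line" >> File2.txt
      fi
    done < File2
    
    # Lines only in File1 = File1 minus Both.txt (comm needs sorted input).
    sort Both.txt > s_Both.txt
    sort File1 > s_File1
    comm -23 s_File1 s_Both.txt > File1.txt
    rm s_File1 s_Both.txt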
    

    Output files: Both.txt, File1.txt, File2.txt


      【Solution 3】:

      This problem can be solved with comm from GNU Coreutils.

      First, sort the second file in place:

      sort -o test2.txt test2.txt;
      

      Then use the following commands to display the lines:

      # unique to test1.txt (only the trailing fields are printed, not the full line)
      cut -d '.' -f 1-4 --complement test1.txt | sort | comm -23 - test2.txt
      # unique to test2.txt
      cut -d '.' -f 1-4 --complement test1.txt | sort | comm -13 - test2.txt
      # that appear in both files (again, only the trailing fields are printed)
      cut -d '.' -f 1-4 --complement test1.txt | sort | comm -12 - test2.txt
      

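      Explanation:

      # 1. Drop the first four dot-separated fields of each line of test1.txt (--complement is a GNU cut option)
      cut -d '.' -f 1-4 --complement test1.txt
      # 2. Here '-' makes comm read its first input from standard input instead of a file
      comm -3 - test2.txt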
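
The same comm approach can be run without overwriting test2.txt and without the GNU-only --complement option. The following is a hedged sketch, not the answerer's own code. It assumes the key always starts at the fifth dot-separated field, which is what `cut -d . -f 5-` selects, and it uses the hypothetical scratch files keys1.sorted and keys2.sorted. As with the commands above, lines that come from test1.txt appear only as their trailing fields.

```shell
#!/bin/sh
# Work in a scratch directory so the real input files stay untouched.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Sample data copied from the question.
cat > test1.txt <<'EOF'
abc.cde.ccd.eed.12345.5678.txt
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
aabc.cdve.cncd.ened.19945.2345.txt
EOF
cat > test2.txt <<'EOF'
12345.5678.txt
29345.1678.txt
18145.2678.txt
10111.2222.txt
EOF

# Fields 5 onward of test1.txt, i.e. everything except fields 1-4.
cut -d . -f 5- test1.txt | sort > keys1.sorted
# Sorted copy of test2.txt; the original file keeps its order.
sort test2.txt > keys2.sorted

# comm prints three columns: only in file 1, only in file 2, in both.
# -23 keeps column 1, -13 keeps column 2, -12 keeps column 3.
only1_keys=$(comm -23 keys1.sorted keys2.sorted)
only2_keys=$(comm -13 keys1.sorted keys2.sorted)
both_keys=$(comm -12 keys1.sorted keys2.sorted)

printf 'Keys in both:\n%s\n\n' "$both_keys"
printf 'Keys only in test1.txt:\n%s\n\n' "$only1_keys"
printf 'Keys only in test2.txt:\n%s\n' "$only2_keys"
```

Both sort calls run under the same locale here, which matters because comm expects its inputs to be sorted in the collation order it uses itself.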
      


        【Solution 4】:

        The following AWK script, script.awk, also does the job:

        # First file: remember every line in input order.
        NR == FNR { lines[++i] = $0 }
        
        # Second file: every line is a candidate substring.
        NR > FNR { patterns[++j] = $0 }
        
        END {
            # Mark every line/pattern pair where the pattern occurs in the line.
            # Counting loops are used instead of "for (k in array)" because
            # awk does not define the traversal order of "for ... in".
            for (p = 1; p <= j; p++)
                for (l = 1; l <= i; l++)
                    if (index(lines[l], patterns[p]) > 0) {
                        lines_match[l] = 1
                        patterns_match[p] = 1
                    }
        
            print "Lines only in first file:"
            for (l = 1; l <= i; l++)
                if (!(l in lines_match))
                    print lines[l]
        
            print "Lines only in second file:"
            for (p = 1; p <= j; p++)
                if (!(p in patterns_match))
                    print patterns[p]
        
            print "Lines in both files:"
            for (l = 1; l <= i; l++)
                if (l in lines_match)
                    print lines[l]
        }
        

        It can be invoked like this:

        awk -f script.awk test1.txt test2.txt
        

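        Note that the script makes no assumptions about the structure of the data in the two files. It only assumes that lines in test2.txt are potential substrings of lines in test1.txt.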
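
If one is willing to give up that generality, the nested loop can be replaced by a hash lookup. The sketch below is an illustration under a stated assumption, not part of the original answer: it assumes that the key of each test1.txt line is exactly its last three dot-separated fields, as in the sample data. The names matched, keys, both, only1, and result are made up for this example. Note that test2.txt is passed first so that its lines can be loaded into the lookup table.

```shell
#!/bin/sh
# Work in a scratch directory with the sample data from the question.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

cat > test1.txt <<'EOF'
abc.cde.ccd.eed.12345.5678.txt
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
aabc.cdve.cncd.ened.19945.2345.txt
EOF
cat > test2.txt <<'EOF'
12345.5678.txt
29345.1678.txt
18145.2678.txt
10111.2222.txt
EOF

result=$(awk -F. '
    # First file given (test2.txt): store each key, initially unmatched.
    NR == FNR { keys[++n] = $0; matched[$0] = 0; next }

    # Second file given (test1.txt): build the key from the last 3 fields.
    {
        if (NF >= 3)
            key = $(NF-2) "." $(NF-1) "." $NF
        else
            key = $0
        if (key in matched) {
            matched[key] = 1
            both = both $0 "\n"
        } else {
            only1 = only1 $0 "\n"
        }
    }

    END {
        printf "In both:\n%s\n", both
        printf "Only in test1.txt:\n%s\n", only1
        print "Only in test2.txt:"
        for (k = 1; k <= n; k++)
            if (!matched[keys[k]])
                print keys[k]
    }
' test2.txt test1.txt)

printf '%s\n' "$result"
```

Each line of test1.txt costs one table lookup, so the work grows with the total number of lines rather than with the product of the two file sizes. The trade-off is that a line whose key is not in the assumed position is reported as unmatched.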
