【Question title】: AWK: Compare two CSV files
【Posted】: 2012-09-05 15:23:45
【Question】:

I have two CSV files, and I want to compare them with AWK and generate a new file.

file1.csv:

"no","loc" 
"abc121","C:/pro/in" 
"abc122","C:/pro/abc"
"abc123","C:/pro/xyz"
"abc124","C:/pro/in" 

file2.csv:

"no","loc" 
"abc121","C:/pro/in"
"abc122","C:/pro/abc"
"abc125","C:/pro/xyz"
"abc126","C:/pro/in" 

output.csv:

"file1","file2","Diff" 
"abc121","abc121","Match" 
"abc122","abc122","Match" 
"abc123","","Unmatch" 
"abc124","","Unmatch" 
"","abc125","Unmatch" 
"","abc126","Unmatch"

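【Comments】:

  • An example is not a description of the problem. Simply trying to describe the problem in detail often leads straight to the obvious solution.
  • I wouldn't use awk; instead, look at the options of the diff command that allow this kind of line-by-line formatting. (GNU diff only, though?)

Tags: shell scripting awk cygwin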


【Solution 1】:

One way using awk:

script.awk:

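BEGIN {
    FS = ","
}

# First file, past its header: remember each key (column 1).
NR>1 && NR==FNR {
    a[$1] = $2
    next
}

# Second file, past its header: a key already stored in a is a match;
# otherwise it exists only in file2. Handled keys are removed from a.
FNR>1 {
    print ($1 in a) ? $1 FS $1 FS "Match" : "\"\"" FS $1 FS "Unmatch"
    delete a[$1]
}

# Whatever is left in a exists only in file1 (iteration order is unspecified).
END {
    for (x in a) {
        print x FS "\"\"" FS "Unmatch"
    }
}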

Output:

$ awk -f script.awk file1.csv file2.csv
"abc121","abc121",Match
"abc122","abc122",Match
"","abc125",Unmatch
"","abc126",Unmatch
"abc124","",Unmatch
"abc123","",Unmatch
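
The output above has no header row, leaves the Diff column unquoted, and lists the file1-only keys in whatever order awk iterates the array. A minimal sketch of a variant that reproduces the layout asked for in the question might look like the following. The function name `compare_csv` and the array names are hypothetical, and it assumes both files start with one header line and contain no commas inside quoted fields.

```shell
# compare_csv FILE1 FILE2
# Hypothetical helper: prints a header row, then matches, then keys found
# only in FILE1, then keys found only in FILE2, each in input order.
compare_csv() {
    awk -F, '
        BEGIN { OFS = ","; print "\"file1\"", "\"file2\"", "\"Diff\"" }

        # Skip the header line of each input file.
        FNR == 1 { next }

        # First file: remember every key and the order it appeared in.
        NR == FNR { seen[$1] = 1; order[++n] = $1; next }

        # Second file: note which keys are shared and which are new.
        {
            if ($1 in seen) both[$1] = 1
            else only2[++m] = $1
        }

        END {
            for (i = 1; i <= n; i++)
                if (order[i] in both) print order[i], order[i], "\"Match\""
            for (i = 1; i <= n; i++)
                if (!(order[i] in both)) print order[i], "\"\"", "\"Unmatch\""
            for (i = 1; i <= m; i++)
                print "\"\"", only2[i], "\"Unmatch\""
        }
    ' "$1" "$2"
}
```

Calling `compare_csv file1.csv file2.csv > output.csv` would then write the result to a file. Like the script above, this compares only the first column.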

【Comments】:

    【Solution 2】:

    I haven't used awk on its own, but if I understand the gist of what you're asking, I think this long one-liner should do it...

    join -t, -a 1 -a 2 -o 1.1 2.1 1.2 2.2 file1.csv file2.csv | awk -F, '{ if ( $3 == $4 ) var = "\"Match\""; else var = "\"Unmatch\"" ; print $1","$2","var }' | sed -e '1d' -e 's/^,/"",/' -e 's/,$/,""/' -e 's/,,/,"",/g'
    

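    Explanation:

    • The join part takes the two CSV files, joins them on the first column (join's default behavior), and outputs all four fields (-o 1.1 2.1 1.2 2.2), making sure that unpaired lines from both files are included (-a 1 -a 2).
    • The awk part takes that output and replaces the combination of columns 3 and 4 with either "Match" or "Unmatch", depending on whether they actually match. I had to make an assumption about this behavior based on your example.
    • The sed part removes the "no","loc" header from the output (-e '1d') and replaces empty fields with a pair of quotes (-e 's/^,/"",/' -e 's/,$/,""/' -e 's/,,/,"",/g'). That last part may not be necessary for you.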
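
To make the three stages easier to inspect one at a time, they can be split into small shell functions, roughly as sketched below. The function names are hypothetical, and the sketch assumes the inputs have already had their header line removed and are sorted on the first column, so the `1d` step is left out.

```shell
# Stage 1: full outer join on column 1.
# -a 1 -a 2 keeps unpaired lines from both files; -o selects the key and
# the loc column from each side, leaving missing fields empty.
join_stage() {
    join -t, -a 1 -a 2 -o 1.1 2.1 1.2 2.2 "$1" "$2"
}

# Stage 2: compare the two loc columns (fields 3 and 4) and replace them
# with a single quoted "Match" or "Unmatch" column.
label_stage() {
    awk -F, '{ print $1 "," $2 "," ($3 == $4 ? "\"Match\"" : "\"Unmatch\"") }'
}

# Stage 3: turn empty leading or middle fields into "" so every column
# stays quoted.
fill_stage() {
    sed -e 's/^,/"",/' -e 's/,,/,"",/g'
}
```

Running `join_stage a.csv b.csv` alone shows the raw joined rows, for example `"abc123",,"C:/pro/xyz",` for a key that exists only in the first file, and `join_stage a.csv b.csv | label_stage | fill_stage` gives the final rows. Here `a.csv` and `b.csv` stand for the prepared, header-less inputs.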

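    Edit: As tripleee pointed out, the above will fail if the two initial files are not sorted. Here is an updated command that fixes this. It strips the header line from each file and sorts it before passing it to join (this uses process substitution, so it needs bash or a similar shell)...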

    join -t, -a 1 -a 2 -o 1.1 2.1 1.2 2.2 <( sed 1d file1.csv | sort ) <( sed 1d file2.csv | sort ) | awk -F, '{ if ( $3 == $4 ) var = "\"Match\""; else var = "\"Unmatch\"" ; print $1","$2","var }' | sed -e 's/^,/"",/' -e 's/,$/,""/' -e 's/,,/,"",/g'
    

    【Comments】:

    • join requires sorted input. At the very least, you need to trim those header lines. (They're a nuisance anyway.)
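
One way to check this requirement before running join is `sort -c`, which only tests whether its input is already sorted. The sketch below is a rough illustration with hypothetical function names; it assumes exactly one header line per file and that sort and join run under the same locale.

```shell
# Succeeds (exit status 0) only if FILE, minus its header line, is already
# sorted on the first comma-separated field.
is_join_ready() {
    tail -n +2 "$1" | sort -t, -k1,1 -c 2>/dev/null
}

# Prints FILE without its header line, sorted on the first field,
# which is the form join expects.
prepare_for_join() {
    tail -n +2 "$1" | sort -t, -k1,1
}
```

For example, `is_join_ready file1.csv || prepare_for_join file1.csv > file1.sorted.csv` would write a sorted, header-less copy only when the original is not already in order. The output file name here is just a placeholder.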