【Question title】: Show detailed differences for two csv files with bash or awk
【Posted】: 2018-12-19 18:07:37
【Question】:

I need your advice on comparing two csv files in bash:

file1.csv

300000493|300000323|300000323|300000000|2|0|12619|0|0|+000000000000043.000|15|0|49300|1|42|4
300315830|300315830|300000419|300000000|2|0|12619|0|0|+000000000004020.000|18|0|31583000|89|43|4
300000493|300000323|300000323|300000000|10|0|12619|0|0|+000000000000210.000|14|0|49300|1|43|4
300000493|300000323|300000323|300000000|16|0|12619|0|0|+000000000000014.000|16|0|49300|89|42|4
300146897|300146897|300000394|300000000|609|1|12619|0|0|+000000000000020.000|1|0|14689700|7|36|4

file2.csv

300000493|300000323|300000323|300000000|2|0|12619|0|0|+000000000000053.000|1|0|49300|1|42|4
300315830|300315830|300000419|300000000|2|0|12619|0|0|+000000000004020.000|18|0|49300|89|43|4
300000493|300000323|300000323|300000000|10|0|12619|0|0|+000000000000219.000|14|0|49300|1|43|5

The command diff -y file1.csv file2.csv shows output similar to what I'm looking for:

300000493|300000323|300000323|300000000|2|0|12619|0|0|+000000000000043.000|15|0|49300|1|42|4       |    300000493|300000323|300000323|300000000|2|0|12619|0|0|+000000000000053.000|1|0|49300|1|42|4
300315830|300315830|300000419|300000000|2|0|12619|0|0|+000000000004020.000|18|0|31583000|89|43|4   |    300315830|300315830|300000419|300000000|2|0|12619|0|0|+000000000004020.000|18|0|49300|89|43|4
300000493|300000323|300000323|300000000|10|0|12619|0|0|+000000000000210.000|14|0|49300|1|43|4      |    300000493|300000323|300000323|300000000|10|0|12619|0|0|+000000000000219.000|14|0|49300|1|43|5
300000493|300000323|300000323|300000000|16|0|12619|0|0|+000000000000014.000|16|0|49300|89|42|4     <
300146897|300146897|300000394|300000000|609|1|12619|0|0|+000000000000020.000|1|0|14689700|7|36|4   <

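However, I'm trying to get a more advanced output: mark each cell that differs with an asterisk *, and fill in dashes - when a whole row is missing on one side. Finally, write one output file per side (afterwards I convert each output csv to html so I can embed them in an html file), for example: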

file1.out.csv

300000493|300000323|300000323|300000000|2|0|12619|0|0|+000000000000043.000*|15|0|49300|1|42|4
300315830|300315830|300000419|300000000|2|0|12619|0|0|+000000000004020.000|18|0|31583000*|89|43|4
300000493|300000323|300000323|300000000|10|0|12619|0|0|+000000000000210.000*|14|0|49300|1|43|4*
300000493|300000323|300000323|300000000|16|0|12619|0|0|+000000000000014.000|16|0|49300|89|42|4
300146897|300146897|300000394|300000000|609|1|12619|0|0|+000000000000020.000|1|0|14689700|7|36|4

file2.out.csv

300000493|300000323|300000323|300000000|2|0|12619|0|0|+000000000000053.000*|1|0|49300|1|42|4
300315830|300315830|300000419|300000000|2|0|12619|0|0|+000000000004020.000|18|0|49300*|89|43|4
300000493|300000323|300000323|300000000|10|0|12619|0|0|+000000000000219.000*|14|0|49300|1|43|5*
-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-
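A minimal awk sketch of the requested output, assuming rows are paired by position (row N of file1.csv against row N of file2.csv, as in the diff -y output above), `|` as the only separator, a non-empty file1.csv, and no blank lines. `csv_side_diff` is a hypothetical helper name. It compares fields as strings and marks every differing field on both sides, so on the first row it also flags field 11 (`15` vs `1`), which the desired output above leaves unmarked.

```bash
# Hypothetical helper: csv_side_diff FILE1 FILE2 OUT1 OUT2
# Pairs rows by line number, appends '*' to every field that differs
# (on both sides) and writes a row of dashes where one side has no row.
csv_side_diff() {
  awk -v out1="$3" -v out2="$4" '
    # Build a row of dashes with as many fields as ROW has.
    function dashes(row,   parts, k, cnt, s) {
      cnt = split(row, parts, "|")
      s = "-"
      for (k = 2; k <= cnt; k++) s = s "|-"
      return s
    }
    # NR == FNR only while reading FILE1 (assumes FILE1 is not empty).
    NR == FNR { left[FNR] = $0; n1 = FNR; next }
    # Rows of FILE2.
    { right[FNR] = $0; n2 = FNR }
    END {
      n = (n1 > n2) ? n1 : n2
      for (i = 1; i <= n; i++) {
        if (!(i in right)) { print left[i] > out1; print dashes(left[i]) > out2; continue }
        if (!(i in left))  { print dashes(right[i]) > out1; print right[i] > out2; continue }
        na = split(left[i], a, "|"); nb = split(right[i], b, "|")
        m = (na > nb) ? na : nb
        la = ""; lb = ""
        for (j = 1; j <= m; j++) {
          # Appending "" forces a string comparison, so values that are
          # numerically equal but written differently still count as different.
          mark = ((a[j] "") != (b[j] "")) ? "*" : ""
          if (j <= na) la = la (j > 1 ? "|" : "") a[j] mark
          if (j <= nb) lb = lb (j > 1 ? "|" : "") b[j] mark
        }
        print la > out1
        print lb > out2
      }
    }
  ' "$1" "$2"
}

# Usage with the files from the question:
# csv_side_diff file1.csv file2.csv file1.out.csv file2.out.csv
```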

I hope you can help me here. Thanks!

【Comments】:

  • Maybe take a look at other tools, such as meld or tkdiff.

Tags: bash csv difference


【Answer 1】:

I think a possible solution is:

paste -d '\n' file1.csv file2.csv > pasted.csv

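and then read the resulting file, in which each line of file1.csv is followed by the line with the same number from file2.csv, to generate what I need.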
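A minimal sketch of that second step, assuming pasted.csv was produced exactly by the paste command above, so odd lines come from file1.csv and even lines from file2.csv. When one file is shorter, paste emits an empty line in its place, which is written out here as a row of dashes; this assumes neither input contains genuinely blank lines. `pasted_side_diff` is a hypothetical helper name, and `|` is assumed to be the only separator.

```bash
# Hypothetical helper: pasted_side_diff PASTED OUT1 OUT2
# PASTED is assumed to come from: paste -d '\n' file1.csv file2.csv
pasted_side_diff() {
  awk -v out1="$2" -v out2="$3" '
    # Build a row of dashes with as many fields as ROW has.
    function dashes(row,   parts, k, cnt, s) {
      cnt = split(row, parts, "|")
      s = "-"
      for (k = 2; k <= cnt; k++) s = s "|-"
      return s
    }
    # Odd lines come from file1.csv: hold them until the partner arrives.
    NR % 2 == 1 { left = $0; next }
    # Even lines come from file2.csv; an empty line means "no row here".
    {
      right = $0
      if (right == "") { print left > out1; print dashes(left) > out2; next }
      if (left == "")  { print dashes(right) > out1; print right > out2; next }
      na = split(left, a, "|"); nb = split(right, b, "|")
      m = (na > nb) ? na : nb
      la = ""; lb = ""
      for (j = 1; j <= m; j++) {
        # Appending "" forces a string comparison of the two cells.
        mark = ((a[j] "") != (b[j] "")) ? "*" : ""
        if (j <= na) la = la (j > 1 ? "|" : "") a[j] mark
        if (j <= nb) lb = lb (j > 1 ? "|" : "") b[j] mark
      }
      print la > out1
      print lb > out2
    }
  ' "$1"
}

# Usage:
# paste -d '\n' file1.csv file2.csv > pasted.csv
# pasted_side_diff pasted.csv file1.out.csv file2.out.csv
```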
