【Question title】:Comparing two CSV files and writing out the differences
【Posted】:2018-04-29 16:55:37
【Question】:

I am comparing two CSV files, but the resulting update.csv is identical to new.csv.

import csv

with open('old.csv', 'r') as t1:
    old_csv = t1.readlines()

with open('new.csv', 'r') as t2:
    new_csv = t2.readlines()

with open('update.csv', 'w') as out_file:
    line_in_new = 0
    line_in_old = 0
    while line_in_new < len(new_csv) and line_in_old < len(old_csv):
        if old_csv[line_in_old] != new_csv[line_in_new]:
            out_file.write(new_csv[line_in_new])
        else:
            line_in_old += 1
        line_in_new += 1

I want the output to match the example below.

Example:

Input:

old.csv

a,b,c
1,2,3
4,5,6
8,9,9

new.csv

a,b,c
1,2,3
5,6,7
8,9,7

Output:

update.csv

4,5,6,deleted
5,6,7,new added 
8,9,9,change

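Please help me write only the differences to update.csv.

【Comments】:

  • What do you mean by differences? Please post a clear input example and the desired output.

Tags: python csv difference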


【Solution 1】:

A solution using pandas:

import pandas as pd

df1 = pd.read_csv('old.csv')
df2 = pd.read_csv('new.csv')

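# Tag each row with the file it came from
df1['flag'] = 'old'
df2['flag'] = 'new'

df = pd.concat([df1, df2])

# Compare on every column except 'flag'; keep=False drops all copies of a
# row that appears more than once, leaving only rows unique to one file
data_cols = df.columns.difference(['flag']).tolist()
dups_dropped = df.drop_duplicates(subset=data_cols, keep=False)
dups_dropped.to_csv('update.csv', index=False)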

Input

old.csv

a,b,c
1,2,3
4,5,6

new.csv

a,b,c
1,2,3
5,6,7

Output

update.csv

a,b,c,flag
4,5,6,old
5,6,7,new
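One caveat with the approach above: because `keep=False` removes every copy of a repeated row, a row that appears twice in old.csv and never in new.csv is dropped as well. As a hedged alternative, the sketch below uses an outer `merge` with `indicator=True`, which marks each row as coming from the left frame only, the right frame only, or both. The function name `label_differences` and the in-memory sample frames are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd


def label_differences(old_df, new_df):
    """Return rows present in only one frame, flagged 'old' or 'new'.

    Hypothetical helper: joins on all shared columns, so both frames
    are assumed to have the same column names.
    """
    # indicator=True adds a '_merge' column: left_only / right_only / both
    merged = old_df.merge(new_df, how='outer', indicator=True)
    only = merged[merged['_merge'] != 'both'].copy()
    # Translate the merge markers into the flag values used in the answer
    only['flag'] = only['_merge'].astype(str).map(
        {'left_only': 'old', 'right_only': 'new'}
    )
    return only.drop(columns='_merge').reset_index(drop=True)


# Sample data mirroring the old.csv / new.csv contents shown above
old_df = pd.DataFrame({'a': [1, 4], 'b': [2, 5], 'c': [3, 6]})
new_df = pd.DataFrame({'a': [1, 5], 'b': [2, 6], 'c': [3, 7]})

diff = label_differences(old_df, new_df)
print(diff.to_csv(index=False))
```

With real files, the two frames would come from `pd.read_csv('old.csv')` and `pd.read_csv('new.csv')`, and the result could be written with `diff.to_csv('update.csv', index=False)`. Row order in the result may differ between pandas versions.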

【Comments】:

  • Thanks, Ashish. If I only want to show the differences, meaning old and new, how can I get that?
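The thread does not cover the `deleted` / `new added` / `change` labels from the question's example. One possible reading is that the first column acts as a row key: a key found only in old.csv is `deleted`, a key found only in new.csv is `new added`, and a key found in both with different values is `change`. That key rule is an assumption, since the question never states it. A minimal standard-library sketch under that assumption follows; the names `read_rows` and `diff_rows` are hypothetical.

```python
import csv


def read_rows(path):
    """Read a CSV file into a list of rows, skipping the header line."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader, None)  # discard the header row
        return [row for row in reader if row]


def diff_rows(old_rows, new_rows):
    """Label differences between two row lists.

    Assumption: the first column uniquely identifies a row.
    Changed rows are reported with their OLD values, as in the
    question's example.
    """
    old_by_key = {row[0]: row for row in old_rows}
    new_by_key = {row[0]: row for row in new_rows}
    result = []
    for key, row in old_by_key.items():
        if key not in new_by_key:
            result.append(row + ['deleted'])
        elif new_by_key[key] != row:
            result.append(row + ['change'])
    for key, row in new_by_key.items():
        if key not in old_by_key:
            result.append(row + ['new added'])
    # Sort by key; this is a plain string comparison, not a numeric one
    result.sort(key=lambda r: r[0])
    return result


# In-memory copy of the question's example (header rows omitted)
old_rows = [['1', '2', '3'], ['4', '5', '6'], ['8', '9', '9']]
new_rows = [['1', '2', '3'], ['5', '6', '7'], ['8', '9', '7']]

for row in diff_rows(old_rows, new_rows):
    print(','.join(row))
# 4,5,6,deleted
# 5,6,7,new added
# 8,9,9,change
```

For files on disk, `diff_rows(read_rows('old.csv'), read_rows('new.csv'))` returns the same kind of list, which `csv.writer` can write to update.csv. If the first column is not a unique key in the real data, a different key (or several columns) would be needed.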