How to Compare Two Pandas DataFrames and Show the Differences in DataFrame 2: Answers

【Question title】: How To Compare Two Pandas DataFrames and Show Differences In DataFrame 2
【Posted】: 2018-08-30 08:05:50
【Question】:

I currently have two pandas DataFrames:

import pandas as pd

sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': 215}]
sales2 = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
          {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': 215},
          {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 95 }]
test_1 = pd.DataFrame(sales)
test_2 = pd.DataFrame(sales2)

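What I want is to show only the rows that are in 'test_2' but not in 'test_1'.

The code I have now concatenates the two DataFrames and shows me every difference between them, in both directions. I only want to see what 'test_2' has that 'test_1' does not, not the other way around: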

def compare_dataframes(df1, df2):

    print 'Comparing dataframes...'
    df = pd.concat([df1, df2])
    df = df.reset_index(drop=True)
    df_gpby = df.groupby(list(df.columns))
    idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]
    compared_data = df.reindex(idx)
    if len(compared_data) > 1:
        print 'No new sales on site!'
    else:
        print 'New sales on site!'
        print(compared_data)

How can I adapt my current function to work like this?

【Comments on the question】:

Tags: python python-2.7 pandas dataframe

【Solution 1】:

Use merge with an outer join and the indicator parameter:

df = test_1.merge(test_2, how='outer', indicator=True)
print (df)
   Feb  Jan  Mar    account      _merge
0  200  150  140  Jones LLC        both
1  210  200  215   Alpha Co        both
2   90   50   95   Blue Inc  right_only

Then keep only the right_only rows with boolean indexing:

only2 = df[df['_merge'] == 'right_only']
print (only2)
   Feb  Jan  Mar   account      _merge
2   90   50   95  Blue Inc  right_only

Thanks to @Jon Clements for a one-line solution that indexes with a callable:

only2 = test_1.merge(test_2, how='outer', indicator=True)[lambda r: r._merge == 'right_only']
print (only2)
   Feb  Jan  Mar   account      _merge
2   90   50   95  Blue Inc  right_only

Or use query:

only2 = test_1.merge(test_2, how='outer', indicator=True).query("_merge == 'right_only'")
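Folding these steps back into a function like the one in the question could look like the sketch below. It assumes Python 3 and a pandas version whose merge supports indicator; the function name rows_only_in_second is made up for illustration, and dropping the helper _merge column is a choice made here, not something the question requires.

```python
import pandas as pd


def rows_only_in_second(df1, df2):
    """Return the rows of df2 that have no exact match in df1 (illustrative helper)."""
    # With no `on=` argument, merge joins on all columns the frames share,
    # so two rows match only if every shared column is equal.
    merged = df1.merge(df2, how='outer', indicator=True)
    only_second = merged[merged['_merge'] == 'right_only']
    # Drop the helper column so the result has the same columns as the inputs.
    return only_second.drop(columns='_merge').reset_index(drop=True)


sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': 215}]
sales2 = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
          {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': 215},
          {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 95}]
test_1 = pd.DataFrame(sales)
test_2 = pd.DataFrame(sales2)

new_rows = rows_only_in_second(test_1, test_2)
if new_rows.empty:
    print('No new sales on site!')
else:
    print('New sales on site!')
    print(new_rows)
```

Swapping the arguments, rows_only_in_second(test_2, test_1), returns an empty frame for this data, because every row of test_1 also appears in test_2.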

【Comments】:

I was about to post: test_1.merge(test_2, how='outer', indicator=True)[lambda r: r._merge == 'right_only'].drop('_merge', axis=1) :p

【Solution 2】:

import pandas as pd
import numpy as np
sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': 215}]
sales2 = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': 215},
         {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 95 }]
test_1 = pd.DataFrame(sales)
test_2 = pd.DataFrame(sales2)
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way.
test_3 = pd.concat([test_1, test_2]).drop_duplicates(keep=False)
print (test_3)

This prints the rows that differ between the two DataFrames.
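One caveat is worth checking: drop_duplicates(keep=False) on the combined frame keeps rows that are unique to either input, so it matches the question's requirement only when the first frame has no rows missing from the second. The sketch below illustrates this with a made-up account ('Gamma Ltd') that exists only in the first frame, and compares the result with the merge approach. It uses pd.concat, since DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

left = pd.DataFrame([
    {'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
    {'account': 'Gamma Ltd', 'Jan': 10,  'Feb': 20,  'Mar': 30},   # hypothetical row, only in left
])
right = pd.DataFrame([
    {'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
    {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 95},   # only in right
])

# keep=False drops every row that occurs more than once in the combined frame,
# which leaves the rows unique to either input (a symmetric difference).
either = pd.concat([left, right]).drop_duplicates(keep=False)
print(sorted(either['account']))     # ['Blue Inc', 'Gamma Ltd']

# The merge indicator distinguishes the two directions.
merged = left.merge(right, how='outer', indicator=True)
only_right = merged[merged['_merge'] == 'right_only']
print(list(only_right['account']))   # ['Blue Inc']
```

Note also that keep=False would drop a row repeated within the second frame itself, even if that row never appears in the first.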

【Comments】: