【Question title】: pandas merge ignore duplicate merged rows
【Posted】: 2022-01-11 16:06:30
【Question】:

I am trying to merge the following two data frames, but I am not getting the result I expect.

import pandas as pd
previous_dict = [{"category1":"Home", "category2":"Power","usage":"15","amount":"65"},
                 {"category1":"Home", "category2":"Power","usage":"2","amount":"15"},
                 {"category1":"Home", "category2":"Vehicle","usage":"6","amount":"5"}
                ]
current_dict = [{"category1":"Home", "category2":"Power","usage":"16","amount":"79"},
                 {"category1":"Home", "category2":"Power","usage":"0.5","amount":"2"},
                 {"category1":"Home", "category2":"Vehicle","usage":"3","amount":"4"}
                ]

df_previous = pd.DataFrame.from_dict(previous_dict)
print(df_previous)

df_current = pd.DataFrame.from_dict(current_dict)
print(df_current)

df_merge = pd.merge(df_previous, df_current, on=['category1','category2'], how='outer',indicator=True, suffixes=('', '_y'))
print(df_merge)

Previous-year data frame:

  category1 category2 usage amount
0      Home     Power    15     65
1      Home     Power     2     15
2      Home   Vehicle     6      5

Current-year data frame:

  category1 category2 usage amount
0      Home     Power    16     79
1      Home     Power   0.5      2
2      Home   Vehicle     3      4

Current result:

  category1 category2 usage amount usage_y amount_y _merge
0      Home     Power    15     65      16       79   both
1      Home     Power    15     65     0.5        2   both
2      Home     Power     2     15      16       79   both
3      Home     Power     2     15     0.5        2   both
4      Home   Vehicle     6      5       3        4   both

But my expected result is:

  category1 category2 usage amount usage_y amount_y _merge
0      Home     Power    15     65      16       79   both
3      Home     Power     2     15     0.5        2   both
4      Home   Vehicle     6      5       3        4   both

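When category1 and category2 have the same values several times in both tables, I only want each row matched with its counterpart in the same order (first with first, second with second). How can I get the result I expect?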

【Comments】:

  • It looks like you want to insert the usage and amount columns of one data frame into the other. Can you explain the merge logic more clearly?

Tags: python pandas merge


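【Solution 1】:

I think this happens because of the duplicates in the columns you are joining on. A merge pairs every row on the left with every row on the right that has the same key, so the two Home/Power rows on each side produce 2 × 2 = 4 rows. One way to solve this is to also join on the index, as follows: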

df_merge = pd.merge(df_previous.reset_index(), df_current.reset_index(), on=['category1','category2', 'index'], how='outer',indicator=True, suffixes=('', '_y'))

   index category1 category2 usage amount usage_y amount_y _merge
0      0      Home     Power    15     65      16       79   both
1      1      Home     Power     2     15     0.5        2   both
2      2      Home   Vehicle     6      5       3        4   both
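
A variant of the same idea, for cases where the row index cannot be trusted to line up between the two frames: number each repeat of a key with `groupby(...).cumcount()` and merge on that counter instead of on `index`. This is only a sketch. The helper `with_occurrence` and the column name `occurrence` are made up for illustration, and it assumes the rows within each key appear in the order you want them paired.

```python
import pandas as pd

df_previous = pd.DataFrame({
    "category1": ["Home", "Home", "Home"],
    "category2": ["Power", "Power", "Vehicle"],
    "usage": ["15", "2", "6"],
    "amount": ["65", "15", "5"],
})
df_current = pd.DataFrame({
    "category1": ["Home", "Home", "Home"],
    "category2": ["Power", "Power", "Vehicle"],
    "usage": ["16", "0.5", "3"],
    "amount": ["79", "2", "4"],
})

keys = ["category1", "category2"]


def with_occurrence(df, keys, name="occurrence"):
    """Hypothetical helper: add a running counter (0, 1, 2, ...) per key group."""
    return df.assign(**{name: df.groupby(keys).cumcount()})


# Merging on the keys plus the counter pairs the 1st Power row with the
# 1st Power row, the 2nd with the 2nd, and so on.
df_merge = pd.merge(
    with_occurrence(df_previous, keys),
    with_occurrence(df_current, keys),
    on=keys + ["occurrence"],
    how="outer",
    indicator=True,
    suffixes=("", "_y"),
)
print(df_merge)
```

If one frame has more repeats of a key than the other, the extra rows should show up as `left_only` or `right_only` in `_merge` rather than being paired with an unrelated row.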


    【Solution 2】:

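    It looks like what you want is to add the columns of one data frame to the other, rather than a "merge" in the usual sense. With that in mind, consider the following.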

    df_new = df_previous.copy()
    df_new = df_new.rename(columns = {"usage":"usage_prev","amount":"amount_prev"})
    df_new[["usage_current","amount_current"]] = df_current[["usage","amount"]]
    
    print(df_new)
    

    Resulting output:

      category1 category2 usage_prev amount_prev usage_current amount_current
    0      Home     Power         15          65            16             79
    1      Home     Power          2          15           0.5              2
    2      Home   Vehicle          6           5             3              4
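
One caveat with this approach: assigning a column from another data frame aligns on index labels, not on row position. It works above because both frames carry the default 0, 1, 2 index. The sketch below uses a made-up non-default index on one frame to show the difference, and uses `to_numpy()` to force pairing by position. The shortened frames and the names `aligned` and `positional` are for illustration only.

```python
import pandas as pd

df_previous = pd.DataFrame(
    {"category2": ["Power", "Power", "Vehicle"], "usage": ["15", "2", "6"]},
    index=[10, 11, 12],  # hypothetical non-default index
)
df_current = pd.DataFrame(
    {"category2": ["Power", "Power", "Vehicle"], "usage": ["16", "0.5", "3"]},
    index=[0, 1, 2],
)

# Assigning a Series aligns on index labels. No labels match here,
# so every value in the new column is NaN.
aligned = df_previous.copy()
aligned["usage_current"] = df_current["usage"]

# A NumPy array carries no labels, so values are paired by row position.
positional = df_previous.copy()
positional["usage_current"] = df_current["usage"].to_numpy()

print(aligned)
print(positional)
```

Calling `reset_index(drop=True)` on both frames before the assignment should give the same positional pairing, as long as both frames have the same number of rows.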
    
