【Question title】:Pandas Merge DF1 and DF2 Error on Size of Final DF3
【Posted】:2021-04-20 23:19:43
【Question】:

I have two DataFrames, df1 and df2, and I'm trying to merge a single column from df2 into df1 on a common column. The merge keeps giving me an unexpected result. Here is df1:

            plant_name  year  month      power_kwh
0         CAYUGA RIDGE  2021      1  100013.479435
1         CAYUGA RIDGE  2021      2  111393.468601
2         CAYUGA RIDGE  2021      3  130056.708737
3       COLORADO GREEN  2021      1   51434.064247
4       COLORADO GREEN  2021      2   42969.370685
5       COLORADO GREEN  2021      3   60889.168683
6            ELK RIVER  2021      1   65281.449328
7            ELK RIVER  2021      2   65972.003571
8            ELK RIVER  2021      3   80007.661559
9         FARMERS CITY  2021      1   44259.997043
10        FARMERS CITY  2021      2   35335.309821
11        FARMERS CITY  2021      3   56857.576344
12         NEW HARVEST  2021      1   36756.896237
13         NEW HARVEST  2021      2   27696.980506
14         NEW HARVEST  2021      3   47029.840726
15         OTTER CREEK  2021      1   56448.063978
16         OTTER CREEK  2021      2   60875.162054
17         OTTER CREEK  2021      3   72305.531317
18  PROVIDENCE HEIGHTS  2021      1   23142.938038
19  PROVIDENCE HEIGHTS  2021      2   23472.394494
20  PROVIDENCE HEIGHTS  2021      3   29106.458065
21         TWIN BUTTES  2021      1   26532.625000
22         TWIN BUTTES  2021      2   23030.252679
23         TWIN BUTTES  2021      3   31493.110484
24      TWIN BUTTES II  2021      1   35024.166667
25      TWIN BUTTES II  2021      2   30236.458929
26      TWIN BUTTES II  2021      3   40956.497446

And df2:

            plant_name  month  power_kwh_mean
0         CAYUGA RIDGE      1   117966.937473
1         CAYUGA RIDGE      2   111414.235063
2         CAYUGA RIDGE      3   111046.530466
3       COLORADO GREEN      1    48858.795995
4       COLORADO GREEN      2    53334.673501
5       COLORADO GREEN      3    59363.149449
6            ELK RIVER      1    63520.921129
7            ELK RIVER      2    62582.570332
8            ELK RIVER      3    68408.441317
9         FARMERS CITY      1    45566.598244
10        FARMERS CITY      2    45682.893254
11        FARMERS CITY      3    49413.345551
12         NEW HARVEST      1    40170.598884
13         NEW HARVEST      2    39620.202054
14         NEW HARVEST      3    40155.011850
15         OTTER CREEK      1    66020.339095
16         OTTER CREEK      2    62372.075373
17         OTTER CREEK      3    61797.622670
18  PROVIDENCE HEIGHTS      1    27437.261725
19  PROVIDENCE HEIGHTS      2    25987.220209
20  PROVIDENCE HEIGHTS      3    25424.756976
21         TWIN BUTTES      1    25366.811806
22         TWIN BUTTES      2    28026.454688
23         TWIN BUTTES      3    31319.684315
24      TWIN BUTTES II      1    34978.325663
25      TWIN BUTTES II      2    37173.990129
26      TWIN BUTTES II      3    40054.014928

I'm using the code below to attempt the merge, but it matches multiple rows: the resulting df3 has shape 81 x 5, whereas I expected 27 x 5. Thanks.

df3 = pd.merge(df1, df2[[ "plant_name","power_kwh_mean"]], on="plant_name", how="left") 

            plant_name  year  month      power_kwh  power_kwh_mean
0         CAYUGA RIDGE  2021      1  100013.479435   117966.937473
1         CAYUGA RIDGE  2021      1  100013.479435   111414.235063
2         CAYUGA RIDGE  2021      1  100013.479435   111046.530466
3         CAYUGA RIDGE  2021      2  111393.468601   117966.937473
4         CAYUGA RIDGE  2021      2  111393.468601   111414.235063
5         CAYUGA RIDGE  2021      2  111393.468601   111046.530466
6         CAYUGA RIDGE  2021      3  130056.708737   117966.937473
7         CAYUGA RIDGE  2021      3  130056.708737   111414.235063
8         CAYUGA RIDGE  2021      3  130056.708737   111046.530466
9       COLORADO GREEN  2021      1   51434.064247    48858.795995
10      COLORADO GREEN  2021      1   51434.064247    53334.673501
11      COLORADO GREEN  2021      1   51434.064247    59363.149449
12      COLORADO GREEN  2021      2   42969.370685    48858.795995
13      COLORADO GREEN  2021      2   42969.370685    53334.673501
14      COLORADO GREEN  2021      2   42969.370685    59363.149449
15      COLORADO GREEN  2021      3   60889.168683    48858.795995
16      COLORADO GREEN  2021      3   60889.168683    53334.673501
17      COLORADO GREEN  2021      3   60889.168683    59363.149449
18           ELK RIVER  2021      1   65281.449328    63520.921129
19           ELK RIVER  2021      1   65281.449328    62582.570332
20           ELK RIVER  2021      1   65281.449328    68408.441317
21           ELK RIVER  2021      2   65972.003571    63520.921129
22           ELK RIVER  2021      2   65972.003571    62582.570332
23           ELK RIVER  2021      2   65972.003571    68408.441317
24           ELK RIVER  2021      3   80007.661559    63520.921129
25           ELK RIVER  2021      3   80007.661559    62582.570332
26           ELK RIVER  2021      3   80007.661559    68408.441317
27        FARMERS CITY  2021      1   44259.997043    45566.598244
28        FARMERS CITY  2021      1   44259.997043    45682.893254
29        FARMERS CITY  2021      1   44259.997043    49413.345551
30        FARMERS CITY  2021      2   35335.309821    45566.598244
31        FARMERS CITY  2021      2   35335.309821    45682.893254
32        FARMERS CITY  2021      2   35335.309821    49413.345551
33        FARMERS CITY  2021      3   56857.576344    45566.598244
34        FARMERS CITY  2021      3   56857.576344    45682.893254
35        FARMERS CITY  2021      3   56857.576344    49413.345551
36         NEW HARVEST  2021      1   36756.896237    40170.598884
37         NEW HARVEST  2021      1   36756.896237    39620.202054
38         NEW HARVEST  2021      1   36756.896237    40155.011850
39         NEW HARVEST  2021      2   27696.980506    40170.598884
40         NEW HARVEST  2021      2   27696.980506    39620.202054
41         NEW HARVEST  2021      2   27696.980506    40155.011850
42         NEW HARVEST  2021      3   47029.840726    40170.598884
43         NEW HARVEST  2021      3   47029.840726    39620.202054
44         NEW HARVEST  2021      3   47029.840726    40155.011850
45         OTTER CREEK  2021      1   56448.063978    66020.339095
46         OTTER CREEK  2021      1   56448.063978    62372.075373
47         OTTER CREEK  2021      1   56448.063978    61797.622670
48         OTTER CREEK  2021      2   60875.162054    66020.339095
49         OTTER CREEK  2021      2   60875.162054    62372.075373
50         OTTER CREEK  2021      2   60875.162054    61797.622670
51         OTTER CREEK  2021      3   72305.531317    66020.339095
52         OTTER CREEK  2021      3   72305.531317    62372.075373
53         OTTER CREEK  2021      3   72305.531317    61797.622670
54  PROVIDENCE HEIGHTS  2021      1   23142.938038    27437.261725
55  PROVIDENCE HEIGHTS  2021      1   23142.938038    25987.220209
56  PROVIDENCE HEIGHTS  2021      1   23142.938038    25424.756976
57  PROVIDENCE HEIGHTS  2021      2   23472.394494    27437.261725
58  PROVIDENCE HEIGHTS  2021      2   23472.394494    25987.220209
59  PROVIDENCE HEIGHTS  2021      2   23472.394494    25424.756976
60  PROVIDENCE HEIGHTS  2021      3   29106.458065    27437.261725
61  PROVIDENCE HEIGHTS  2021      3   29106.458065    25987.220209
62  PROVIDENCE HEIGHTS  2021      3   29106.458065    25424.756976
63         TWIN BUTTES  2021      1   26532.625000    25366.811806
64         TWIN BUTTES  2021      1   26532.625000    28026.454688
65         TWIN BUTTES  2021      1   26532.625000    31319.684315
66         TWIN BUTTES  2021      2   23030.252679    25366.811806
67         TWIN BUTTES  2021      2   23030.252679    28026.454688
68         TWIN BUTTES  2021      2   23030.252679    31319.684315
69         TWIN BUTTES  2021      3   31493.110484    25366.811806
70         TWIN BUTTES  2021      3   31493.110484    28026.454688
71         TWIN BUTTES  2021      3   31493.110484    31319.684315
72      TWIN BUTTES II  2021      1   35024.166667    34978.325663
73      TWIN BUTTES II  2021      1   35024.166667    37173.990129
74      TWIN BUTTES II  2021      1   35024.166667    40054.014928
75      TWIN BUTTES II  2021      2   30236.458929    34978.325663
76      TWIN BUTTES II  2021      2   30236.458929    37173.990129
77      TWIN BUTTES II  2021      2   30236.458929    40054.014928
78      TWIN BUTTES II  2021      3   40956.497446    34978.325663
79      TWIN BUTTES II  2021      3   40956.497446    37173.990129
80      TWIN BUTTES II  2021      3   40956.497446    40054.014928

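【Comments】:

  • You have duplicates in df2, so for every value in df1 you get that many matching rows from df2. Why not just df1['power_kwh_mean'] = df2['power_kwh_mean']?
  • Thanks for the comment - I don't see duplicates - df2 is the average (mean) for each plant name - the data values in df1 and df2 are different.
  • In df2, the plant_name column contains duplicates, and you are merging on that column.
  • Maybe merge with on=['plant_name','month']?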
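
To make the comments above concrete, here is a minimal sketch of why the row count multiplies. The two small frames are hypothetical toy data (2 plants x 2 months), not the poster's data; the same logic gives 9 plants x 3 x 3 = 81 rows in the question.

```python
import pandas as pd

# Hypothetical toy data: 2 plants x 2 months (stand-ins for df1 and df2)
df1 = pd.DataFrame({
    "plant_name": ["A", "A", "B", "B"],
    "year": [2021] * 4,
    "month": [1, 2, 1, 2],
    "power_kwh": [10.0, 20.0, 30.0, 40.0],
})
df2 = pd.DataFrame({
    "plant_name": ["A", "A", "B", "B"],
    "month": [1, 2, 1, 2],
    "power_kwh_mean": [11.0, 21.0, 31.0, 41.0],
})

# Key is plant_name only: every df1 row matches BOTH df2 rows of its plant,
# so each plant contributes 2 x 2 = 4 rows.
by_plant = df1.merge(df2[["plant_name", "power_kwh_mean"]],
                     on="plant_name", how="left")

# Key is (plant_name, month): every df1 row matches exactly one df2 row.
by_plant_month = df1.merge(df2, on=["plant_name", "month"], how="left")

print(by_plant.shape)        # (8, 5)
print(by_plant_month.shape)  # (4, 5)
```

The values in df2 are all different, but what matters for the merge is whether the key repeats, and plant_name alone repeats once per month.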

Tags: pandas dataframe merge


【Solution 1】:

In this case, since df1 and df2 have the same rows in the same order, it looks like you can use concat() along axis=1:

df3 = pd.concat([df1, df2["power_kwh_mean"]], axis=1)

Or, as jch suggested, merge on both plant_name and month:

df3 = df1.merge(df2[["plant_name", "month", "power_kwh_mean"]], on=["plant_name", "month"], how="left") 
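
As a possible safeguard, merge() accepts a validate argument that raises an error when the keys are not as unique as you expect. The sketch below uses hypothetical toy frames (not the poster's data) and a made-up flag name, duplicate_keys_detected, purely for illustration.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical toy data standing in for df1 and df2
df1 = pd.DataFrame({
    "plant_name": ["A", "A", "B", "B"],
    "month": [1, 2, 1, 2],
    "power_kwh": [10.0, 20.0, 30.0, 40.0],
})
df2 = pd.DataFrame({
    "plant_name": ["A", "A", "B", "B"],
    "month": [1, 2, 1, 2],
    "power_kwh_mean": [11.0, 21.0, 31.0, 41.0],
})

# plant_name alone is not unique in df2, so a many-to-one check fails
# instead of silently multiplying rows.
try:
    df1.merge(df2[["plant_name", "power_kwh_mean"]],
              on="plant_name", how="left", validate="many_to_one")
    duplicate_keys_detected = False
except MergeError:
    duplicate_keys_detected = True

# (plant_name, month) is unique on both sides, so this check passes.
df3 = df1.merge(df2[["plant_name", "month", "power_kwh_mean"]],
                on=["plant_name", "month"], how="left",
                validate="one_to_one")

print(duplicate_keys_detected)  # True
print(len(df3) == len(df1))     # True
```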

Or, if modifying df1 in place is fine, Andrej's comment works as is:

df1["power_kwh_mean"] = df2["power_kwh_mean"]

All three should produce this output, in df3 or in df1 respectively:

            plant_name  year  month      power_kwh  power_kwh_mean
0         CAYUGA RIDGE  2021      1  100013.479435   117966.937473
1         CAYUGA RIDGE  2021      2  111393.468601   111414.235063
2         CAYUGA RIDGE  2021      3  130056.708737   111046.530466
3       COLORADO GREEN  2021      1   51434.064247    48858.795995
4       COLORADO GREEN  2021      2   42969.370685    53334.673501
5       COLORADO GREEN  2021      3   60889.168683    59363.149449
6            ELK RIVER  2021      1   65281.449328    63520.921129
7            ELK RIVER  2021      2   65972.003571    62582.570332
8            ELK RIVER  2021      3   80007.661559    68408.441317
9         FARMERS CITY  2021      1   44259.997043    45566.598244
10        FARMERS CITY  2021      2   35335.309821    45682.893254
11        FARMERS CITY  2021      3   56857.576344    49413.345551
12         NEW HARVEST  2021      1   36756.896237    40170.598884
13         NEW HARVEST  2021      2   27696.980506    39620.202054
14         NEW HARVEST  2021      3   47029.840726    40155.011850
15         OTTER CREEK  2021      1   56448.063978    66020.339095
16         OTTER CREEK  2021      2   60875.162054    62372.075373
17         OTTER CREEK  2021      3   72305.531317    61797.622670
18  PROVIDENCE HEIGHTS  2021      1   23142.938038    27437.261725
19  PROVIDENCE HEIGHTS  2021      2   23472.394494    25987.220209
20  PROVIDENCE HEIGHTS  2021      3   29106.458065    25424.756976
21         TWIN BUTTES  2021      1   26532.625000    25366.811806
22         TWIN BUTTES  2021      2   23030.252679    28026.454688
23         TWIN BUTTES  2021      3   31493.110484    31319.684315
24      TWIN BUTTES II  2021      1   35024.166667    34978.325663
25      TWIN BUTTES II  2021      2   30236.458929    37173.990129
26      TWIN BUTTES II  2021      3   40956.497446    40054.014928
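
One caveat worth checking before relying on concat() or plain column assignment: both align rows on the index, not on plant_name and month. They agree with the merge here only because the two frames share the same index and row order. A minimal sketch with hypothetical toy data, where df2 is deliberately reversed:

```python
import pandas as pd

# Hypothetical toy data standing in for df1 and df2
df1 = pd.DataFrame({
    "plant_name": ["A", "A", "B", "B"],
    "month": [1, 2, 1, 2],
    "power_kwh": [10.0, 20.0, 30.0, 40.0],
})
df2 = pd.DataFrame({
    "plant_name": ["A", "A", "B", "B"],
    "month": [1, 2, 1, 2],
    "power_kwh_mean": [11.0, 21.0, 31.0, 41.0],
})

# Reverse df2 and give it a fresh 0..n-1 index, so row order no longer matches df1
df2 = df2.iloc[::-1].reset_index(drop=True)

# Assignment aligns on index labels only, so the means land on the wrong rows
by_index = df1.copy()
by_index["power_kwh_mean"] = df2["power_kwh_mean"]

# Merging on the keys is independent of row order
by_key = df1.merge(df2, on=["plant_name", "month"], how="left")

print(by_index["power_kwh_mean"].tolist())  # [41.0, 31.0, 21.0, 11.0]
print(by_key["power_kwh_mean"].tolist())    # [11.0, 21.0, 31.0, 41.0]
```

If there is any doubt about row order, the merge on both keys is the safer of the three options.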
