【Posted】:2021-04-20 23:19:43
【Question】:
I have two DataFrames, df1 and df2, and I am trying to merge a single column from df2 into df1 on a common column. The merge keeps giving me an unexpected result. Here is df1:
plant_name year month power_kwh
0 CAYUGA RIDGE 2021 1 100013.479435
1 CAYUGA RIDGE 2021 2 111393.468601
2 CAYUGA RIDGE 2021 3 130056.708737
3 COLORADO GREEN 2021 1 51434.064247
4 COLORADO GREEN 2021 2 42969.370685
5 COLORADO GREEN 2021 3 60889.168683
6 ELK RIVER 2021 1 65281.449328
7 ELK RIVER 2021 2 65972.003571
8 ELK RIVER 2021 3 80007.661559
9 FARMERS CITY 2021 1 44259.997043
10 FARMERS CITY 2021 2 35335.309821
11 FARMERS CITY 2021 3 56857.576344
12 NEW HARVEST 2021 1 36756.896237
13 NEW HARVEST 2021 2 27696.980506
14 NEW HARVEST 2021 3 47029.840726
15 OTTER CREEK 2021 1 56448.063978
16 OTTER CREEK 2021 2 60875.162054
17 OTTER CREEK 2021 3 72305.531317
18 PROVIDENCE HEIGHTS 2021 1 23142.938038
19 PROVIDENCE HEIGHTS 2021 2 23472.394494
20 PROVIDENCE HEIGHTS 2021 3 29106.458065
21 TWIN BUTTES 2021 1 26532.625000
22 TWIN BUTTES 2021 2 23030.252679
23 TWIN BUTTES 2021 3 31493.110484
24 TWIN BUTTES II 2021 1 35024.166667
25 TWIN BUTTES II 2021 2 30236.458929
26 TWIN BUTTES II 2021 3 40956.497446
And here is df2:
plant_name month power_kwh_mean
0 CAYUGA RIDGE 1 117966.937473
1 CAYUGA RIDGE 2 111414.235063
2 CAYUGA RIDGE 3 111046.530466
3 COLORADO GREEN 1 48858.795995
4 COLORADO GREEN 2 53334.673501
5 COLORADO GREEN 3 59363.149449
6 ELK RIVER 1 63520.921129
7 ELK RIVER 2 62582.570332
8 ELK RIVER 3 68408.441317
9 FARMERS CITY 1 45566.598244
10 FARMERS CITY 2 45682.893254
11 FARMERS CITY 3 49413.345551
12 NEW HARVEST 1 40170.598884
13 NEW HARVEST 2 39620.202054
14 NEW HARVEST 3 40155.011850
15 OTTER CREEK 1 66020.339095
16 OTTER CREEK 2 62372.075373
17 OTTER CREEK 3 61797.622670
18 PROVIDENCE HEIGHTS 1 27437.261725
19 PROVIDENCE HEIGHTS 2 25987.220209
20 PROVIDENCE HEIGHTS 3 25424.756976
21 TWIN BUTTES 1 25366.811806
22 TWIN BUTTES 2 28026.454688
23 TWIN BUTTES 3 31319.684315
24 TWIN BUTTES II 1 34978.325663
25 TWIN BUTTES II 2 37173.990129
26 TWIN BUTTES II 3 40054.014928
I am using the code below to attempt the merge, but each row of df1 gets matched several times: df3 comes out as 81 x 5, whereas I expected 27 x 5. Thanks.
df3 = pd.merge(df1, df2[["plant_name", "power_kwh_mean"]], on="plant_name", how="left")
plant_name year month power_kwh power_kwh_mean
0 CAYUGA RIDGE 2021 1 100013.479435 117966.937473
1 CAYUGA RIDGE 2021 1 100013.479435 111414.235063
2 CAYUGA RIDGE 2021 1 100013.479435 111046.530466
3 CAYUGA RIDGE 2021 2 111393.468601 117966.937473
4 CAYUGA RIDGE 2021 2 111393.468601 111414.235063
5 CAYUGA RIDGE 2021 2 111393.468601 111046.530466
6 CAYUGA RIDGE 2021 3 130056.708737 117966.937473
7 CAYUGA RIDGE 2021 3 130056.708737 111414.235063
8 CAYUGA RIDGE 2021 3 130056.708737 111046.530466
9 COLORADO GREEN 2021 1 51434.064247 48858.795995
10 COLORADO GREEN 2021 1 51434.064247 53334.673501
11 COLORADO GREEN 2021 1 51434.064247 59363.149449
12 COLORADO GREEN 2021 2 42969.370685 48858.795995
13 COLORADO GREEN 2021 2 42969.370685 53334.673501
14 COLORADO GREEN 2021 2 42969.370685 59363.149449
15 COLORADO GREEN 2021 3 60889.168683 48858.795995
16 COLORADO GREEN 2021 3 60889.168683 53334.673501
17 COLORADO GREEN 2021 3 60889.168683 59363.149449
18 ELK RIVER 2021 1 65281.449328 63520.921129
19 ELK RIVER 2021 1 65281.449328 62582.570332
20 ELK RIVER 2021 1 65281.449328 68408.441317
21 ELK RIVER 2021 2 65972.003571 63520.921129
22 ELK RIVER 2021 2 65972.003571 62582.570332
23 ELK RIVER 2021 2 65972.003571 68408.441317
24 ELK RIVER 2021 3 80007.661559 63520.921129
25 ELK RIVER 2021 3 80007.661559 62582.570332
26 ELK RIVER 2021 3 80007.661559 68408.441317
27 FARMERS CITY 2021 1 44259.997043 45566.598244
28 FARMERS CITY 2021 1 44259.997043 45682.893254
29 FARMERS CITY 2021 1 44259.997043 49413.345551
30 FARMERS CITY 2021 2 35335.309821 45566.598244
31 FARMERS CITY 2021 2 35335.309821 45682.893254
32 FARMERS CITY 2021 2 35335.309821 49413.345551
33 FARMERS CITY 2021 3 56857.576344 45566.598244
34 FARMERS CITY 2021 3 56857.576344 45682.893254
35 FARMERS CITY 2021 3 56857.576344 49413.345551
36 NEW HARVEST 2021 1 36756.896237 40170.598884
37 NEW HARVEST 2021 1 36756.896237 39620.202054
38 NEW HARVEST 2021 1 36756.896237 40155.011850
39 NEW HARVEST 2021 2 27696.980506 40170.598884
40 NEW HARVEST 2021 2 27696.980506 39620.202054
41 NEW HARVEST 2021 2 27696.980506 40155.011850
42 NEW HARVEST 2021 3 47029.840726 40170.598884
43 NEW HARVEST 2021 3 47029.840726 39620.202054
44 NEW HARVEST 2021 3 47029.840726 40155.011850
45 OTTER CREEK 2021 1 56448.063978 66020.339095
46 OTTER CREEK 2021 1 56448.063978 62372.075373
47 OTTER CREEK 2021 1 56448.063978 61797.622670
48 OTTER CREEK 2021 2 60875.162054 66020.339095
49 OTTER CREEK 2021 2 60875.162054 62372.075373
50 OTTER CREEK 2021 2 60875.162054 61797.622670
51 OTTER CREEK 2021 3 72305.531317 66020.339095
52 OTTER CREEK 2021 3 72305.531317 62372.075373
53 OTTER CREEK 2021 3 72305.531317 61797.622670
54 PROVIDENCE HEIGHTS 2021 1 23142.938038 27437.261725
55 PROVIDENCE HEIGHTS 2021 1 23142.938038 25987.220209
56 PROVIDENCE HEIGHTS 2021 1 23142.938038 25424.756976
57 PROVIDENCE HEIGHTS 2021 2 23472.394494 27437.261725
58 PROVIDENCE HEIGHTS 2021 2 23472.394494 25987.220209
59 PROVIDENCE HEIGHTS 2021 2 23472.394494 25424.756976
60 PROVIDENCE HEIGHTS 2021 3 29106.458065 27437.261725
61 PROVIDENCE HEIGHTS 2021 3 29106.458065 25987.220209
62 PROVIDENCE HEIGHTS 2021 3 29106.458065 25424.756976
63 TWIN BUTTES 2021 1 26532.625000 25366.811806
64 TWIN BUTTES 2021 1 26532.625000 28026.454688
65 TWIN BUTTES 2021 1 26532.625000 31319.684315
66 TWIN BUTTES 2021 2 23030.252679 25366.811806
67 TWIN BUTTES 2021 2 23030.252679 28026.454688
68 TWIN BUTTES 2021 2 23030.252679 31319.684315
69 TWIN BUTTES 2021 3 31493.110484 25366.811806
70 TWIN BUTTES 2021 3 31493.110484 28026.454688
71 TWIN BUTTES 2021 3 31493.110484 31319.684315
72 TWIN BUTTES II 2021 1 35024.166667 34978.325663
73 TWIN BUTTES II 2021 1 35024.166667 37173.990129
74 TWIN BUTTES II 2021 1 35024.166667 40054.014928
75 TWIN BUTTES II 2021 2 30236.458929 34978.325663
76 TWIN BUTTES II 2021 2 30236.458929 37173.990129
77 TWIN BUTTES II 2021 2 30236.458929 40054.014928
78 TWIN BUTTES II 2021 3 40956.497446 34978.325663
79 TWIN BUTTES II 2021 3 40956.497446 37173.990129
80 TWIN BUTTES II 2021 3 40956.497446 40054.014928
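The row count above can be reproduced from the key counts alone. The following is a minimal sketch, assuming only the key columns matter and using three of the nine plants; the names left, right, left_counts, right_counts and expected_rows are made up for illustration. When a key value occurs several times on both sides, the merge emits one row for every left/right pairing of that key.

```python
import pandas as pd

# Key columns only, for three of the nine plants in the question
# (a reduced, illustrative copy; the power values do not affect the row count).
names = ["CAYUGA RIDGE", "COLORADO GREEN", "ELK RIVER"]
left = pd.DataFrame({
    "plant_name": [n for n in names for _ in range(3)],
    "month": [1, 2, 3] * 3,
})
right = left.copy()

# Each plant_name appears 3 times on each side, so a merge on plant_name
# alone produces 3 * 3 = 9 rows per plant.
left_counts = left["plant_name"].value_counts()
right_counts = right["plant_name"].value_counts()
expected_rows = int((left_counts * right_counts).sum())

merged = pd.merge(left, right[["plant_name"]], on="plant_name", how="left")
print(expected_rows, len(merged))
```

With all nine plants the same arithmetic gives 9 plants x 9 rows = 81, which matches the shape reported for df3.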
【Discussion】:
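- You have duplicates in df2, so for every value in df1 you get that many matching rows from df2. Why not just df1['power_kwh_mean'] = df2['power_kwh_mean']?
- Thanks for your comment. I don't see any duplicates: df2 holds the mean (average) value for each plant name, and the data values in df1 and df2 are different.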
- In df2, the plant_name column contains duplicates, and you are merging on this column.
- Maybe merge on=['plant_name','month']?
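
The suggestion in the last comment can be sketched as follows. This is a minimal example on a two-plant excerpt of the question's data, not a confirmed answer from the thread. It assumes that every (plant_name, month) pair occurs at most once in df2; the validate="many_to_one" argument is an addition not mentioned in the thread and is there only to check that assumption.

```python
import pandas as pd

# Two-plant excerpt of df1 and df2 from the question.
df1 = pd.DataFrame({
    "plant_name": ["CAYUGA RIDGE"] * 3 + ["COLORADO GREEN"] * 3,
    "year": [2021] * 6,
    "month": [1, 2, 3, 1, 2, 3],
    "power_kwh": [100013.479435, 111393.468601, 130056.708737,
                  51434.064247, 42969.370685, 60889.168683],
})
df2 = pd.DataFrame({
    "plant_name": ["CAYUGA RIDGE"] * 3 + ["COLORADO GREEN"] * 3,
    "month": [1, 2, 3, 1, 2, 3],
    "power_kwh_mean": [117966.937473, 111414.235063, 111046.530466,
                       48858.795995, 53334.673501, 59363.149449],
})

# Merge on both keys so each (plant_name, month) pair in df1 matches
# exactly one row of df2. "month" must be kept in the df2 selection.
# validate="many_to_one" raises pandas.errors.MergeError if df2 contains
# duplicate (plant_name, month) pairs, instead of silently adding rows.
df3 = pd.merge(
    df1,
    df2[["plant_name", "month", "power_kwh_mean"]],
    on=["plant_name", "month"],
    how="left",
    validate="many_to_one",
)
print(df3.shape)  # (6, 5) for this excerpt
```

On the full data the same call should give the expected 27 x 5 result, provided df2 has one row per plant and month as shown in the question. The direct assignment proposed in the first comment aligns rows by index rather than by plant_name and month, so it yields the same values only while both frames share the same index and row order.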