Combining two dataframes (answers)

【Question title】: Combining two dataframes
【Posted】: 2021-07-18 04:51:23
【Question description】:

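I'm trying to merge two dataframes, but it doesn't seem to work: every time I merge, the rows where I expect values come out as 0. Dataframe df1 already contains some data, while some rows are left empty (filled with 0). Dataframe df2 should fill in those empty rows of df1, matching on the column name and on each value in "TempBin" and "Month".

Edit: Both dataframes are inside a for loop. df1 acts as my "storage", and df2 changes on every location iteration. So if df2 contains the results for LocationZP, I also want that data inserted into the matching rows of df1. If I use df1 = df1.append(df2) inside the for loop, all rows of df2 are inserted at the end of df1 on every iteration.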
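A minimal reproduction of the loop described in the edit, assuming each iteration produces a df2 with the key columns plus one location column (the `results` dict is hypothetical data). `pd.concat` is used because `DataFrame.append` was removed in pandas 2.0; like `append`, it stacks rows instead of matching keys, so df1 grows on every pass:

```python
import pandas as pd

# "Storage" frame: the Month 13 rows are still empty (0)
df1 = pd.DataFrame({'Month': [1] * 6 + [13] * 6,
                    'TempBin': [0, 1, 2, 3, 4, 5] * 2,
                    'LocationAA': [7, 98, 12, 3, 7, 1] + [0] * 6,
                    'LocationXA': [1, 0, 23, 14, 9, 8] + [0] * 6,
                    'LocationZP': [2, 89, 38, 17, 14, 99] + [0] * 6})

# Hypothetical per-location results, one df2 per loop iteration
results = {
    'LocationAA': [11, 22, 33, 44, 55, 66],
    'LocationZP': [5, 6, 7, 8, 9, 10],
}

for location, values in results.items():
    df2 = pd.DataFrame({'Month': [13] * 6,
                        'TempBin': [0, 1, 2, 3, 4, 5],
                        location: values})
    # Stacks df2's rows below df1 instead of filling the matching rows
    df1 = pd.concat([df1, df2], ignore_index=True)

print(len(df1))  # 12 original rows + 6 + 6 appended rows = 24
```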

df1:

Month  TempBin  LocationAA   LocationXA   LocationZP
 1      0       7            1            2
 1      1       98           0            89
 1      2       12           23           38
 1      3       3            14           17
 1      4       7            9            14
 1      5       1            8            99
 13     0       0            0            0
 13     1       0            0            0
 13     2       0            0            0
 13     3       0            0            0
 13     4       0            0            0
 13     5       0            0            0

df2:

Month  TempBin  LocationAA
 13     0       11
 13     1       22
 13     2       33
 13     3       44
 13     4       55
 13     5       66

Desired output in df1:

Month  TempBin  LocationAA   LocationXA   LocationZP
 1      0       7            1            2
 1      1       98           0            89
 1      2       12           23           38
 1      3       3            14           17
 1      4       7            9            14
 1      5       1            8            99
 13     0       11           0            0
 13     1       22           0            0
 13     2       33           0            0
 13     3       44           0            0
 13     4       55           0            0
 13     5       66           0            0

import pandas as pd

df1 = pd.DataFrame({'Month': [1]*6 + [13]*6,
                   'TempBin': [0,1,2,3,4,5]*2,
                   'LocationAA': [7,98,12,3,7,1,0,0,0,0,0,0],
                   'LocationXA': [1,0,23,14,9,8,0,0,0,0,0,0],
                   'LocationZP': [2,89,38,17,14,99,0,0,0,0,0,0]}
                   )

df2 = pd.DataFrame({'Month': [13]*6,
                   'TempBin': [0,1,2,3,4,5],
                   'LocationAA': [11,22,33,44,55,66]}
                   )

df1 = pd.merge(df1, df2, on=["Month","TempBin","LocationAA"], how="left")

Result:

Month  TempBin  LocationAA  LocationXA  LocationZP
1      0        7.0         1.0         2.0
1      1        98.0        0.0         89.0
1      2        12.0        23.0        38.0
1      3        3.0         14.0        17.0
1      4        7.0         9.0         14.0
1      5        1.0         8.0         99.0
13     0        NaN         NaN         NaN
13     1        NaN         NaN         NaN
13     2        NaN         NaN         NaN
13     3        NaN         NaN         NaN
13     4        NaN         NaN         NaN
13     5        NaN         NaN         NaN
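Including LocationAA in `on=` is what prevents the fill: merge keys must match exactly, and the Month 13 rows carry LocationAA = 0 in df1 but 11 to 66 in df2, so no row of df2 ever matches and its values never reach df1. A sketch that checks this with merge's `indicator` parameter and then merges on the key columns only:

```python
import pandas as pd

df1 = pd.DataFrame({'Month': [1] * 6 + [13] * 6,
                    'TempBin': [0, 1, 2, 3, 4, 5] * 2,
                    'LocationAA': [7, 98, 12, 3, 7, 1] + [0] * 6,
                    'LocationXA': [1, 0, 23, 14, 9, 8] + [0] * 6,
                    'LocationZP': [2, 89, 38, 17, 14, 99] + [0] * 6})

df2 = pd.DataFrame({'Month': [13] * 6,
                    'TempBin': [0, 1, 2, 3, 4, 5],
                    'LocationAA': [11, 22, 33, 44, 55, 66]})

# Same merge as in the question, plus indicator=True, which adds a "_merge"
# column recording whether each row's keys were also found in df2
check = pd.merge(df1, df2, on=['Month', 'TempBin', 'LocationAA'],
                 how='left', indicator=True)
print(check['_merge'].value_counts())  # every row is 'left_only'

# Merging on the key columns only brings df2's values in as LocationAA_y
fixed = pd.merge(df1, df2, on=['Month', 'TempBin'], how='left')
print(fixed[['Month', 'TempBin', 'LocationAA_x', 'LocationAA_y']])
```

The `_x`/`_y` pair still has to be combined into one column, which is what Solution 1 below does.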

【Question discussion】:

Tags: python pandas dataframe

【Solution 1】:

Here is some code that worked for me:

# Merge the two dataframes on the columns "TempBin" and "Month", filling NaN values with 0.
import pandas as pd

df1 = pd.DataFrame({'Month': [1]*6 + [13]*6,
                   'TempBin': [0,1,2,3,4,5]*2,
                   'LocationAA': [7,98,12,3,7,1,0,0,0,0,0,0],
                   'LocationXA': [1,0,23,14,9,8,0,0,0,0,0,0],
                   'LocationZP': [2,89,38,17,14,99,0,0,0,0,0,0]}
                   )

df2 = pd.DataFrame({'Month': [13]*6,
                   'TempBin': [0,1,2,3,4,5],
                   'LocationAA': [11,22,33,44,55,66]})

# Left merge on the key columns only; the shared column LocationAA is split into
# LocationAA_x (from df1) and LocationAA_y (from df2, NaN where df2 has no matching row)
df_merge = pd.merge(df1, df2, how='left', on=['TempBin', 'Month'])

# Rebuild LocationAA: use df2's value where it exists, otherwise keep df1's value.
# This has to happen before fillna(0); filling first would turn the NaNs in
# LocationAA_y into 0 and overwrite the Month 1 values coming from df1.
df_merge['LocationAA'] = df_merge['LocationAA_y'].fillna(df_merge['LocationAA_x'])

# Remove the helper columns LocationAA_x and LocationAA_y
df_merge.drop(['LocationAA_x', 'LocationAA_y'], axis=1, inplace=True)

# Fill any remaining NaN values with 0
df_merge.fillna(0, inplace=True)

print(df_merge)

Output:

    Month  TempBin  LocationXA  LocationZP  LocationAA
0       1        0           1           2         7.0
1       1        1           0          89        98.0
2       1        2          23          38        12.0
3       1        3          14          17         3.0
4       1        4           9          14         7.0
5       1        5           8          99         1.0
6      13        0           0           0        11.0
7      13        1           0           0        22.0
8      13        2           0           0        33.0
9      13        3           0           0        44.0
10     13        4           0           0        55.0
11     13        5           0           0        66.0
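For the loop in the question, a different technique (not the one used in this answer) is `DataFrame.update`, which overwrites values in place wherever the index and column labels match, so no `_x`/`_y` columns are created and no rows are added. A sketch, assuming the key columns are moved into the index first; `location_results` is hypothetical data standing in for the per-iteration df2:

```python
import pandas as pd

keys = ['Month', 'TempBin']

# Storage frame indexed by the key columns so update() can align on them
store = pd.DataFrame({'Month': [1] * 6 + [13] * 6,
                      'TempBin': [0, 1, 2, 3, 4, 5] * 2,
                      'LocationAA': [7, 98, 12, 3, 7, 1] + [0] * 6,
                      'LocationXA': [1, 0, 23, 14, 9, 8] + [0] * 6,
                      'LocationZP': [2, 89, 38, 17, 14, 99] + [0] * 6}).set_index(keys)

# Hypothetical per-iteration results: key columns plus one location column
location_results = [
    pd.DataFrame({'Month': [13] * 6, 'TempBin': list(range(6)),
                  'LocationAA': [11, 22, 33, 44, 55, 66]}),
    pd.DataFrame({'Month': [13] * 6, 'TempBin': list(range(6)),
                  'LocationZP': [5, 6, 7, 8, 9, 10]}),
]

for df2 in location_results:
    # Overwrite the cells whose (Month, TempBin) and column name match;
    # NaN values in df2 (if any) leave the stored value untouched
    store.update(df2.set_index(keys))

df1 = store.reset_index()
print(df1)
```

`update` modifies `store` in place and returns `None`, so its result should not be assigned.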

If anything in the comments is unclear, let me know :)

PS: Sorry for the extra comments, but I left them in to explain things more.

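【Discussion】:

【Solution 2】:

You need to append df2 to df1 to get the desired output. (`DataFrame.append` was removed in pandas 2.0; on current versions `pd.concat` does the same job.)

df1 = pd.concat([df1, df2])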

df1 = df1.append(df2)

If you want to replace the NaN values with zeros, add:

df1 = df1.fillna(0)
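Used once, this stacks df2's rows below df1; used repeatedly inside a loop, df1 keeps growing. A sketch that keeps the concat idea but collapses duplicate keys after each step: `groupby(...).last()` returns, per column, the last non-null value in each (Month, TempBin) group, so df2's numbers replace the placeholders while df1's other columns survive. It assumes df2's rows come after df1's in the concat and that a value in df2 should only win when it is not NaN; `location_results` is hypothetical data:

```python
import pandas as pd

keys = ['Month', 'TempBin']

df1 = pd.DataFrame({'Month': [1] * 6 + [13] * 6,
                    'TempBin': [0, 1, 2, 3, 4, 5] * 2,
                    'LocationAA': [7, 98, 12, 3, 7, 1] + [0] * 6,
                    'LocationXA': [1, 0, 23, 14, 9, 8] + [0] * 6,
                    'LocationZP': [2, 89, 38, 17, 14, 99] + [0] * 6})

# Hypothetical per-iteration results
location_results = [
    pd.DataFrame({'Month': [13] * 6, 'TempBin': list(range(6)),
                  'LocationAA': [11, 22, 33, 44, 55, 66]}),
    pd.DataFrame({'Month': [13] * 6, 'TempBin': list(range(6)),
                  'LocationZP': [5, 6, 7, 8, 9, 10]}),
]

for df2 in location_results:
    # Stack the rows; columns missing from df2 become NaN in its rows
    stacked = pd.concat([df1, df2], ignore_index=True)
    # One row per (Month, TempBin): the last non-null value per column wins
    df1 = stacked.groupby(keys, as_index=False).last()

print(df1)
```

The NaNs introduced by the concat turn the untouched location columns into floats; `df1.astype(int)` restores integers if that matters.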

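【Discussion】:

This only works when appending once; if I append multiple times in a for loop, it keeps adding extra rows. I'll update my question to be more specific and explain it as simply as I can.
OK. That's because the df2 you posted has no unique column to contribute to df1.

【Solution 3】: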

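Here is another way, using combine_first(). df2's values take priority, and df1's values fill in wherever df2 has no entry:

i = ['Month','TempBin']
df1 = df2.set_index(i).combine_first(df1.set_index(i)).reset_index()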
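Inside the for loop from the question, the `combine_first` result has to be assigned back to df1 on every iteration. A sketch, with `location_results` as hypothetical per-iteration data:

```python
import pandas as pd

i = ['Month', 'TempBin']

df1 = pd.DataFrame({'Month': [1] * 6 + [13] * 6,
                    'TempBin': [0, 1, 2, 3, 4, 5] * 2,
                    'LocationAA': [7, 98, 12, 3, 7, 1] + [0] * 6,
                    'LocationXA': [1, 0, 23, 14, 9, 8] + [0] * 6,
                    'LocationZP': [2, 89, 38, 17, 14, 99] + [0] * 6})

# Hypothetical per-iteration results: key columns plus one location column
location_results = [
    pd.DataFrame({'Month': [13] * 6, 'TempBin': list(range(6)),
                  'LocationAA': [11, 22, 33, 44, 55, 66]}),
    pd.DataFrame({'Month': [13] * 6, 'TempBin': list(range(6)),
                  'LocationZP': [5, 6, 7, 8, 9, 10]}),
]

for df2 in location_results:
    # df2's non-null values win; every other cell is taken from df1
    df1 = (df2.set_index(i)
              .combine_first(df1.set_index(i))
              .reset_index())

print(df1)
```

Because the rows are aligned on the (Month, TempBin) index rather than stacked, df1 keeps its 12 rows no matter how many iterations run.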

【Discussion】: