Getting every combination of two Pandas DataFrames? Answers

【Question title】: Getting every combination of two Pandas DataFrames?
【Posted】: 2017-01-12 19:51:23
【Question description】:

Suppose I have two DataFrames:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'person':[1,1,2,2,3], 'sub_id':[20,21,21,21,21], 'otherval':[np.nan, np.nan, np.nan, np.nan, np.nan], 'other_stuff':[1,1,1,1,1]}, columns=['person','sub_id','otherval','other_stuff'])

df2 = pd.DataFrame({'sub_id':[20,21,22,23,24,25], 'otherval':[8,9,10,11,12,13]})

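I want every level of person in df1 to have all levels of sub_id (including any duplicates), each with its respective otherval from df2. In other words, my merged result should look like this: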

person    sub_id    otherval    other_stuff
1         20        8           1
1         21        9           1
1         22        10          NaN
1         23        11          NaN
1         24        12          NaN
1         25        13          NaN
2         20        8           NaN
2         21        9           1
2         21        9           1
2         22        10          NaN
2         23        11          NaN
2         24        12          NaN
2         25        13          NaN
3         20        8           NaN
3         21        9           1
3         22        10          NaN
3         23        11          NaN
3         24        12          NaN
3         25        13          NaN

Note that person==2 has two rows where sub_id==21.

【Question discussion】:

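Please fix the definition of df1; all columns must have the same length.
@Abdou Just fixed it, thanks.
Try df1.groupby('person').apply(lambda x: pd.merge(x,df2, on='sub_id', how='right')).reset_index(level = (0,1), drop = True).ffill().
@Abdou I believe that works! Except that I don't want to forward-fill all of my columns; just the person column.
df1.groupby('person').apply(lambda x: pd.merge(x,df2, on='sub_id', how='right')).reset_index(level = (0,1), drop = True) gets you the output you want, but you have to fill person with the .ffill() method.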
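
To illustrate the difference discussed in the comments above, here is a minimal sketch on a made-up frame (the `toy` data and the names `filled_all` and `filled_person` are hypothetical, not from the question). Calling .ffill() on the whole frame also fills other_stuff, while filling only person leaves the other NaN values alone.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: person is missing on some rows, as it would be
# for the unmatched rows produced by a right merge.
toy = pd.DataFrame({
    "person": [1, np.nan, np.nan, 2, np.nan],
    "other_stuff": [1.0, np.nan, 1.0, np.nan, np.nan],
})

# Forward-filling the whole frame fills every column, including other_stuff.
filled_all = toy.ffill()

# Forward-filling only person keeps the NaN values in other_stuff.
filled_person = toy.copy()
filled_person["person"] = filled_person["person"].ffill().astype(int)

print(filled_all)
print(filled_person)
```

Note that forward-filling only gives the right person when the first row of each group already carries a value, so it depends on the row order of the merged result.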

Tags: python pandas join dataframe merge

【Solution 1】:

You can get the desired output as follows:

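# For each person, right-merge onto df2 so every sub_id in df2 appears, then drop the group index.
df3 = df1.groupby('person').apply(lambda x: pd.merge(x, df2, on='sub_id', how='right')).reset_index(level=(0, 1), drop=True)
# Forward-fill only the person column; rows that exist only in df2 come back with person = NaN.
df3.person = df3.person.ffill().astype(int)
print(df3)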

This should produce:

#     person  sub_id  otherval_x  other_stuff  otherval_y
# 0        1      20         NaN          1.0           8
# 1        1      21         NaN          1.0           9
# 2        1      22         NaN          NaN          10
# 3        1      23         NaN          NaN          11
# 4        1      24         NaN          NaN          12
# 5        1      25         NaN          NaN          13
# 6        2      21         NaN          1.0           9
# 7        2      21         NaN          1.0           9
# 8        2      20         NaN          NaN           8
# 9        2      22         NaN          NaN          10
# 10       2      23         NaN          NaN          11
# 11       2      24         NaN          NaN          12
# 12       2      25         NaN          NaN          13
# 13       3      21         NaN          1.0           9
# 14       3      20         NaN          NaN           8
# 15       3      22         NaN          NaN          10
# 16       3      23         NaN          NaN          11
# 17       3      24         NaN          NaN          12
# 18       3      25         NaN          NaN          13

I hope this helps.
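
As an alternative sketch, not part of the answer above: assuming pandas 1.2 or later, where merge accepts how='cross', one could build every (person, sub_id) pair first and then left-merge other_stuff from df1. This avoids the forward-fill step, which depends on the row order within each group. The names `persons`, `grid` and `result` are my own, and dropping df1's all-NaN otherval column is an assumption based on the desired output in the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {
        "person": [1, 1, 2, 2, 3],
        "sub_id": [20, 21, 21, 21, 21],
        "otherval": [np.nan] * 5,
        "other_stuff": [1, 1, 1, 1, 1],
    },
    columns=["person", "sub_id", "otherval", "other_stuff"],
)
df2 = pd.DataFrame(
    {"sub_id": [20, 21, 22, 23, 24, 25], "otherval": [8, 9, 10, 11, 12, 13]}
)

# One row per distinct person.
persons = df1[["person"]].drop_duplicates()

# Cartesian product: every person paired with every row of df2 (3 * 6 rows).
# how="cross" assumes pandas >= 1.2.
grid = persons.merge(df2, how="cross")

# Attach other_stuff from df1. df1's own otherval is left out (assumption:
# it is all NaN and the values from df2 are the ones wanted).
result = grid.merge(
    df1[["person", "sub_id", "other_stuff"]],
    on=["person", "sub_id"],
    how="left",
)

# Same column order as the desired output in the question.
result = result[["person", "sub_id", "otherval", "other_stuff"]]
print(result)
```

Because the left merge keeps one row per matching row in df1, person 2 ends up with two rows for sub_id 21, giving 19 rows in total, and there is a single otherval column instead of otherval_x and otherval_y.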

【Discussion】: