Match one table and map values to another in pandas (Python): answers

【Question title】: Match one table and map value to other in pandas python
【Posted】: 2017-06-22 12:10:41
【Question】:

I have two pandas DataFrames. df1:

LT     route_1 c2
PM/2     120   44
PM/52    110   49
PM/522   103   51
PM/522   103   51
PM/24    105   48
PM/536   109   67
PM/536   109   67
PM/5356  112   144

df2:

LT       W_ID 
PM/2     120.0
PM/52    110.0
PM/522   103.0
PM/522   103.0
PM/24    105.0
PM/536   109.0
PM/536   109.0
PM/5356  112.0

I need to map W_ID from df2 into route_1 in df1. To be clear, the values should be replaced, but only where LT in one table matches LT in the other. Desired output:

LT     route_1   c2
PM/2     120.0   44
PM/52    110.0   49
PM/522   103.0   51
PM/522   103.0   51
PM/24    105.0   48
PM/536   109.0   67
PM/536   109.0   67
PM/5356  112.0   144

【Comments】:

Tags: python pandas dictionary dataframe

【Answer 1】:

I think map should work:

df1['route_1'] = df1['LT'].map(df2.set_index('LT')['W_ID'])

Unfortunately it does not:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects
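
To see where the error comes from, a small reproduction may help. The sketch below uses a shortened, three-row version of the question's tables (an assumption made for brevity) and checks whether the lookup Series built from df2 has a unique index. The variable names `lookup`, `index_is_unique`, `repeated_keys`, and `map_failed` are illustrative and not part of the original answer.

```python
import pandas as pd

# Shortened version of the question's data (assumption: three rows are enough
# to show the problem).
df1 = pd.DataFrame({"LT": ["PM/2", "PM/522", "PM/522"], "route_1": [120, 103, 103]})
df2 = pd.DataFrame({"LT": ["PM/2", "PM/522", "PM/522"], "W_ID": [120.0, 103.0, 103.0]})

# The Series passed to map() is indexed by LT, which contains repeated labels.
lookup = df2.set_index("LT")["W_ID"]

index_is_unique = lookup.index.is_unique
repeated_keys = lookup.index[lookup.index.duplicated()].unique().tolist()

# map() has to look up each LT value in lookup.index, which is ambiguous
# when a label occurs more than once, so the call raises.
try:
    df1["LT"].map(lookup)
    map_failed = False
except Exception as exc:
    map_failed = True
    print(type(exc).__name__, exc)
```

Checking `lookup.index.is_unique` before calling `map` is a cheap way to detect this situation up front.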

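EDIT:

The problem is the duplicates in the LT column. The solution is to add a helper column with cumcount, so that each (LT, g) pair is unique, and then do a left-join merge on both columns: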

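# Number repeated LT values 0, 1, ... within each group
df1['g'] = df1.groupby('LT').cumcount()
df2['g'] = df2.groupby('LT').cumcount()
# Left join on both keys keeps the rows and order of df1
df = pd.merge(df1, df2, on=['LT','g'], how='left')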
print (df)
        LT  route_1   c2  g   W_ID
0     PM/2      120   44  0  120.0
1    PM/52      110   49  0  110.0
2   PM/522      103   51  0  103.0
3   PM/522      103   51  1  103.0
4    PM/24      105   48  0  105.0
5   PM/536      109   67  0  109.0
6   PM/536      109   67  1  109.0
7  PM/5356      112  144  0  112.0

df1['route_1'] = df['W_ID']
df1.drop('g', axis=1, inplace=True)
print (df1)
        LT  route_1   c2
0     PM/2    120.0   44
1    PM/52    110.0   49
2   PM/522    103.0   51
3   PM/522    103.0   51
4    PM/24    105.0   48
5   PM/536    109.0   67
6   PM/536    109.0   67
7  PM/5356    112.0  144
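
The cumcount approach above can also be wrapped in a function that leaves both inputs unmodified. This is only a sketch: the function name `replace_by_key_and_occurrence`, its parameters, and the temporary column name `_occ` are hypothetical choices, not part of the original answer.

```python
import pandas as pd


def replace_by_key_and_occurrence(left, right, key, source_col, target_col):
    """Return a copy of `left` whose `target_col` holds `right[source_col]`,
    matched on `key` and on the n-th occurrence of each key value."""
    # cumcount() numbers repeated keys 0, 1, 2, ... so (key, _occ) is unique.
    left_keys = left[[key]].assign(_occ=left.groupby(key).cumcount())
    right_keys = right[[key, source_col]].assign(_occ=right.groupby(key).cumcount())

    # A left join keeps every row of `left` in its original order.
    merged = left_keys.merge(right_keys, on=[key, "_occ"], how="left")

    result = left.copy()
    # Assign by position so the index of `left` does not affect alignment.
    result[target_col] = merged[source_col].to_numpy()
    return result


keys = ["PM/2", "PM/52", "PM/522", "PM/522", "PM/24", "PM/536", "PM/536", "PM/5356"]
df1 = pd.DataFrame({
    "LT": keys,
    "route_1": [120, 110, 103, 103, 105, 109, 109, 112],
    "c2": [44, 49, 51, 51, 48, 67, 67, 144],
})
df2 = pd.DataFrame({
    "LT": keys,
    "W_ID": [120.0, 110.0, 103.0, 103.0, 105.0, 109.0, 109.0, 112.0],
})

result = replace_by_key_and_occurrence(df1, df2, "LT", "W_ID", "route_1")
print(result)
```

Rows of `left` that have no matching occurrence in `right` end up with NaN in the target column, which is the usual behaviour of a left join.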

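A similar solution, written as one chained expression (reindex_axis was removed from later pandas versions, so reindex(columns=...) is used here):

df1['g'] = df1.groupby('LT').cumcount()
df2['g'] = df2.groupby('LT').cumcount()
df = (pd.merge(df1, df2, on=['LT','g'], how='left')
        .drop(['g', 'route_1'], axis=1)
        .rename(columns={'W_ID':'route_1'})
        .reindex(columns=['LT', 'route_1', 'c2']))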
print (df)
        LT  route_1   c2
0     PM/2    120.0   44
1    PM/52    110.0   49
2   PM/522    103.0   51
3   PM/522    103.0   51
4    PM/24    105.0   48
5   PM/536    109.0   67
6   PM/536    109.0   67
7  PM/5356    112.0  144
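
If every repeated LT in df2 always carries the same W_ID, as in the sample data, a simpler alternative (not the approach of this answer) may be to drop the duplicate keys first so that map has a unique index to work with. The sketch below assumes exactly that and raises otherwise; the names `values_per_key` and `lookup` are illustrative.

```python
import pandas as pd

keys = ["PM/2", "PM/52", "PM/522", "PM/522", "PM/24", "PM/536", "PM/536", "PM/5356"]
df1 = pd.DataFrame({
    "LT": keys,
    "route_1": [120, 110, 103, 103, 105, 109, 109, 112],
    "c2": [44, 49, 51, 51, 48, 67, 67, 144],
})
df2 = pd.DataFrame({
    "LT": keys,
    "W_ID": [120.0, 110.0, 103.0, 103.0, 105.0, 109.0, 109.0, 112.0],
})

# Assumption check: each LT must map to a single W_ID, otherwise dropping
# duplicates would silently pick one of several different values.
values_per_key = df2.groupby("LT")["W_ID"].nunique()
if (values_per_key > 1).any():
    raise ValueError("some LT values map to more than one W_ID")

# Keep the first row per LT so the lookup index is unique, then map.
lookup = df2.drop_duplicates(subset="LT").set_index("LT")["W_ID"]
df1["route_1"] = df1["LT"].map(lookup)
print(df1)
```

When the same LT can carry different W_ID values per occurrence, this shortcut does not apply and the cumcount-based merge is the safer choice.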

【Comments】:

I think this is a good approach, but I got this error: Reindexing only valid with uniquely valued Index objects