【Question title】: Match one table and map value to other in pandas python
【Posted】: 2017-06-22 12:10:41
【Question】:

I have two pandas DataFrames:

df1:

LT     route_1 c2
PM/2     120   44
PM/52    110   49
PM/522   103   51
PM/522   103   51
PM/24    105   48
PM/536   109   67
PM/536   109   67
PM/5356  112   144 

df2:

LT       W_ID 
PM/2     120.0
PM/52    110.0
PM/522   103.0
PM/522   103.0
PM/24    105.0
PM/536   109.0
PM/536   109.0
PM/5356  112.0

I need to map W_ID from df2 onto route_1 in df1 (to be clear: replace the route_1 values), but the LT in one table has to match the LT in the other. Desired output:

LT     route_1   c2
PM/2     120.0   44
PM/52    110.0   49
PM/522   103.0   51
PM/522   103.0   51
PM/24    105.0   48
PM/536   109.0   67
PM/536   109.0   67
PM/5356  112.0   144 
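To try the answers below, here is a minimal sketch that rebuilds the two frames from the tables above. The values are copied as shown; the dtypes (integers in df1, floats for W_ID) are an assumption based on how the tables print.

```python
import pandas as pd

# df1 from the question: LT is the key, route_1 is to be replaced, c2 stays untouched
df1 = pd.DataFrame({
    'LT': ['PM/2', 'PM/52', 'PM/522', 'PM/522', 'PM/24', 'PM/536', 'PM/536', 'PM/5356'],
    'route_1': [120, 110, 103, 103, 105, 109, 109, 112],
    'c2': [44, 49, 51, 51, 48, 67, 67, 144],
})

# df2 from the question: the same LT keys (duplicates included), W_ID as floats
df2 = pd.DataFrame({
    'LT': ['PM/2', 'PM/52', 'PM/522', 'PM/522', 'PM/24', 'PM/536', 'PM/536', 'PM/5356'],
    'W_ID': [120.0, 110.0, 103.0, 103.0, 105.0, 109.0, 109.0, 112.0],
})

# duplicated() flags every repeat after the first occurrence of a key
print(df1['LT'].duplicated().sum())  # 2 (the second PM/522 and the second PM/536)
```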

【Question discussion】:

    Tags: python pandas dictionary dataframe


    【Solution 1】:

    I think map should work:

    df1['route_1'] = df1['LT'].map(df2.set_index('LT')['W_ID'])
    

    Unfortunately, it does not:

    InvalidIndexError: Reindexing only valid with uniquely valued Index objects
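The error comes from the lookup Series: after `set_index('LT')` its index contains PM/522 and PM/536 twice, and `map` needs a unique index to know which value to return for a key. If every repeated LT carries the same W_ID (true in the sample data, but an assumption about the real data), one possible alternative is a sketch that drops the duplicate keys before mapping:

```python
import pandas as pd

keys = ['PM/2', 'PM/52', 'PM/522', 'PM/522', 'PM/24', 'PM/536', 'PM/536', 'PM/5356']
df1 = pd.DataFrame({'LT': keys,
                    'route_1': [120, 110, 103, 103, 105, 109, 109, 112],
                    'c2': [44, 49, 51, 51, 48, 67, 67, 144]})
df2 = pd.DataFrame({'LT': keys,
                    'W_ID': [120.0, 110.0, 103.0, 103.0, 105.0, 109.0, 109.0, 112.0]})

# The lookup Series has a non-unique index (PM/522 and PM/536 appear twice),
# which is what makes df1['LT'].map(lookup) raise.
lookup = df2.set_index('LT')['W_ID']
print(lookup.index.is_unique)  # False

# Assumption: repeated LT values in df2 share one W_ID, so keeping the first
# occurrence per key loses nothing and gives map a unique index.
unique_lookup = df2.drop_duplicates('LT').set_index('LT')['W_ID']
df1['route_1'] = df1['LT'].map(unique_lookup)
print(df1)
```

If repeated keys can have different W_ID values, this keeps only the first one per key, and the positional helper-column approach from the edit below is the safer choice.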

    EDIT:

    The problem is the duplicates in the LT column. The solution is to add a helper column with cumcount, so that a left join with merge matches on a unique key.
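To see what the helper column contains, here is a minimal sketch that applies `groupby(...).cumcount()` to the LT values from the question on their own:

```python
import pandas as pd

lt = pd.DataFrame({'LT': ['PM/2', 'PM/52', 'PM/522', 'PM/522',
                          'PM/24', 'PM/536', 'PM/536', 'PM/5356']})

# cumcount numbers each row within its LT group: 0 for the first occurrence,
# 1 for the second, and so on. The pair (LT, g) is unique even though LT is not.
lt['g'] = lt.groupby('LT').cumcount()
print(lt)
print(lt.duplicated(['LT', 'g']).any())  # False
```

This is why the matching is positional within each key: the first PM/522 in df1 pairs with the first PM/522 in df2, the second with the second.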

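    import pandas as pd

    # Number repeated LT values per frame (0, 1, ...) so that (LT, g) is unique
    df1['g'] = df1.groupby('LT').cumcount()
    df2['g'] = df2.groupby('LT').cumcount()
    # Left join on the composite key; every df1 row is kept in its original order
    df = pd.merge(df1, df2, on=['LT','g'], how='left')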
    print (df)
            LT  route_1   c2  g   W_ID
    0     PM/2      120   44  0  120.0
    1    PM/52      110   49  0  110.0
    2   PM/522      103   51  0  103.0
    3   PM/522      103   51  1  103.0
    4    PM/24      105   48  0  105.0
    5   PM/536      109   67  0  109.0
    6   PM/536      109   67  1  109.0
    7  PM/5356      112  144  0  112.0
    
    # Copy the matched W_ID back (aligned on the default 0..n-1 index) and drop the helper column
    df1['route_1'] = df['W_ID']
    df1.drop('g', axis=1, inplace=True)
    print (df1)
            LT  route_1   c2
    0     PM/2    120.0   44
    1    PM/52    110.0   49
    2   PM/522    103.0   51
    3   PM/522    103.0   51
    4    PM/24    105.0   48
    5   PM/536    109.0   67
    6   PM/536    109.0   67
    7  PM/5356    112.0  144
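`df1['route_1'] = df['W_ID']` aligns on index labels, so it lines up only because df1 has the default 0..n-1 index, which matches the new integer index that `merge` returns. A slightly more defensive sketch (a variation, not part of the original answer) asks `merge` to verify that the composite key is one-to-one and assigns the result by position:

```python
import pandas as pd

keys = ['PM/2', 'PM/52', 'PM/522', 'PM/522', 'PM/24', 'PM/536', 'PM/536', 'PM/5356']
df1 = pd.DataFrame({'LT': keys,
                    'route_1': [120, 110, 103, 103, 105, 109, 109, 112],
                    'c2': [44, 49, 51, 51, 48, 67, 67, 144]})
df2 = pd.DataFrame({'LT': keys,
                    'W_ID': [120.0, 110.0, 103.0, 103.0, 105.0, 109.0, 109.0, 112.0]})

df1['g'] = df1.groupby('LT').cumcount()
df2['g'] = df2.groupby('LT').cumcount()

# validate='one_to_one' makes merge raise a MergeError if (LT, g) is not unique
# on either side, instead of silently duplicating rows.
merged = pd.merge(df1, df2, on=['LT', 'g'], how='left', validate='one_to_one')

# A left merge keeps df1's row order, so W_ID can be written back by position;
# .to_numpy() avoids depending on df1 having a default RangeIndex.
df1['route_1'] = merged['W_ID'].to_numpy()
df1 = df1.drop(columns='g')
print(df1)
```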
    

    A similar solution:

    df1['g'] = df1.groupby('LT').cumcount()
    df2['g'] = df2.groupby('LT').cumcount()
    # Parentheses let the method chain span several lines;
    # reindex(columns=...) restores the original column order
    df = (pd.merge(df1, df2, on=['LT','g'], how='left')
            .drop(['g', 'route_1'], axis=1)
            .rename(columns={'W_ID':'route_1'})
            .reindex(columns=['LT', 'route_1', 'c2']))
    print (df)
            LT  route_1   c2
    0     PM/2    120.0   44
    1    PM/52    110.0   49
    2   PM/522    103.0   51
    3   PM/522    103.0   51
    4    PM/24    105.0   48
    5   PM/536    109.0   67
    6   PM/536    109.0   67
    7  PM/5356    112.0  144
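With `how='left'`, a df1 row whose (LT, g) has no partner in df2 is kept but gets NaN in W_ID, and the chained version above would then write NaN into route_1. If keeping the old route_1 for such rows is preferable (an assumption about what the asker wants), `fillna` can act as a fallback. The df2 below is hypothetical: its PM/5356 row is removed only to simulate a missing key.

```python
import pandas as pd

keys = ['PM/2', 'PM/52', 'PM/522', 'PM/522', 'PM/24', 'PM/536', 'PM/536', 'PM/5356']
df1 = pd.DataFrame({'LT': keys,
                    'route_1': [120, 110, 103, 103, 105, 109, 109, 112],
                    'c2': [44, 49, 51, 51, 48, 67, 67, 144]})
# Hypothetical df2: same as the question's, minus the PM/5356 row
df2 = pd.DataFrame({'LT': keys[:-1],
                    'W_ID': [120.0, 110.0, 103.0, 103.0, 105.0, 109.0, 109.0]})

df1['g'] = df1.groupby('LT').cumcount()
df2['g'] = df2.groupby('LT').cumcount()
merged = pd.merge(df1, df2, on=['LT', 'g'], how='left')

# The unmatched PM/5356 row has NaN in W_ID; fillna falls back to the
# original route_1 for that row instead of overwriting it with NaN.
merged['route_1'] = merged['W_ID'].fillna(merged['route_1'])
out = merged[['LT', 'route_1', 'c2']]
print(out)
```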
    

    【Discussion】:

    • I think this is a good approach, but I got this error: Reindexing only valid with uniquely valued Index objects