【Question title】: Merge two data frames based on a specific condition in pandas
【Posted】: 2020-04-24 23:08:47
【Question】:

I have two data frames, as shown below.

df1 - inspector IDs and their assigned places

df1:

Inspector_ID    Assigned_Place
1               ['Bangalore', 'Chennai']
2               ['Bangalore', 'Delhi', 'Chennai']
3               ['Bangalore', 'Delhi']
4               ['Chennai', 'Mumbai']

df2 - number of tickets raised by each inspector in each place

df2:

Inspector_ID    Place        Tickets
1               Bangalore    20           
1               Mumbai       4            
2               Bangalore    40           
2               Delhi        4            
3               Delhi        20           
3               Mumbai       10           
4               Chennai      20           
4               Mumbai       8      
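
For anyone who wants to reproduce the problem, the two frames above could be built roughly as follows. This is only a sketch: it assumes `Assigned_Place` holds real Python lists (not strings that look like lists) and that the ID column is spelled `Inspector_ID` in both frames.

```python
import pandas as pd

# Inspector assignments: one row per inspector, places stored as Python lists
# (assumption: real lists, not their string representation).
df1 = pd.DataFrame({
    "Inspector_ID": [1, 2, 3, 4],
    "Assigned_Place": [
        ["Bangalore", "Chennai"],
        ["Bangalore", "Delhi", "Chennai"],
        ["Bangalore", "Delhi"],
        ["Chennai", "Mumbai"],
    ],
})

# Tickets raised per inspector and place.
df2 = pd.DataFrame({
    "Inspector_ID": [1, 1, 2, 2, 3, 3, 4, 4],
    "Place": ["Bangalore", "Mumbai", "Bangalore", "Delhi",
              "Delhi", "Mumbai", "Chennai", "Mumbai"],
    "Tickets": [20, 4, 40, 4, 20, 10, 20, 8],
})
```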

From the data frames above, I would like to generate the data frame below.

Inspector_ID    Place        Tickets      Assigned
1               Bangalore    20           Yes
1               Mumbai       4            No
1               Chennai      0            Yes
2               Bangalore    40           Yes
2               Delhi        4            Yes
2               Chennai      0            Yes
3               Delhi        20           Yes
3               Mumbai       10           No
3               Bangalore    0            Yes
4               Chennai      20           Yes
4               Mumbai       8            Yes

Adding more detail to the question:

df1 is the schedule for the whole of 2019, i.e. the same assignments apply to every month of 2019.

df2:

Inspector_ID    Place        Tickets     YearMonth
    1           Bangalore    20          201901 
    1           Mumbai       4           201901     
    2           Bangalore    40          201901       
    2           Delhi        4           201901       
    3           Delhi        20          201901      
    3           Mumbai       10          201901         
    4           Chennai      20          201901       
    4           Mumbai       8           201901
    1           Bangalore    20          201902 
    1           Mumbai       4           201902     
    2           Bangalore    40          201902       
    2           Delhi        4           201902
    2           Chennai      8           201902       
    3           Delhi        20          201902      
    3           Mumbai       10          201902         
    4           Chennai      20          201902       
    4           Delhi        8           201902

I would like to get the data frame below.

Expected output:

Inspector_ID    Place        Tickets     YearMonth   Assigned
    1           Bangalore    20          201901      Yes
    1           Chennai      0           201901      Yes
    1           Mumbai       4           201901      No
    2           Bangalore    40          201901      Yes
    2           Delhi        4           201901      Yes
    2           Chennai      0           201901      Yes
    3           Delhi        20          201901      Yes
    3           Mumbai       10          201901      No
    3           Bangalore    0           201901      Yes
    4           Chennai      20          201901      Yes
    4           Mumbai       8           201901      Yes
    1           Bangalore    20          201902      Yes
    1           Mumbai       4           201902      No
    1           Chennai      0           201902      Yes
    2           Bangalore    40          201902      Yes
    2           Delhi        4           201902      Yes
    2           Chennai      8           201902      Yes
    3           Delhi        20          201902      Yes
    3           Mumbai       10          201902      No
    3           Bangalore    0           201902      Yes
    4           Chennai      20          201902      Yes
    4           Delhi        8           201902      No
    4           Mumbai       0           201902      Yes

【Comments】:

    Tags: pandas merge pandas-groupby


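    【Solution 1】:

    First turn the lists into one row per place with DataFrame.explode, then merge with an outer join and the indicator parameter, and finally map the indicator values to Yes/No in the new column: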

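    import numpy as np
    import pandas as pd

    # One row per (inspector, assigned place); rename so the column matches df2
    df1 = df1.explode('Assigned_Place').rename(columns={'Assigned_Place':'Place'})
    
    # Outer merge on the shared columns; the indicator records where each row came from.
    # 'left_only' rows exist only in df2, i.e. tickets raised in an unassigned place.
    df = (df2.merge(df1, how='outer', indicator='Assigned')
             .sort_values(['Inspector_ID','Place'])
             .fillna({'Tickets':0})
             .assign(Assigned = lambda x: np.where(x['Assigned'].eq('left_only'), 'No', 'Yes'))
             )
    print(df)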
        Inspector_ID      Place  Tickets Assigned
    0              1  Bangalore     20.0      Yes
    8              1    Chennai      0.0      Yes
    1              1     Mumbai      4.0       No
    2              2  Bangalore     40.0      Yes
    9              2    Chennai      0.0      Yes
    3              2      Delhi      4.0      Yes
    10             3  Bangalore      0.0      Yes
    4              3      Delhi     20.0      Yes
    5              3     Mumbai     10.0       No
    6              4    Chennai     20.0      Yes
    7              4     Mumbai      8.0      Yes
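
    To see what the indicator column holds before it is mapped to Yes/No, here is a minimal sketch on a cut-down version of the data (inspector 1 only). The names `assigned`, `merged`, `status` and `result` are illustrative, and casting `Tickets` back to `int` is an optional extra that is not part of the code above; it only removes the `.0` that appears because the outer join introduces NaN.

```python
import numpy as np
import pandas as pd

# Cut-down sample: inspector 1 only (illustrative data taken from the question).
df1 = pd.DataFrame({"Inspector_ID": [1],
                    "Assigned_Place": [["Bangalore", "Chennai"]]})
df2 = pd.DataFrame({"Inspector_ID": [1, 1],
                    "Place": ["Bangalore", "Mumbai"],
                    "Tickets": [20, 4]})

# One row per assigned place, with the column renamed to match df2.
assigned = df1.explode("Assigned_Place").rename(columns={"Assigned_Place": "Place"})

# indicator=True adds a `_merge` column:
#   both       -> assigned place with tickets
#   left_only  -> tickets raised in a place that was not assigned
#   right_only -> assigned place without any tickets
merged = df2.merge(assigned, how="outer", indicator=True)
status = dict(zip(merged["Place"], merged["_merge"].astype(str)))

result = (merged.fillna({"Tickets": 0})
                .astype({"Tickets": int})   # optional: undo the float upcast caused by NaN
                .assign(Assigned=lambda x: np.where(x["_merge"].eq("left_only"), "No", "Yes"))
                .drop(columns="_merge")
                .sort_values(["Inspector_ID", "Place"], ignore_index=True))
print(status)
print(result)
```

    Only `left_only` maps to No; both `both` and `right_only` describe an assigned place, which is why a single `np.where` is enough.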
    

    Edit: the solution is similar; the only addition is a cross join with all unique YearMonth values:

    df1 = df1.explode('Assigned_Place').rename(columns={'Assigned_Place':'Place'})
    # Helper frame with every month plus a constant key 'a' used to emulate a cross join
    df11 = pd.DataFrame({'YearMonth':df2['YearMonth'].unique(), 'a':1})
    # drop(columns=...) replaces the positional axis argument, which pandas 2.0 no longer accepts
    df1 = df1.assign(a=1).merge(df11, on='a').drop(columns='a')
    df = (df2.merge(df1, how='outer', indicator='Assigned')
             .sort_values(['Inspector_ID','Place'])
             .fillna({'Tickets':0})
             .assign(Assigned = lambda x: np.where(x['Assigned'].eq('left_only'), 'No', 'Yes'))
             )
    print(df)
        Inspector_ID      Place  Tickets  YearMonth Assigned
    0              1  Bangalore     20.0     201901      Yes
    8              1  Bangalore     20.0     201902      Yes
    17             1    Chennai      0.0     201901      Yes
    18             1    Chennai      0.0     201902      Yes
    1              1     Mumbai      4.0     201901       No
    9              1     Mumbai      4.0     201902       No
    2              2  Bangalore     40.0     201901      Yes
    10             2  Bangalore     40.0     201902      Yes
    12             2    Chennai      8.0     201902      Yes
    19             2    Chennai      0.0     201901      Yes
    3              2      Delhi      4.0     201901      Yes
    11             2      Delhi      4.0     201902      Yes
    20             3  Bangalore      0.0     201901      Yes
    21             3  Bangalore      0.0     201902      Yes
    4              3      Delhi     20.0     201901      Yes
    13             3      Delhi     20.0     201902      Yes
    5              3     Mumbai     10.0     201901       No
    14             3     Mumbai     10.0     201902       No
    6              4    Chennai     20.0     201901      Yes
    15             4    Chennai     20.0     201902      Yes
    16             4      Delhi      8.0     201902       No
    7              4     Mumbai      8.0     201901      Yes
    22             4     Mumbai      0.0     201902      Yes    
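
    On pandas 1.2 or later, the dummy key can probably be avoided with `merge(..., how='cross')`. The following is a self-contained sketch of that variant, not the code that produced the output above: the names `assigned`, `months` and `schedule` are illustrative, the sample data is retyped from the question, and the sort order and the cast of `Tickets` to `int` are choices made here.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Inspector_ID": [1, 2, 3, 4],
    "Assigned_Place": [["Bangalore", "Chennai"],
                       ["Bangalore", "Delhi", "Chennai"],
                       ["Bangalore", "Delhi"],
                       ["Chennai", "Mumbai"]],
})
df2 = pd.DataFrame({
    "Inspector_ID": [1, 1, 2, 2, 3, 3, 4, 4,
                     1, 1, 2, 2, 2, 3, 3, 4, 4],
    "Place": ["Bangalore", "Mumbai", "Bangalore", "Delhi",
              "Delhi", "Mumbai", "Chennai", "Mumbai",
              "Bangalore", "Mumbai", "Bangalore", "Delhi", "Chennai",
              "Delhi", "Mumbai", "Chennai", "Delhi"],
    "Tickets": [20, 4, 40, 4, 20, 10, 20, 8,
                20, 4, 40, 4, 8, 20, 10, 20, 8],
    "YearMonth": [201901] * 8 + [201902] * 9,
})

# One row per (inspector, assigned place).
assigned = df1.explode("Assigned_Place").rename(columns={"Assigned_Place": "Place"})

# Repeat every assignment for every month seen in df2 (requires pandas >= 1.2).
months = pd.DataFrame({"YearMonth": df2["YearMonth"].unique()})
schedule = assigned.merge(months, how="cross")

# Outer merge on Inspector_ID, Place and YearMonth; map the indicator to Yes/No.
df = (df2.merge(schedule, how="outer", indicator="Assigned")
         .fillna({"Tickets": 0})
         .astype({"Tickets": int})
         .assign(Assigned=lambda x: np.where(x["Assigned"].eq("left_only"), "No", "Yes"))
         .sort_values(["YearMonth", "Inspector_ID", "Place"], ignore_index=True))
print(df)
```

    Sorting by `YearMonth` first groups the rows by month, as in the expected output of the question; drop it from the sort keys to keep the ordering used above.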
    

    【Discussion】:
