【Question Title】: Python Pandas: Merging data frames on multiple conditions
【Posted】: 2017-12-30 04:59:02
【Question】:

I want to merge two data frames, fetched via SQL, under multiple conditions.

  • df1: the first data frame contains the Customer ID, Cluster ID and Customer Zone ID.
  • df2: the second data frame contains the Complain ID, Registration Number and Status.

df1 and df2 look like this:

df1

Customer ID     Cluster ID  Customer Zone ID
CUS1001.A       CUS1001.X   CUS1000
CUS1001.B       CUS1001.X   CUS1000
CUS1001.C       CUS1001.X   CUS1000
CUS1001.D       CUS1001.X   CUS1000
CUS1001.E       CUS1001.X   CUS1000
CUS2001.A       CUS2001.X   CUS2000

df2:

Complain ID RegistrationNumber   Status
CUS3501.A       99231            open
CUS1001.B       21340            open
CUS1001.X       32100            open

I want to merge the two data frames under the following conditions:

if Complain ID == Customer ID:
    merge on Customer ID
elif Complain ID == Cluster ID:
    merge on Cluster ID
elif Complain ID == Customer Zone ID:
    merge on Customer Zone ID
else:
    merge an empty row

The final result should look like this:

Customer ID Cluster ID  Customer Zone ID   Complain ID  Regi ID  Status
CUS1001.A   CUS1001.X       CUS1000         CUS1001.X    32100    open
CUS1001.B   CUS1001.X       CUS1000         CUS1001.B    21340    open
CUS1001.C   CUS1001.X       CUS1000         CUS1001.X    32100    open
  .             .               .               .           .       .
  .             .               .               .           .       .
CUS2001.A   CUS2001.X       CUS2000             0           0       0
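The if/elif cascade above can be sketched directly with `numpy.select`: build a match key by taking the first of the three ID columns that appears in df2, then do a single left merge on that key. This is a minimal sketch (the small inline frames reproduce the sample data from the question; the `match` column name is my own choice):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Customer ID': ['CUS1001.A', 'CUS1001.B', 'CUS1001.C',
                    'CUS1001.D', 'CUS1001.E', 'CUS2001.A'],
    'Cluster ID': ['CUS1001.X'] * 5 + ['CUS2001.X'],
    'Customer Zone ID': ['CUS1000'] * 5 + ['CUS2000'],
})
df2 = pd.DataFrame({
    'Complain ID': ['CUS3501.A', 'CUS1001.B', 'CUS1001.X'],
    'RegistrationNumber': [99231, 21340, 32100],
    'Status': ['open', 'open', 'open'],
})

ids = set(df2['Complain ID'])

# First condition that holds wins, mirroring the if/elif order:
# Customer ID, then Cluster ID, then Customer Zone ID.
match = np.select(
    [df1['Customer ID'].isin(ids),
     df1['Cluster ID'].isin(ids),
     df1['Customer Zone ID'].isin(ids)],
    [df1['Customer ID'], df1['Cluster ID'], df1['Customer Zone ID']],
    default=None,  # no match -> NaN after the merge, filled with 0 below
)

out = (df1.assign(**{'Complain ID': match})
          .merge(df2, on='Complain ID', how='left')
          .fillna(0))
print(out)
```

Rows with no matching ID keep an empty (zero-filled) right-hand side, which matches the CUS2001.A row in the expected output.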

Please help!

【Question Discussion】:

    Tags: python pandas numpy


    【Solution 1】:

    Try this... using pandas `melt`, `merge` and `concat`:

    import pandas as pd

    # Reshape df1 to long format so all three ID columns can be matched at once
    df = pd.melt(df1)
    df = df.merge(df2, left_on='value', right_on='Complain ID', how='left')
    # Number the rows within each original column so they can be realigned
    df['number'] = df.groupby('variable').cumcount()
    # Back-fill within each row number so a Cluster/Zone match
    # propagates up to that row's Customer ID entry
    df = df.groupby('number').bfill()
    # Take the first len(df1) rows (the Customer ID block) and
    # re-attach the matched columns; unmatched rows become 0
    Target = pd.concat([df1, df.iloc[:len(df1), 2:6]], axis=1).fillna(0).drop('number', axis=1)
    
    Target
    Out[39]: 
      Customer ID Cluster ID Customer Zone ID Complain ID  RegistrationNumber  \
    0   CUS1001.A  CUS1001.X          CUS1000   CUS1001.X             32100.0   
    1   CUS1001.B  CUS1001.X          CUS1000   CUS1001.B             21340.0   
    2   CUS1001.C  CUS1001.X          CUS1000   CUS1001.X             32100.0   
    3   CUS1001.D  CUS1001.X          CUS1000   CUS1001.X             32100.0   
    4   CUS1001.E  CUS1001.X          CUS1000   CUS1001.X             32100.0   
    5   CUS2001.A  CUS2001.X          CUS2000           0                 0.0   
      Status    
    0   open         
    1   open         
    2   open         
    3   open         
    4   open        
    5      0         
    

    Update: using numpy's `intersect1d`. I personally prefer this approach to the one above.

    import numpy as np

    # For each row, find which of its IDs appears among df2's Complain IDs
    # (use bracket assignment -- df1.MatchId = ... would not create a column)
    df1['MatchId'] = [np.intersect1d(x, df2.ComplainID.values)
                      for x in df1[['CustomerID', 'ClusterID']].values]
    # Unpack the one-element arrays; rows with no match become NaN
    df1['MatchId'] = df1['MatchId'].apply(lambda m: m[0] if len(m) else np.nan)
    df1
    Out[307]:
      CustomerID  ClusterID CustomerZoneID    MatchId
    0  CUS1001.A  CUS1001.X        CUS1000  CUS1001.X
    1  CUS1001.B  CUS1001.X        CUS1000  CUS1001.B
    2  CUS1001.C  CUS1001.X        CUS1000  CUS1001.X
    3  CUS1001.D  CUS1001.X        CUS1000  CUS1001.X
    4  CUS1001.E  CUS1001.X        CUS1000  CUS1001.X
    5  CUS2001.A  CUS2001.X        CUS2000        NaN
    
    df1.merge(df2,left_on='MatchId',right_on='ComplainID',how='left')
    Out[311]: 
      CustomerID  ClusterID CustomerZoneID    MatchId ComplainID  \
    0  CUS1001.A  CUS1001.X        CUS1000  CUS1001.X  CUS1001.X   
    1  CUS1001.B  CUS1001.X        CUS1000  CUS1001.B  CUS1001.B   
    2  CUS1001.C  CUS1001.X        CUS1000  CUS1001.X  CUS1001.X   
    3  CUS1001.D  CUS1001.X        CUS1000  CUS1001.X  CUS1001.X   
    4  CUS1001.E  CUS1001.X        CUS1000  CUS1001.X  CUS1001.X   
    5  CUS2001.A  CUS2001.X        CUS2000        NaN        NaN   
       RegistrationNumber Status  
    0             32100.0   open  
    1             21340.0   open  
    2             32100.0   open  
    3             32100.0   open  
    4             32100.0   open  
    5                 NaN    NaN  
    

    【Discussion】:
