【Question title】: Exclude rows in a dataframe based on matching values in rows from another dataframe
【Posted】: 2019-08-14 15:06:14
【Question】:

I have two dataframes (A and B). I want to remove every row from B whose Month, Year, Type and Name values exactly match those of a row in A.

Dataframe A

   Name    Type   Month   Year  country Amount   Expiration  Paid
0 EXTRON   GOLD   March   2019    CA    20000   2019-09-07   yes
0 LEAF    SILVER  March   2019    PL    4893    2019-02-02   yes       
0 JMC     GOLD    March   2019    IN    7000    2020-01-16   no       

Dataframe B

  Name     Type   Month   Year  country Amount   Expiration  Paid
0 JONS    GOLD    March   2018    PL    500     2019-10-17   yes
0 ABBY    BRONZE  March   2019    AU    60000   2019-02-02   yes       
0 BUYT     GOLD   March   2018    BR     50     2018-03-22   no       
0 EXTRON  GOLD    March   2019    CA    90000   2019-09-07   yes
0 JAYB    PURPLE  March   2019    PL    9.90    2018-04-20   yes       
0 JMC     GOLD    March   2019    IN    6000    2020-01-16   no       
0 JMC     GOLD    April   2019    IN    1000    2020-01-16   no      

Desired output:

Dataframe B

  Name     Type   Month   Year  country Amount   Expiration  Paid
0 JONS    GOLD    March   2018    PL    500     2019-10-17   yes
0 ABBY    BRONZE  March   2019    AU    60000   2019-02-02   yes       
0 BUYT     GOLD   March   2018    BR     50     2018-03-22   no       
0 JAYB    PURPLE  March   2019    PL    9.90    2018-04-20   yes       
0 JMC     GOLD    April   2019    IN    1000    2020-01-16   no
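To reproduce this, the two frames can be rebuilt from the printed tables. This is a minimal sketch; the dtypes are assumptions (Year as integers, Expiration kept as plain strings) because the question only shows printed output, and the repeated 0 index labels are replaced by a default 0..n-1 index.

```python
import pandas as pd

cols = ["Name", "Type", "Month", "Year", "country", "Amount", "Expiration", "Paid"]

# Dataframe A from the question (Expiration kept as plain strings)
A = pd.DataFrame(
    [
        ["EXTRON", "GOLD", "March", 2019, "CA", 20000, "2019-09-07", "yes"],
        ["LEAF", "SILVER", "March", 2019, "PL", 4893, "2019-02-02", "yes"],
        ["JMC", "GOLD", "March", 2019, "IN", 7000, "2020-01-16", "no"],
    ],
    columns=cols,
)

# Dataframe B from the question; the 9.90 value makes Amount a float column
B = pd.DataFrame(
    [
        ["JONS", "GOLD", "March", 2018, "PL", 500, "2019-10-17", "yes"],
        ["ABBY", "BRONZE", "March", 2019, "AU", 60000, "2019-02-02", "yes"],
        ["BUYT", "GOLD", "March", 2018, "BR", 50, "2018-03-22", "no"],
        ["EXTRON", "GOLD", "March", 2019, "CA", 90000, "2019-09-07", "yes"],
        ["JAYB", "PURPLE", "March", 2019, "PL", 9.90, "2018-04-20", "yes"],
        ["JMC", "GOLD", "March", 2019, "IN", 6000, "2020-01-16", "no"],
        ["JMC", "GOLD", "April", 2019, "IN", 1000, "2020-01-16", "no"],
    ],
    columns=cols,
)
```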

【Question discussion】:

    Tags: python pandas if-statement conditional


    【Solution 1】:

    We can use merge here with indicator=True and keep only the rows marked left_only:

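    l = ['Month', 'Year', 'Type', 'Name']
    # left-join A's key columns onto B; indicator=True adds a '_merge' column, 'left_only' = no match in A
    B = B.merge(A[l], on=l, indicator=True, how='left').loc[lambda x: x['_merge'] == 'left_only'].copy()
    # you can drop the helper column afterwards: B = B.drop(columns='_merge')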
       Name    Type  Month  Year country   Amount  Expiration Paid     _merge
    0  JONS    GOLD  March  2018      PL    500.0  2019-10-17  yes  left_only
    1  ABBY  BRONZE  March  2019      AU  60000.0  2019-02-02  yes  left_only
    2  BUYT    GOLD  March  2018      BR     50.0  2018-03-22   no  left_only
    4  JAYB  PURPLE  March  2019      PL      9.9  2018-04-20  yes  left_only
    6   JMC    GOLD  April  2019      IN   1000.0  2020-01-16   no  left_only
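A self-contained version of this answer, as a sketch. The frames are trimmed to the key columns plus Amount for brevity (an assumption, not the question's full data), and it uses how='left', which is enough here because only rows of B are ever kept. Note that DataFrame.drop no longer accepts the axis positionally in pandas 2.x, hence drop(columns='_merge').

```python
import pandas as pd

key = ["Month", "Year", "Type", "Name"]

# Trimmed copies of the question's frames: key columns plus Amount
A = pd.DataFrame({
    "Name": ["EXTRON", "LEAF", "JMC"],
    "Type": ["GOLD", "SILVER", "GOLD"],
    "Month": ["March", "March", "March"],
    "Year": [2019, 2019, 2019],
    "Amount": [20000, 4893, 7000],
})
B = pd.DataFrame({
    "Name": ["JONS", "ABBY", "BUYT", "EXTRON", "JAYB", "JMC", "JMC"],
    "Type": ["GOLD", "BRONZE", "GOLD", "GOLD", "PURPLE", "GOLD", "GOLD"],
    "Month": ["March", "March", "March", "March", "March", "March", "April"],
    "Year": [2018, 2019, 2018, 2019, 2019, 2019, 2019],
    "Amount": [500, 60000, 50, 90000, 9.90, 6000, 1000],
})

# Left-join only A's key columns; indicator=True adds a '_merge' column
# that is 'left_only' for rows of B with no matching key in A.
merged = B.merge(A[key], on=key, how="left", indicator=True)

# Keep the unmatched rows (an anti-join) and drop the helper column.
result = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(result)
```

Because merge returns a fresh 0..n-1 index, the surviving rows keep the positions 0, 1, 2, 4 and 6, matching the output shown above.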
    

    【Discussion】:

    • If dataframe A only has the Month, Year, Type and Name columns, what needs to change in the code above? Thanks
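For the case raised in this comment, a sketch suggesting that nothing needs to change: A[l] selects the same key columns whether or not A carries extra ones, so the merge runs as-is. The small frames below are hypothetical, trimmed versions of the question's data.

```python
import pandas as pd

key = ["Month", "Year", "Type", "Name"]

# A holds only the key columns
A = pd.DataFrame({
    "Month": ["March", "March", "March"],
    "Year": [2019, 2019, 2019],
    "Type": ["GOLD", "SILVER", "GOLD"],
    "Name": ["EXTRON", "LEAF", "JMC"],
})
B = pd.DataFrame({
    "Name": ["EXTRON", "JMC", "JMC"],
    "Type": ["GOLD", "GOLD", "GOLD"],
    "Month": ["March", "March", "April"],
    "Year": [2019, 2019, 2019],
    "Amount": [90000, 6000, 1000],
})

# A[key] is simply all of A here, so the answer's merge is unchanged.
merged = B.merge(A[key], on=key, how="left", indicator=True)
result = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(result)
```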
    【Solution 2】:

    I also tried it with a MultiIndex:

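    import pandas as pd
    cols = ['Month', 'Year', 'Type', 'Name']
    # df1 corresponds to A and df2 to B; build one MultiIndex per frame from the key columns
    index1 = pd.MultiIndex.from_arrays([df1[col] for col in cols])
    index2 = pd.MultiIndex.from_arrays([df2[col] for col in cols])
    # keep only the rows of df2 whose key tuple does not appear in df1
    df2 = df2.loc[~index2.isin(index1)]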
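A runnable sketch of the MultiIndex approach, assuming df1 stands in for A and df2 for B, with the frames trimmed to the key columns plus Amount (hypothetical, reduced data).

```python
import pandas as pd

cols = ["Month", "Year", "Type", "Name"]

# df1 stands in for A and df2 for B (key columns plus Amount only)
df1 = pd.DataFrame({
    "Name": ["EXTRON", "LEAF", "JMC"],
    "Type": ["GOLD", "SILVER", "GOLD"],
    "Month": ["March", "March", "March"],
    "Year": [2019, 2019, 2019],
    "Amount": [20000, 4893, 7000],
})
df2 = pd.DataFrame({
    "Name": ["JONS", "ABBY", "BUYT", "EXTRON", "JAYB", "JMC", "JMC"],
    "Type": ["GOLD", "BRONZE", "GOLD", "GOLD", "PURPLE", "GOLD", "GOLD"],
    "Month": ["March", "March", "March", "March", "March", "March", "April"],
    "Year": [2018, 2019, 2018, 2019, 2019, 2019, 2019],
    "Amount": [500, 60000, 50, 90000, 9.90, 6000, 1000],
})

# One MultiIndex per frame, built from the key columns
index1 = pd.MultiIndex.from_arrays([df1[col] for col in cols])
index2 = pd.MultiIndex.from_arrays([df2[col] for col in cols])

# isin gives a boolean array: True where df2's key tuple also occurs in df1
df2 = df2.loc[~index2.isin(index1)]
print(df2)
```

Unlike the merge answer, this filters df2 in place of a join, so the original index labels of df2 are preserved and no helper column is created.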
    

    【Discussion】:
