【Question title】: python pandas check column contains item from a list
【Posted】: 2018-07-07 22:09:58
【Question】:

I have two dataframes like these:

vid     vbull   
1125    RHSA:2017:3200   
1127    RHSA:2017:3205  
1128    RHSA:2017:3208   
1129    RHSA:2017:3209


kbid    vdesc   
2401    This contains details for RHSA:2017:3205   
2402    This contains details for RHSA:2017:3206   
2403    This contains details for RHSA:2017:3207   
2404    This contains details for RHSA:2017:3208  
2405    This contains details for RHSA:2017:3200

I need an output built from df1 and df2 in which each vbull is matched against the text in vdesc, for example:

vid   vbull           kbid   vdesc   
1125  RHSA:2017:3200  2405   This contains details for RHSA:2017:3200   
1127  RHSA:2017:3207  2403  This contains details for RHSA:2017:3207   ...

I tried this to get the matching rows, but I am not sure how to also get the matched item in the output:

df2[df2.vdesc.str.contains('|'.join(df1.vbull))]    

【Comments】:

  • Are the values in vbull unique? Check with print (df1['vbull'].is_unique)
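
The uniqueness check matters because a lookup keyed on vbull can only return one vid per value. A minimal sketch of the check, using a made-up df1 with one repeated vbull (the duplicate row and the keep-first policy are assumptions for illustration, not part of the question):

```python
import pandas as pd

# Hypothetical df1 with a repeated vbull value, for illustration only.
df1 = pd.DataFrame({
    "vid": [1125, 1127, 1130],
    "vbull": ["RHSA:2017:3200", "RHSA:2017:3205", "RHSA:2017:3200"],
})

# False here, because RHSA:2017:3200 appears twice.
has_unique_vbull = df1["vbull"].is_unique
print(has_unique_vbull)

# One possible policy (an assumption): keep the first vid seen per vbull.
df1_unique = df1.drop_duplicates(subset="vbull", keep="first")
print(df1_unique["vbull"].is_unique)
```

If duplicates are legitimate in the real data, a merge on vbull may be a better fit than a one-to-one lookup, since it keeps every matching pair.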

Tags: python list pandas merge extract


【Solution 1】:

First use extract to pull the vbull values out of vdesc:

df2['extracted'] = df2.vdesc.str.extract('(' + '|'.join(df1.vbull) + ')', expand=False)
print (df2)
   kbid                                     vdesc       extracted
0  2401  This contains details for RHSA:2017:3205  RHSA:2017:3205
1  2402  This contains details for RHSA:2017:3206             NaN
2  2403  This contains details for RHSA:2017:3207             NaN
3  2404  This contains details for RHSA:2017:3208  RHSA:2017:3208
4  2405  This contains details for RHSA:2017:3200  RHSA:2017:3200

Then filter with boolean indexing:

df3 = df2[df2['extracted'].notnull()].copy()
print (df3)
   kbid                                     vdesc       extracted
0  2401  This contains details for RHSA:2017:3205  RHSA:2017:3205
3  2404  This contains details for RHSA:2017:3208  RHSA:2017:3208
4  2405  This contains details for RHSA:2017:3200  RHSA:2017:3200

Finally, add the vid values with map:

df3['new'] = df3['extracted'].map(df1.set_index('vbull')['vid'])
print (df3)
   kbid                                     vdesc       extracted   new
0  2401  This contains details for RHSA:2017:3205  RHSA:2017:3205  1127
3  2404  This contains details for RHSA:2017:3208  RHSA:2017:3208  1128
4  2405  This contains details for RHSA:2017:3200  RHSA:2017:3200  1125
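
The three steps above can also be combined into one self-contained sketch that produces the column layout asked for in the question (vid, vbull, kbid, vdesc). This variant swaps map for merge and wraps each identifier in re.escape; both choices are assumptions made here for illustration, not part of the original answer. The sample frames are rebuilt from the question's data.

```python
import re

import pandas as pd

# Sample data rebuilt from the question.
df1 = pd.DataFrame({
    "vid": [1125, 1127, 1128, 1129],
    "vbull": ["RHSA:2017:3200", "RHSA:2017:3205",
              "RHSA:2017:3208", "RHSA:2017:3209"],
})
df2 = pd.DataFrame({
    "kbid": [2401, 2402, 2403, 2404, 2405],
    "vdesc": [
        "This contains details for RHSA:2017:3205",
        "This contains details for RHSA:2017:3206",
        "This contains details for RHSA:2017:3207",
        "This contains details for RHSA:2017:3208",
        "This contains details for RHSA:2017:3200",
    ],
})

# Build one alternation pattern with a single capture group.
# re.escape keeps any regex metacharacters in the identifiers literal.
pattern = "(" + "|".join(map(re.escape, df1["vbull"])) + ")"

# Extract the matched identifier into a new vbull column on a copy of df2.
matched = df2.assign(vbull=df2["vdesc"].str.extract(pattern, expand=False))

# Drop rows with no match, then join back to df1 on the identifier.
result = df1.merge(matched.dropna(subset=["vbull"]), on="vbull", how="inner")
result = result[["vid", "vbull", "kbid", "vdesc"]]
print(result)
```

Using merge instead of map keeps all df1 columns in one step and still works when a vbull value appears in more than one vdesc row.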

【Discussion】:

  • Thanks, but the values in df1.vbull need to be matched against df2.vdesc. df2 is not checked here.
  • Oops, wrong assignment: I had mixed up df1 and df2. Sorry :(