【Question title】:Create a column based on whether a string is a substring in a pandas DataFrame
【Posted】:2018-12-02 10:41:22
【Question】:

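One column of my DataFrame holds identifier names that follow a specific naming convention, but the names were entered inconsistently. How can I find a specific keyword in each name and put it into its own column in Python? Perhaps with some kind of loop?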

Example:

types = ['XYZ', 'OPQ', 'MNO', 'ABC']

Current df:

ID  ID Name
45  I_name_ls_XYZ_random
46  I_22_name_ABC_random
47  I_name_ls_XYZ_random_45
48  I_name_ls_MNO_random
49  I_ls_OPQ_random_name
50  I_name_ls_ABC_random
51  I_name_ls_XYZ_random
52  I_name_MNO_random

Desired result:

ID  ID Name                types
45  I_name_ls_XYZ_random    XYZ
46  I_22_name_ABC_random    ABC
47  I_name_ls_XYZ_random_45 XYZ
48  I_name_ls_MNO_random    MNO
49  I_ls_OPQ_random_name    OPQ
50  I_name_ls_ABC_random    ABC
51  I_name_ls_XYZ_random    XYZ
52  I_name_MNO_random       MNO

【Comments】:

    Tags: python string pandas dataframe series


    【Solution 1】:

    Use str.extract:

    df['types'] = df.Name.str.extract('({})'.format('|'.join(types)))
    
       ID                     Name types
    0  45     I_name_ls_XYZ_random   XYZ
    1  46     I_22_name_ABC_random   ABC
    2  47  I_name_ls_XYZ_random_45   XYZ
    3  48     I_name_ls_MNO_random   MNO
    4  49     I_ls_OPQ_random_name   OPQ
    5  50     I_name_ls_ABC_random   ABC
    6  51     I_name_ls_XYZ_random   XYZ
    7  52        I_name_MNO_random   MNO
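
For reference, here is a minimal self-contained sketch of the str.extract approach, assuming the column is named Name as in this answer. Two details are additions rather than part of the original answer: re.escape guards against keywords that contain regex metacharacters, and expand=False makes str.extract return a Series rather than a one-column DataFrame.

```python
import re

import pandas as pd

types = ['XYZ', 'OPQ', 'MNO', 'ABC']

df = pd.DataFrame({
    'ID': [45, 46, 47, 48, 49, 50, 51, 52],
    'Name': [
        'I_name_ls_XYZ_random',
        'I_22_name_ABC_random',
        'I_name_ls_XYZ_random_45',
        'I_name_ls_MNO_random',
        'I_ls_OPQ_random_name',
        'I_name_ls_ABC_random',
        'I_name_ls_XYZ_random',
        'I_name_MNO_random',
    ],
})

# One alternation pattern with a single capture group: (XYZ|OPQ|MNO|ABC)
pattern = '({})'.format('|'.join(re.escape(t) for t in types))

# First match per row; rows containing no keyword would get NaN
df['types'] = df['Name'].str.extract(pattern, expand=False)

print(df)
```

No explicit loop is needed: the pattern is applied to every row of the column in one call.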
    

    If a name may contain more than one keyword, use findall:

    df
       ID                     Name
    0  45  I_name_ls_XYZ_ABCrandom
    
    df.Name.str.findall(r'|'.join(types))
    0    [XYZ, ABC]
    Name: Name, dtype: object
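
Series.str.findall applies re.findall to each element, so the behaviour can be checked on a single string with the standard library alone. A small sketch using the sample value above; the second input string is a made-up example with no keyword in it:

```python
import re

types = ['XYZ', 'OPQ', 'MNO', 'ABC']
pattern = '|'.join(types)  # findall needs no capture group

# Matches come back in the order they occur in the string
matches = re.findall(pattern, 'I_name_ls_XYZ_ABCrandom')
print(matches)  # ['XYZ', 'ABC']

# A string with no keyword yields an empty list, not None
no_matches = re.findall(pattern, 'I_name_ls_random')
print(no_matches)  # []
```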
    

    【Discussion】:

      【Solution 2】:

      Use pd.Series.apply with a custom function and a generator expression:

      types = {'XYZ', 'OPQ', 'MNO', 'ABC'}
      
      def string_filter(x):
          return next((i for i in x.split('_') if i in types), None)
      
      df['types'] = df['ID_Name'].apply(string_filter)
      
      print(df)
      
         ID                  ID_Name types
      0  45     I_name_ls_XYZ_random   XYZ
      1  46     I_22_name_ABC_random   ABC
      2  47  I_name_ls_XYZ_random_45   XYZ
      3  48     I_name_ls_MNO_random   MNO
      4  49     I_ls_OPQ_random_name   OPQ
      5  50     I_name_ls_ABC_random   ABC
      6  51     I_name_ls_XYZ_random   XYZ
      7  52        I_name_MNO_random   MNO
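
One behavioural difference from Solution 1 is worth noting: this function only matches whole underscore-delimited tokens, whereas the regex matches a keyword anywhere in the string. The sketch below illustrates this with plain Python; the second and third input strings are made-up examples, not data from the question.

```python
types = {'XYZ', 'OPQ', 'MNO', 'ABC'}

def string_filter(x):
    # Return the first '_'-separated token that is a known type, else None
    return next((i for i in x.split('_') if i in types), None)

print(string_filter('I_name_ls_XYZ_random'))   # XYZ
print(string_filter('I_name_ls_XYZrandom'))    # None: 'XYZrandom' is not an exact token
print(string_filter('I_ABC_name_XYZ_random'))  # ABC: the first matching token wins
```

Which behaviour is preferable depends on how strictly the naming convention was followed when the identifiers were entered.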
      

      【Discussion】:
