【Question title】: Create a column in a pandas DataFrame by splitting the values
【Posted】: 2020-08-07 02:49:57
【Question description】:

I have a pandas DataFrame like this:

import pandas as pd
import numpy as np
df = pd.DataFrame({'col1':['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'], 'col2':['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y']})
df

    col1        col2
0   AA_L8_ZZ    AAA_L8_1D
1   AA_L8_YY    AA_L8_2D
2   AA_L80_XX   AA_L80_5C
3   AA_L8_CC    AA_L8_6Y

I want to create a new column, col3, defined as:

col3 = (the first 2 pieces of 'col1' after splitting on _) + _ + (the 3rd piece of 'col2' after splitting on _)

My expected output:

    col1        col2        col3
0   AA_L8_ZZ    AAA_L8_1D   AA_L8_1D
1   AA_L8_YY    AA_L8_2D    AA_L8_2D
2   AA_L80_XX   AA_L80_5C   AA_L80_5C
3   AA_L8_CC    AA_L8_6Y    AA_L8_6Y

【Comments on the question】:

    Tags: python-3.x pandas


    【Solution 1】:

    Let's try some regular expressions:

    df['col3'] = df['col1'].str.extract('^(.*_.*_)').add(df['col2'].str.extract('^.*_.*_([^_]*)'))[0]
    

    Output:

            col1       col2       col3
    0   AA_L8_ZZ  AAA_L8_1D   AA_L8_1D
    1   AA_L8_YY   AA_L8_2D   AA_L8_2D
    2  AA_L80_XX  AA_L80_5C  AA_L80_5C
    3   AA_L8_CC   AA_L8_6Y   AA_L8_6Y
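
A variant of the same idea, offered as a sketch rather than the answerer's exact code: the patterns below use `[^_]*` instead of `.*`, so each piece can only match text without an underscore, and `expand=False` makes `str.extract` return a Series, so the trailing `[0]` is not needed. The names `prefix` and `suffix` are only illustrative; the sample frame is the one from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'],
    'col2': ['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y'],
})

# First two pieces of col1, including the trailing underscore (e.g. 'AA_L8_')
prefix = df['col1'].str.extract(r'^([^_]*_[^_]*_)', expand=False)

# Third piece of col2 (e.g. '1D')
suffix = df['col2'].str.extract(r'^[^_]*_[^_]*_([^_]*)', expand=False)

# Plain string concatenation of the two Series, row by row
df['col3'] = prefix + suffix
print(df)
```

Rows that do not match a pattern come back as NaN from `str.extract`, so malformed values would show up as missing entries in col3 rather than raising an error.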
    

    【Comments】:

      【Solution 2】:

      You can use the str accessor methods like this:

      df['col3'] = (df['col1'].str.rsplit('_', n=1).str[0]
                              .str.cat(df['col2'].str.rsplit('_', n=1).str[-1], 
                                       sep='_'))
      df
      

      Output:

              col1       col2       col3
      0   AA_L8_ZZ  AAA_L8_1D   AA_L8_1D
      1   AA_L8_YY   AA_L8_2D   AA_L8_2D
      2  AA_L80_XX  AA_L80_5C  AA_L80_5C
      3   AA_L8_CC   AA_L8_6Y   AA_L8_6Y
      

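      rsplit splits starting from the end (the right), and the n parameter limits the number of splits. .str[n] indexes into the list produced by the split, and cat concatenates the two strings with sep='_' between them.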
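
To make each step of that chain visible, here is a small walk-through on the question's sample data. It is a sketch for illustration only; the intermediate names `pieces`, `head` and `tail` are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'],
    'col2': ['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y'],
})

# rsplit with n=1 cuts once, at the last underscore: 'AA_L8_ZZ' -> ['AA_L8', 'ZZ']
pieces = df['col1'].str.rsplit('_', n=1)

# .str[0] takes the first element of each list: everything before the last underscore
head = pieces.str[0]

# .str[-1] takes the last element of each list: everything after the last underscore
tail = df['col2'].str.rsplit('_', n=1).str[-1]

# str.cat joins the two Series element-wise, with '_' in between
df['col3'] = head.str.cat(tail, sep='_')
print(df)
```

Note that this takes "everything before the last underscore" of col1, which equals "the first two pieces" only when every value has exactly three pieces, as in the sample data.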

      【Comments】:

        【Solution 3】:
        import pandas as pd
        import numpy as np
        df = pd.DataFrame({'col1':['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'], 'col2':['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y']})
        
        #defining a list to store the contents for col3
        a = []
        
        # convert each value to a string, split it on '_' once, then join the
        # first two pieces of col1 with the third piece of col2
        for i, j in zip(df.col1, df.col2):
            left, right = str(i).split('_'), str(j).split('_')
            a.append(left[0] + "_" + left[1] + "_" + right[2])
        
        #defining new column and assigning the value to it
        df['col3'] =  a
        
        print(df)
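
The same piece-by-piece definition can also be written without an explicit Python loop. The following is a minimal sketch using the `.str` accessor, assuming every value contains at least three underscore-separated pieces; the names `first_two` and `third` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'],
    'col2': ['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y'],
})

# Split col1 on '_', keep the first two pieces, and glue them back together
first_two = df['col1'].str.split('_').str[:2].str.join('_')

# Split col2 on '_' and keep the third piece (index 2)
third = df['col2'].str.split('_').str[2]

df['col3'] = first_two + '_' + third
print(df)
```

Unlike the rsplit approach, this follows the question's wording literally (first two pieces, third piece), so it still gives the intended result if a value has more than three pieces.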
        

        【Comments】:
