【Question title】:How to concat a list of columns into one column in a pandas DataFrame?
【Posted】:2021-04-02 06:15:19
【Question】:

I have a list of columns, and I need to combine them into a single column in the DataFrame. Could you help me with how to do that?

Examples:

example-1 column_list = ['File Type', 'Number of Records']
          df['pk'] = df['File Type'] + df['Number of Records']


example-2 column_list = ['File Type', 'Number of Records', 'Indication']
          df['pk'] = df['File Type'] + df['Number of Records'] + df['Indication']


example-3 column_list = ['File Type']
          df['pk'] = df['File Type']

【Comments】:

    Tags: python python-3.x pandas dataframe csv


    【Answer 1】:

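    First select the columns with column_list. If at least one of them is not a string column, convert the values to strings with DataFrame.astype. Finally, join each row with DataFrame.agg: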

    df['pk'] = df[column_list].astype(str).agg(''.join, axis=1)
    

    Or with DataFrame.apply:

    df['pk'] = df[column_list].astype(str).apply(''.join, axis=1)
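The question chains the columns with `+`. As a rough sketch of how that same pattern could be generalized to a list of any length (this is not part of the answer above; it swaps in `functools.reduce` and `operator.add` from the standard library, and the sample data is assumed for illustration):

```python
from functools import reduce
import operator

import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "File Type": ["aa", "bb"],
    "Number of Records": [1, 5],
    "Indication": ["ind1", "ind2"],
})
column_list = ["File Type", "Number of Records", "Indication"]

# Cast each selected column to str, then fold them together with `+`,
# which concatenates string Series element by element.
string_columns = [df[name].astype(str) for name in column_list]
df["pk"] = reduce(operator.add, string_columns)

print(df["pk"].tolist())
```

With a one-element `column_list`, `reduce` simply returns that single column, which matches example-3 in the question.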
    

    Example

    import pandas as pd
    
    df = pd.DataFrame({'File Type':['aa','bb'], 
                       'Number of Records':[1,5],
                       'Indication':['ind1','ind2']})
    
    column_list1 = ['File Type', 'Number of Records']
    column_list2 = ['File Type', 'Number of Records', 'Indication']
    column_list3 = ['File Type']
    df['pk1'] = df[column_list1].astype(str).agg(''.join, axis=1)
    df['pk2'] = df[column_list2].astype(str).agg(''.join, axis=1)
    df['pk3'] = df[column_list3].astype(str).agg(''.join, axis=1)
    print (df)
      File Type  Number of Records Indication  pk1      pk2 pk3
    0        aa                  1       ind1  aa1  aa1ind1  aa
    1        bb                  5       ind2  bb5  bb5ind2  bb
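Since the question needs this to work for a column list of any length, the one-liner could be wrapped in a small helper. A minimal sketch, assuming a hypothetical function name `add_concat_key` and a hypothetical `target` parameter (neither comes from the answer):

```python
import pandas as pd


def add_concat_key(df, columns, target="pk"):
    """Return a copy of df with `target` set to the row-wise joined strings of `columns`.

    `add_concat_key` and `target` are illustrative names, not part of pandas.
    """
    out = df.copy()
    # Cast to str first so non-string columns can be joined, then join per row.
    out[target] = out[columns].astype(str).agg("".join, axis=1)
    return out


df = pd.DataFrame({
    "File Type": ["aa", "bb"],
    "Number of Records": [1, 5],
    "Indication": ["ind1", "ind2"],
})

result = add_concat_key(df, ["File Type", "Number of Records", "Indication"])
print(result["pk"].tolist())  # ['aa1ind1', 'bb5ind2']
```

Returning a copy leaves the caller's DataFrame unchanged; assigning in place, as the answer does, works just as well if that is what you want.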
    

    Another idea is to use sum:

    df['pk1'] = df[column_list1].astype(str).sum(axis=1)
    df['pk2'] = df[column_list2].astype(str).sum(axis=1)
    df['pk3'] = df[column_list3].astype(str).sum(axis=1)
    print (df)
      File Type  Number of Records Indication  pk1      pk2 pk3
    0        aa                  1       ind1  aa1  aa1ind1  aa
    1        bb                  5       ind2  bb5  bb5ind2  bb
    

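    The problem with the sum solution is that when the joined columns are all numeric, the concatenated digits are converted back to numbers and the output becomes float, as shown here: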

    df = pd.DataFrame({'File Type':[4,5], 'Number of Records':[1,5], 'Indication':[8,9]})
    
    column_list1 = ['File Type', 'Number of Records']
    column_list2 = ['File Type', 'Number of Records', 'Indication']
    column_list3 = ['File Type']
    df['pk1'] = df[column_list1].astype(str).sum(axis=1)
    df['pk2'] = df[column_list2].astype(str).sum(axis=1)
    df['pk3'] = df[column_list3].astype(str).sum(axis=1)
    print (df)
       File Type  Number of Records  Indication   pk1    pk2  pk3
    0          4                  1           8  41.0  418.0  4.0
    1          5                  5           9  55.0  559.0  5.0
    
    print (df.dtypes)
    File Type              int64
    Number of Records      int64
    Indication             int64
    pk1                  float64
    pk2                  float64
    pk3                  float64
    dtype: object
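For comparison, here is a small sketch that runs the same all-numeric data through the `agg(''.join, axis=1)` approach from the start of this answer. It assumes nothing beyond that approach and checks that the joined values stay strings:

```python
import pandas as pd

df = pd.DataFrame({
    "File Type": [4, 5],
    "Number of Records": [1, 5],
    "Indication": [8, 9],
})
column_list = ["File Type", "Number of Records", "Indication"]

# Joining row by row yields Python strings, so no numeric conversion happens.
pk = df[column_list].astype(str).agg("".join, axis=1)

print(pk.tolist())  # ['418', '559']
print(all(isinstance(value, str) for value in pk))  # True
```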
    

    【Comments】:
