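First select the columns with column_list, then convert the values to strings with DataFrame.astype if at least one column is not a string column, and finally join them with DataFrame.agg: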
df['pk'] = df[column_list].astype(str).agg(''.join, axis=1)
Or with DataFrame.apply:
df['pk'] = df[column_list].astype(str).apply(''.join, axis=1)
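To illustrate why the astype(str) step matters, here is a minimal sketch on a small made-up frame (the data and the names `raised` and `pk` are only for illustration). It assumes the usual behaviour of `str.join`, which rejects non-string items:

```python
import pandas as pd

df = pd.DataFrame({'File Type': ['aa', 'bb'],
                   'Number of Records': [1, 5]})
column_list = ['File Type', 'Number of Records']

# Without the cast, str.join meets an integer in each row and raises TypeError.
try:
    df[column_list].apply(''.join, axis=1)
    raised = False
except TypeError:
    raised = True

# With the cast, every value is a string and the row-wise join succeeds.
pk = df[column_list].astype(str).apply(''.join, axis=1)

print(raised)
print(pk.tolist())
```

If every selected column already holds strings, the cast is harmless, so keeping it makes the one-liner work for any column_list.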
Sample:
import pandas as pd

df = pd.DataFrame({'File Type':['aa','bb'],
                   'Number of Records':[1,5],
                   'Indication':['ind1','ind2']})
column_list1 = ['File Type', 'Number of Records']
column_list2 = ['File Type', 'Number of Records', 'Indication']
column_list3 = ['File Type']
df['pk1'] = df[column_list1].astype(str).agg(''.join, axis=1)
df['pk2'] = df[column_list2].astype(str).agg(''.join, axis=1)
df['pk3'] = df[column_list3].astype(str).agg(''.join, axis=1)
print (df)
  File Type  Number of Records Indication  pk1      pk2 pk3
0        aa                  1       ind1  aa1  aa1ind1  aa
1        bb                  5       ind2  bb5  bb5ind2  bb
Another idea is to use sum:
df['pk1'] = df[column_list1].astype(str).sum(axis=1)
df['pk2'] = df[column_list2].astype(str).sum(axis=1)
df['pk3'] = df[column_list3].astype(str).sum(axis=1)
print (df)
  File Type  Number of Records Indication  pk1      pk2 pk3
0        aa                  1       ind1  aa1  aa1ind1  aa
1        bb                  5       ind2  bb5  bb5ind2  bb
The problem with the sum solution is that if the joined columns are all numeric, the output is converted to floats:
df = pd.DataFrame({'File Type':[4,5], 'Number of Records':[1,5], 'Indication':[8,9]})
column_list1 = ['File Type', 'Number of Records']
column_list2 = ['File Type', 'Number of Records', 'Indication']
column_list3 = ['File Type']
df['pk1'] = df[column_list1].astype(str).sum(axis=1)
df['pk2'] = df[column_list2].astype(str).sum(axis=1)
df['pk3'] = df[column_list3].astype(str).sum(axis=1)
print (df)
   File Type  Number of Records  Indication   pk1    pk2  pk3
0          4                  1           8  41.0  418.0  4.0
1          5                  5           9  55.0  559.0  5.0
print (df.dtypes)
File Type              int64
Number of Records      int64
Indication             int64
pk1                  float64
pk2                  float64
pk3                  float64
dtype: object
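For comparison, a minimal sketch that runs the same all-numeric data through the ''.join approach from the top of this answer. The name `pk2` as a standalone Series is only for illustration. Each row is concatenated by `str.join` rather than summed, so the joined values should stay strings:

```python
import pandas as pd

df = pd.DataFrame({'File Type': [4, 5],
                   'Number of Records': [1, 5],
                   'Indication': [8, 9]})
column_list2 = ['File Type', 'Number of Records', 'Indication']

# Cast to strings, then concatenate each row with str.join instead of sum.
pk2 = df[column_list2].astype(str).agg(''.join, axis=1)

print(pk2.tolist())
print(all(isinstance(value, str) for value in pk2))
```

This is why the agg or apply variant is the safer choice when the key columns may all be numeric.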