Adding Columns if they exist in the Dataframe: pandas (answers)

【Question title】: Adding Columns if they exist in the Dataframe: pandas
【Posted】: 2016-08-01 14:00:32
【Question】:

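I have a data frame in which I want to add three columns together into a single column, and I want this to work only for the columns that actually exist in the data frame. For example, I want to add the following columns: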

List_ID=['ID1','ID2','ID3'] 
df # Df is the data frame

I am trying to sum the columns "if they exist", but I can't get it to work.

ID=sum[[col for col in list_ID if col in df.columns]]

To illustrate, the ID1 and ID2 columns could look like this:

df = pd.DataFrame({'Column 1':['A', '', 'C', ' '],'Column 2':[' ', 'F', ' ', '']})  
and my new column ID will look like 

In[34]: a=df['Column 1'] + (df['Column 2'])

In[35]: a

   Out[35]: 
    0    A 
    1    F
    2    C 
    3

Any suggestions are welcome.

【Comments】:

Tags: python pandas if-statement for-loop dataframe

【Solution 1】:

IIUC, one possible solution is to strip the whitespace:

a=df['Column 1'].str.strip() + df['Column 2'].str.strip()
print (a)
0    A
1    F
2    C
3     
dtype: object

A more general solution is to filter the column names first:

import pandas as pd

df = pd.DataFrame({'ID1':['    A', '', 'C', ' '],
                   'ID2':[' ', 'F', ' ', ''], 
                   'ID5':['T', 'E', ' ', '']}) 
print (df)
     ID1 ID2 ID5
0      A       T
1          F   E
2      C        
3      

List_ID=['ID1','ID2','ID3'] 
cols = df.columns[df.columns.isin(List_ID)]
print (cols)
Index(['ID1', 'ID2'], dtype='object')

#there are whitespaces
print (df[cols].sum(axis=1))
0        A 
1         F
2        C 
3          
dtype: object

Then apply the strip function to each column with a list comprehension, concat the resulting list, and finally sum across the columns (axis=1):

print (pd.concat([df[c].str.strip() for c in df[cols]], axis=1).sum(axis=1))
0    A
1    F
2    C
3
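
The filter, strip and join steps above can be wrapped in one helper. This is only a minimal sketch, not part of the original answer: the names `concat_existing` and `sep` are hypothetical, and it joins the strings with `str.join` rather than `sum(axis=1)`. It also assumes that missing values are acceptable as the text `'nan'`, because every column is cast with `astype(str)` first.

```python
import pandas as pd


def concat_existing(df, wanted, sep=""):
    """Join the stripped text of every column in `wanted` that exists in `df`.

    Hypothetical helper for illustration; names not present in df are ignored.
    """
    # Keep only the requested names that are real columns of df
    cols = [c for c in df.columns if c in wanted]
    if not cols:
        # Nothing to join: return an empty string for every row
        return pd.Series("", index=df.index, dtype=object)
    # Cast to str first so that .str.strip() also works on numeric columns
    stripped = df[cols].astype(str).apply(lambda col: col.str.strip())
    # Join the values of each row into a single string
    return stripped.apply(sep.join, axis=1)


df = pd.DataFrame({'ID1': ['    A', '', 'C', ' '],
                   'ID2': [' ', 'F', ' ', ''],
                   'ID5': ['T', 'E', ' ', '']})

result = concat_existing(df, ['ID1', 'ID2', 'ID3'])
print(result)
```

Here `'ID3'` is silently skipped because it is not a column of `df`, and `'ID5'` is skipped because it was not requested.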

Edit after the comments:

import pandas as pd

df = pd.DataFrame({'ID1':[15.3, 12.1, 13.2, 10.0],
                   'ID2':[7.0, 7.7, 2, 11.3], 
                   'ID5':[10, 15, 3.1, 2.2]}) 

print (df)
    ID1   ID2   ID5
0  15.3   7.0  10.0
1  12.1   7.7  15.0
2  13.2   2.0   3.1
3  10.0  11.3   2.2

List_ID=['ID1','ID2','ID3']
cols = df.columns[df.columns.isin(List_ID)]
print (cols)
Index(['ID1', 'ID2'], dtype='object')

#summed floats
print (df[cols].sum(axis=1))
0    22.3
1    19.8
2    15.2
3    21.3
dtype: float64

#cast float to string and sum
print (df[cols].astype(str).sum(axis=1))
0     15.37.0
1     12.17.7
2     13.22.0
3    10.011.3
dtype: object

#cast float to int (this truncates the decimals), then to str and sum; the last casts to int and back to str drop any float 0
print (df[cols].astype(int).astype(str).sum(axis=1).astype(int).astype(str))
0     157
1     127
2     132
3    1011
dtype: object

#cast float to int, then to str and concatenate by join
print (df[cols].astype(int).astype(str).apply(lambda x: ''.join(x), axis=1))
0     157
1     127
2     132
3    1011
dtype: object
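
Note that `astype(int)` truncates the decimals (15.3 becomes 15), so the last two variants only keep the integer parts. If the decimals matter but a trailing `.0` does not, one possible alternative is to format each number before joining. This is a sketch under assumptions: `join_numbers` and `sep` are hypothetical names, and the `'{:g}'` format keeps at most six significant digits.

```python
import pandas as pd

df = pd.DataFrame({'ID1': [15.3, 12.1, 13.2, 10.0],
                   'ID2': [7.0, 7.7, 2, 11.3],
                   'ID5': [10, 15, 3.1, 2.2]})

List_ID = ['ID1', 'ID2', 'ID3']
# Keep only the requested columns that exist in df
cols = [c for c in List_ID if c in df.columns]


def join_numbers(frame, sep=""):
    """Format every float with '{:g}' (7.0 -> '7', 15.3 -> '15.3') and join per row.

    Hypothetical helper for illustration only.
    """
    formatted = frame.apply(lambda col: col.map("{:g}".format))
    return formatted.apply(sep.join, axis=1)


# A separator keeps the joined values readable
joined = join_numbers(df[cols], sep="-")
print(joined)
```

Without a separator the result is ambiguous ('15.37' could be 15.3 and 7, or 15 and 37), which is why the example passes `sep="-"`.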

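【Discussion】:

Thanks jezrael. For my particular problem the sum does not work: it adds the values, treating them as floats. How can I convert the filtered columns to type str (or wrap them in '') and then sum?
I think you can use (pd.concat([df[c].str.strip() for c in df[cols].astype(str)], axis=1).sum(axis=1))
Both of the last two solutions show this error. It is really close, just something is missing: "raise AttributeError("Can only use .str accessor with string" AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas"
I added another sample and some sum examples. Please check it.
Thanks a lot Jezrael, really helpful.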
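
The `AttributeError` quoted in the discussion most likely comes from the list comprehension suggested there. Iterating over `df[cols].astype(str)` yields only the column names, so `df[c]` still refers to the original float column, and `.str` is not available on it. A minimal sketch of the problem and one way around it, using the numeric sample from the answer (the variable names `error_message`, `fixed` and `joined` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'ID1': [15.3, 12.1, 13.2, 10.0],
                   'ID2': [7.0, 7.7, 2, 11.3],
                   'ID5': [10, 15, 3.1, 2.2]})

List_ID = ['ID1', 'ID2', 'ID3']
cols = df.columns[df.columns.isin(List_ID)]

error_message = None
try:
    # The loop variable c is a column name, so df[c] is still a float column
    pd.concat([df[c].str.strip() for c in df[cols].astype(str)], axis=1)
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Cast each column itself to str before using the .str accessor
fixed = pd.concat([df[c].astype(str).str.strip() for c in cols], axis=1)
# Join the strings of each row
joined = fixed.apply(''.join, axis=1)
print(joined)
```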