Pandas - combine two columns (answers)

【Question title】: Pandas - combine two columns
【Posted】: 2019-01-09 03:26:27
【Question description】:

I have two columns; let's call them x and y. I want to create a new column called xy:

x    y    xy
1         1
2         2

     4    4
     8    8

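There should not be any conflicting values, but if there are, y takes precedence. If it makes the solution easier, you can assume that x will always be NaN wherever y has a value.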

【Question comments】:

Tags: python pandas dataframe

【Solution 1】:

If your example is accurate, this could be as simple as

df = df.fillna(0)      # needed first if the blanks are NaN; fillna returns a new frame, so assign it back
df['xy'] = df['x'] + df['y']
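
A variant of the same idea, sketched under the assumption that the blanks are NaN: `Series.add` with `fill_value=0` treats a missing operand as 0 only for the addition, so x and y keep their NaN values. The sample frame is made up to mirror the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the question: exactly one value per row.
df = pd.DataFrame({'x': [1, 2, np.nan, np.nan],
                   'y': [np.nan, np.nan, 4, 8]})

# Treat a missing operand as 0 during the addition, leaving x and y untouched.
df['xy'] = df['x'].add(df['y'], fill_value=0)

print(df)
```

Like the plain addition, this sums the two values when both are present, so it does not give y precedence on a conflict.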

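【Comments】:

Or df.x.combine_first(df.y)
Or that works too. Pandas is like skinning a cat: there is more than one way to do it
Great, this works. But looking at the combine_first examples, if you want y to take precedence (when both have values), shouldn't it be df.y.combine_first(df.x)?
If the blanks are empty strings rather than NaN, this code raises TypeError: unsupported operand type(s) for +: 'float' and 'str'
@JesusMonroe Yes... put them in order of priority... it was just an example :)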
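
The point about ordering in the comments above can be checked with a small sketch. The data is hypothetical: row 2 deliberately has a value in both columns so the precedence rule becomes visible.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 2 holds a value in both x and y (a conflict).
df = pd.DataFrame({'x': [1, 2, 3, np.nan],
                   'y': [np.nan, np.nan, 4, 8]})

# Take y where it is present, otherwise fall back to x (y has priority).
df['xy'] = df['y'].combine_first(df['x'])

# For comparison: the reversed call gives x priority instead.
x_first = df['x'].combine_first(df['y'])

print(df)
print(x_first)
```

In the conflicting row, `df['y'].combine_first(df['x'])` keeps 4 while the reversed call keeps 3. For two Series on the same index, `df['y'].fillna(df['x'])` should behave the same way as the first call.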

【Solution 2】:

Note that your columns are now strings, no longer numeric

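# Coerce every column to numeric; entries that cannot be parsed become NaN
df = df.apply(lambda x: pd.to_numeric(x, errors='coerce'))

# Row-wise sum, skipping NaN
df['xy'] = df.sum(axis=1)

# Alternative: keep the columns as strings and concatenate them row by row
df['xy'] = df[['x', 'y']].astype(str).apply(''.join, axis=1)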

#df[['x','y']].astype(str).apply(''.join,1)
Out[655]: 
0    1.0
1    2.0
2       
3    4.0
4    8.0
dtype: object

【Comments】:

No lambda is needed here: you can write df.apply(pd.to_numeric, errors='coerce')
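
A self-contained sketch of that lambda-free form, assuming the blanks are empty strings and the numbers are stored as text (the sample frame is invented for illustration):

```python
import pandas as pd

# Hypothetical data: blanks stored as empty strings, numbers stored as text.
df = pd.DataFrame({'x': ['1', '2', '', ''],
                   'y': ['', '', '4', '8']})

# Coerce every column to numeric; anything unparseable becomes NaN.
num = df.apply(pd.to_numeric, errors='coerce')

# Sum across each row; NaN is skipped by default.
df['xy'] = num.sum(axis=1)

print(df)
```

A row with no value in either column sums to 0 by default; passing `min_count=1` to `sum` should yield NaN there instead. As with any sum, conflicting values are added together rather than resolved in favour of y.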

【Solution 3】:

You can also use NumPy:

import pandas as pd, numpy as np

df = pd.DataFrame({'x': [1, 2, np.nan, np.nan],
                   'y': [np.nan, np.nan, 4, 8]})

arr = df.values
df['xy'] = arr[~np.isnan(arr)].astype(int)

print(df)

     x    y  xy
0  1.0  NaN   1
1  2.0  NaN   2
2  NaN  4.0   4
3  NaN  8.0   8
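
The boolean-mask approach above relies on every row holding exactly one non-NaN value; otherwise the filtered array no longer has one entry per row and the assignment fails. A possible NumPy alternative that tolerates conflicts and gives y precedence is `np.where`. This is a sketch on hypothetical data, with a conflict added in row 2:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 2 holds a value in both x and y (a conflict).
df = pd.DataFrame({'x': [1, 2, 3, np.nan],
                   'y': [np.nan, np.nan, 4, 8]})

x = df['x'].to_numpy()
y = df['y'].to_numpy()

# Pick y wherever it is not NaN, otherwise fall back to x.
df['xy'] = np.where(np.isnan(y), x, y)

print(df)
```

The result stays float here; if no NaN remains, `.astype(int)` can be applied as in the answer above.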

【Comments】: