Python pandas: summing value of two or more DataFrames with identical value in multiple columns

【Question title】: Python pandas: summing value of two or more DataFrames with identical value in multiple columns
【Posted】: 2021-05-11 16:35:14
【Question description】:

I have two DataFrames, for example:

df1 = pd.DataFrame([["tom", 1, 2, 3], ["bob", 3, 4, 5], ["ali", 6, 7, 8]], columns=["name", "A", "B", "C"])
df1
Out[44]: 
  name  A  B  C
0  tom  1  2  3
1  bob  3  4  5
2  ali  6  7  8
df2 = pd.DataFrame([["rob", 1, 2, 3], ["ali", 6, 7, 8]], columns=["name", "A", "B", "D"])
df2
Out[46]: 
  name  A  B  D
0  rob  1  2  3
1  ali  6  7  8

How can I sum the values that share the same "name" and the same column, and get a result DataFrame like this:

  name   A   B    C    D
0  tom   1   2    3  NaN     # <- tom and bob don't show up in df2, so their sums are identical
1  bob   3   4    5  NaN     #    to their values in df1
2  rob   1   2  NaN    3     # <- rob only shows up in df2, so its sum equals its df2 values
3  ali  12  14    8    8     # <- ali's A and B are summed, and C and D are identical to their
                             #    corresponding values in df1 and df2

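Note that I don't know in advance which names will appear in the "name" column of either DataFrame.

Also, since I have more than two such DataFrames to sum, how can I do this for all of them in one operation, if possible, rather than adding them one by one? Thanks a lot.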

Tags: python pandas dataframe

【Solution 1】:

Hope this solves your problem. NaN values are replaced with 0 in the result.

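import pandas as pd

df1 = pd.DataFrame([["tom", 1, 2, 3], ["bob", 3, 4, 5], ["ali", 6, 7, 8]], columns=["name", "A", "B", "C"])
df2 = pd.DataFrame([["rob", 1, 2, 3], ["ali", 6, 7, 8]], columns=["name", "A", "B", "D"])
# Stack both frames; a column missing from one frame (C or D) becomes NaN there.
df3 = pd.concat([df1, df2], ignore_index=True, sort=False)
# Select columns with a list (double brackets); sum() skips NaN, so an all-NaN group gives 0.
df4 = df3.groupby('name')[['A', 'B', 'C', 'D']].sum()
print(df4)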
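
The asker's expected output keeps NaN where a name has no value at all for a column, whereas a plain sum() returns 0 there. One possible way to get closer to that output is the min_count argument of the groupby sum. The following is only a sketch using the question's df1 and df2; the variable names stacked and summed are made up for illustration, and sort=False is an optional choice to keep names in order of first appearance.

```python
import pandas as pd

df1 = pd.DataFrame([["tom", 1, 2, 3], ["bob", 3, 4, 5], ["ali", 6, 7, 8]],
                   columns=["name", "A", "B", "C"])
df2 = pd.DataFrame([["rob", 1, 2, 3], ["ali", 6, 7, 8]],
                   columns=["name", "A", "B", "D"])

stacked = pd.concat([df1, df2], ignore_index=True, sort=False)

# min_count=1: a group needs at least one non-NaN value to produce a sum;
# otherwise the result for that cell is NaN instead of 0.
# sort=False keeps the names in order of first appearance.
summed = stacked.groupby("name", sort=False).sum(min_count=1).reset_index()
print(summed)
```

With this variant, column D stays NaN for tom and bob, and column C stays NaN for rob.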

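【Solution 2】:

(1) If no name is repeated within df1 or within df2 (the same name may still appear in both DataFrames):

You can call .add() with fill_value=0 on the two DataFrames after using set_index() to move the name column into the index, as follows: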

df3 = (df1.set_index('name')
          .add(df2.set_index('name'), fill_value=0)
      ).reset_index()

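With fill_value=0, a value that is missing from only one DataFrame is treated as 0, while a cell that is missing from both (for example column D for bob) stays NaN. NaN is therefore preserved for unmatched entries.

Because both DataFrames use the name column as their index, pandas aligns rows on that index (the same name) before adding, so each name gets the correct sum.

Result: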

print(df3)

  name     A     B    C    D
0  ali  12.0  14.0  8.0  8.0
1  bob   3.0   4.0  5.0  NaN
2  rob   1.0   2.0  NaN  3.0
3  tom   1.0   2.0  3.0  NaN
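
The question also asks about more than two DataFrames. One way the .add() approach might be extended is to fold a list of frames with functools.reduce. This is a sketch under the same assumption as above (no name repeated within a single frame); df_extra is a hypothetical third frame invented for illustration, not data from the question.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame([["tom", 1, 2, 3], ["bob", 3, 4, 5], ["ali", 6, 7, 8]],
                   columns=["name", "A", "B", "C"])
df2 = pd.DataFrame([["rob", 1, 2, 3], ["ali", 6, 7, 8]],
                   columns=["name", "A", "B", "D"])
# Hypothetical third frame, made up for this example.
df_extra = pd.DataFrame([["tom", 10, 20, 30]], columns=["name", "A", "B", "D"])

# Index every frame by name so that .add() aligns rows on the name.
indexed = [frame.set_index("name") for frame in (df1, df2, df_extra)]

# Fold the list pairwise: ((df1 + df2) + df_extra), treating a value that is
# missing on one side only as 0.
total = reduce(lambda left, right: left.add(right, fill_value=0), indexed)
total = total.reset_index()
print(total)
```

Cells that are missing from every frame, such as column D for bob, remain NaN after the fold.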

(2) If a name is repeated within a single DataFrame (df1 or df2), concatenate and group instead:

df3 = (pd.concat([df1, df2])
         .groupby('name', as_index=False)
         .sum()
      )

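Result (the tom row below evidently comes from input in which tom appears more than once; with the df1 from the question, that row would read 1, 2, 3.0, 0.0):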

print(df3)

  name   A   B     C    D
0  ali  12  14   8.0  8.0
1  bob   3   4   5.0  0.0
2  rob   1   2   0.0  3.0
3  tom  12  14  16.0  0.0

This way, NaN values for unmatched entries are filled with 0, because sum() skips NaN and an all-NaN group sums to 0.
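
Since pd.concat accepts a list of any length, the same pattern can probably handle all of the asker's DataFrames in a single pass. The sketch below wraps it in a small helper; sum_by_name and df_dup are hypothetical names introduced here, and df_dup is invented data that repeats a name inside one frame.

```python
import pandas as pd


def sum_by_name(frames, key="name"):
    """Stack any number of frames and sum the remaining columns per key value."""
    stacked = pd.concat(frames, ignore_index=True, sort=False)
    # Rows with the same key are summed, whether they come from different
    # frames or from the same frame.
    return stacked.groupby(key, as_index=False).sum()


df1 = pd.DataFrame([["tom", 1, 2, 3], ["bob", 3, 4, 5], ["ali", 6, 7, 8]],
                   columns=["name", "A", "B", "C"])
df2 = pd.DataFrame([["rob", 1, 2, 3], ["ali", 6, 7, 8]],
                   columns=["name", "A", "B", "D"])
# Hypothetical frame in which "tom" appears twice.
df_dup = pd.DataFrame([["tom", 10, 20, 30], ["tom", 1, 2, 3]],
                      columns=["name", "A", "B", "C"])

result = sum_by_name([df1, df2, df_dup])
print(result)
```

Unlike the .add() approach, this does not require names to be unique within each frame, at the cost of turning fully missing cells into 0.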
