Subtracting attribute values in one Pandas DataFrame from another DataFrame: answers

【Question title】: Subtracting values of attributes within one Pandas Dataframe from another dataframe
【Posted】: 2018-07-29 06:36:00
【Question description】:

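This question involves 3 separate DataFrames. df1 holds the 'Total' for products 1, 2, 3 and contains 'Value1' and 'Value2'; df2 holds 'Customer1' for products 1, 2, 3 with 'Value1' and 'Value2'; df3 holds 'Customer2' for products 1, 2, 3 with 'Value1' and 'Value2'.

df2 and df3 are essentially subsets of df1.

I want to create another DataFrame, df4, by subtracting df2 and df3 from df1, and label it 'Remaining Customers' in the 'Market' column.

Here is what I have done so far: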

import pandas as pd

# Totals per product
d1 = {'Market': ['Total', 'Total', 'Total'], 'Product Code': [1, 2, 3],
      'Value1': [10, 20, 30], 'Value2': [5, 15, 25]}
df1 = pd.DataFrame(data=d1)
df1

# First customer's share per product
d2 = {'Market': ['Customer1', 'Customer1', 'Customer1'], 'Product Code': [1, 2, 3],
      'Value1': [3, 14, 10], 'Value2': [2, 4, 6]}
df2 = pd.DataFrame(data=d2)
df2

# Second customer's share per product
d3 = {'Market': ['Customer2', 'Customer2', 'Customer2'], 'Product Code': [1, 2, 3],
      'Value1': [3, 3, 4], 'Value2': [2, 6, 10]}
df3 = pd.DataFrame(data=d3)
df3

This produces the following:

   Market  Product Code  Value1  Value2
0   Total             1      10       5
1   Total             2      20      15
2   Total             3      30      25

      Market  Product Code  Value1  Value2
0  Customer1             1       3       2
1  Customer1             2      14       4
2  Customer1             3      10       6

      Market  Product Code  Value1  Value2
0  Customer2             1       3       2
1  Customer2             2       3       6
2  Customer2             3       4      10

To create df4, I tried the following code and got the error "TypeError: unsupported operand type(s) for -: 'str' and 'str'". Can anyone help?

df4 = df1-(df2+df3)

print(df4)
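The error comes from the text column. `df2 + df3` is applied to every column, and for the 'Market' strings `+` simply concatenates them ('Customer1' + 'Customer2'), but strings have no `-`, so the outer subtraction raises. Note that 'Product Code' would also be added and subtracted, which is not meaningful either. A small reproduction using the question's data:

```python
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# '+' is defined for strings (it concatenates), so df2 + df3 runs without error
summed = df2 + df3
print(summed['Market'].tolist())

# '-' is not defined for strings, so subtracting the 'Market' columns fails
error = None
try:
    df1 - summed
except TypeError as exc:
    error = exc
print(repr(error))
```

The answers below all avoid this by restricting the arithmetic to the numeric value columns.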

【Question discussion】:

Tags: python python-3.x pandas

【Solution 1】:

Here is one way:

cols = ['Value1', 'Value2']
df4 = (df1[cols].subtract(df2[cols].add(df3[cols]))
       .assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]})
       .sort_index(axis=1))

#                Market  Product Code  Value1  Value2
# 0  RemainingCustomers             1       4       1
# 1  RemainingCustomers             2       3       5
# 2  RemainingCustomers             3      16       9

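Explanation

df1[cols].subtract(df2[cols].add(df3[cols])) performs the calculation only on the specified columns; rows are matched on the default 0, 1, 2 index, i.e. by position.
assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]}) adds the extra columns the result needs.
sort_index(axis=1) sorts the column labels alphabetically, which here gives the desired column order.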
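The hard-coded `[1, 2, 3]` only matches this particular data. A minimal variant, assuming (as the answer does) that the three frames list the products in the same row order, copies the codes from df1 and restores df1's column order instead of sorting alphabetically:

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

cols = ['Value1', 'Value2']
# Aligns on the default RangeIndex, i.e. row position: assumes identical row order
df4 = df1[cols].subtract(df2[cols].add(df3[cols]))
# Reuse df1's product codes rather than hard-coding them
df4 = df4.assign(Market='RemainingCustomers', **{'Product Code': df1['Product Code']})
# Put the columns back in df1's order
df4 = df4[df1.columns]
print(df4)
```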

【Discussion】:

Works perfectly. Thanks

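【Solution 2】:

Drop Market, set Product Code as the index, and do the arithmetic with index alignment on the product codes. After that, just reset the index and insert Market into the result.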

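# Drop the text column and index each frame by product code
# (keyword form of drop; the positional axis argument no longer works in pandas 2.x)
df1, df2, df3 = [
    df.drop(columns='Market').set_index('Product Code') for df in [df1, df2, df3]
]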

df4 = (df1 - (df2 + df3)).reset_index()
df4.insert(0, 'Market', 'RemainingCustomers')

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
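Because the arithmetic aligns on the Product Code index, the row order of the input frames does not matter. A small sketch in which df2's rows are reversed on purpose (the reversal is only for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
# Same Customer1 data as in the question, but with the rows reversed
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [3, 2, 1],
                    'Value1': [10, 14, 3], 'Value2': [6, 4, 2]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# Drop the text column and index each frame by product code
total, c1, c2 = [df.drop(columns='Market').set_index('Product Code')
                 for df in (df1, df2, df3)]

# Rows are matched by Product Code label, not by position
df4 = (total - (c1 + c2)).reset_index()
df4.insert(0, 'Market', 'RemainingCustomers')
print(df4)
```

A purely positional subtraction of the same frames would pair product 1's total with product 3's Customer1 values; the index-based version still yields 4, 3, 16 and 1, 5, 9.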

【Discussion】:

【Solution 3】:

Not exactly what the OP asked for, but in my opinion this may be a better way to manage the data.

df = pd.concat([df1, df2, df3]).set_index(['Product Code', 'Market'])

formula = 'RemainingCustomers = Total - Customer1 - Customer2'
df = df.unstack().stack(0).eval(formula).unstack()
df

Market       Customer1        Customer2         Total        RemainingCustomers       
                Value1 Value2    Value1 Value2 Value1 Value2             Value1 Value2
Product Code                                                                          
1                    3      2         3      2     10      5                  4      1
2                   14      4         3      6     20     15                  3      5
3                   10      6         4     10     30     25                 16      9

And:

df['RemainingCustomers']

              Value1  Value2
Product Code                
1                  4       1
2                  3       5
3                 16       9
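The chain works because `unstack().stack(0)` moves Market into the columns and Value1/Value2 into the index, so each market becomes a column that `DataFrame.eval` can reference by name. A sketch that stops at that intermediate frame (recent pandas versions may emit a FutureWarning about the `stack` implementation; the result is the same):

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

long_df = pd.concat([df1, df2, df3]).set_index(['Product Code', 'Market'])

# Market -> columns, then the Value1/Value2 column level -> index
wide = long_df.unstack().stack(0)
print(wide)

# Each market is now a column, so the formula can refer to it by name
wide = wide.eval('RemainingCustomers = Total - Customer1 - Customer2')
print(wide['RemainingCustomers'])
```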

If we insist on the requested output:

df.stack(0).reset_index().query(
    'Market == "RemainingCustomers"').reindex(columns=df1.columns)

                Market  Product Code  Value1  Value2
2   RemainingCustomers             1       4       1
6   RemainingCustomers             2       3       5
10  RemainingCustomers             3      16       9

Or:

df.stack(0).xs(
    'RemainingCustomers', level=1, drop_level=False
).reset_index().reindex(columns=df1.columns)

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9

【Discussion】:

【Solution 4】:

Perhaps we can use select_dtypes:

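df4 = (
    (df1.select_dtypes(include='number')      # numeric columns only
     - df2.select_dtypes(include='number')
     - df3.select_dtypes(include='number'))
    .drop(columns='Product Code')             # differences of codes are meaningless
    .combine_first(df1)                       # fill Market and Product Code back in from df1
    .assign(Market='remaining customers')
)
df4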
                Market  Product Code  Value1  Value2
0  remaining customers           1.0       4       1
1  remaining customers           2.0       3       5
2  remaining customers           3.0      16       9
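If there are more customer frames than df2 and df3, the same numeric-only subtraction can be folded over a list. A minimal sketch, assuming every frame lists the same products in the same row order; the `customers` list and the `numeric_values` helper are hypothetical names used only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# Hypothetical list of customer frames; append further customers here
customers = [df2, df3]

def numeric_values(df):
    """Hypothetical helper: numeric columns minus the Product Code identifier."""
    return df.select_dtypes(include='number').drop(columns='Product Code')

# Row-by-row (positional) arithmetic: assumes identical row order in every frame
remaining = numeric_values(df1) - sum(numeric_values(c) for c in customers)

# Restore the identifying columns in front of the values
remaining.insert(0, 'Product Code', df1['Product Code'])
remaining.insert(0, 'Market', 'RemainingCustomers')
print(remaining)
```

Here `sum` starts from 0, and `0 + DataFrame` is simply the DataFrame, so the customer frames are added element-wise before being subtracted from the totals.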

【Discussion】:

This definitely works, but you should split your answer across multiple lines :)