Pandas percentage of two columns: answers

【Question title】: Pandas percentage of two columns
【Posted】: 2022-11-17 18:46:27
【Question】:

I have a dataframe that looks like this:

    Vendor  GRDate  Pass/Fail
0   204177  2022-22 1.0
1   204177  2022-22 0.0
2   204177  2022-22 0.0
3   204177  2022-22 1.0
4   204177  2022-22 1.0
5   204177  2022-22 1.0
7   201645  2022-22 0.0
8   201645  2022-22 0.0
9   201645  2022-22 1.0
10  201645  2022-22 1.0

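I'm trying to calculate, for each vendor and each week, the percentage of rows where Pass/Fail equals 1, and put the result into a new df (number of rows with Pass/Fail = 1 divided by the total number of rows for that vendor and week).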

The result should look like this:

    Vendor  GRDate  Performance
0   204177  2022-22 0.666667
1   201645  2022-22 0.5
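A minimal sketch of that calculation done explicitly (rows equal to 1 divided by the row count per Vendor and GRDate), assuming the sample data above is rebuilt by hand; the intermediate columns `passed`, `passes` and `total` are illustrative names, not part of the original question:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (index 6 is missing there too)
df = pd.DataFrame(
    {
        "Vendor": [204177] * 6 + [201645] * 4,
        "GRDate": ["2022-22"] * 10,
        "Pass/Fail": [1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
    },
    index=[0, 1, 2, 3, 4, 5, 7, 8, 9, 10],
)

# Flag passing rows, then count passes and total rows per (Vendor, GRDate) group.
# `passed` is boolean with no missing values, so "count" equals the row count.
counts = (
    df.assign(passed=df["Pass/Fail"].eq(1))
    .groupby(["Vendor", "GRDate"], as_index=False, sort=False)
    .agg(passes=("passed", "sum"), total=("passed", "count"))
)

# Performance = passes / total rows for each vendor and week
counts["Performance"] = counts["passes"] / counts["total"]
result = counts[["Vendor", "GRDate", "Performance"]]
print(result)
```

Keeping `as_index=False` leaves Vendor and GRDate as ordinary columns, which is what the expected output shows.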

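I'm trying to do this with .groupby() and .count(), but I can't figure out how to get the result into a new df together with the Vendor and GRDate columns. The code below returns the pass percentage, but it drops the other two columns.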

sdp_percent = sdp.groupby(['GRDate','Vendor'])['Pass/Fail'].apply(lambda x: x[x == 1].count()) / sdp.groupby(['GRDate','Vendor'])['Pass/Fail'].count()

But if I add .reset_index() to keep them, I get this error: unsupported operand type(s) for /: 'str' and 'str'

Could someone please explain what I'm doing wrong?
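A likely explanation, assuming `.reset_index()` was appended to both the numerator and the denominator: each side then becomes a DataFrame that still contains the string GRDate column, and `/` between two DataFrames is applied column by column, so Python ends up trying to divide '2022-22' by '2022-22'. The sketch below rebuilds the sample data by hand as `sdp`, reproduces the error, and shows one possible fix: divide the two Series while the group keys are still in the index, then call `reset_index(name=...)` once on the quotient. The names `passes`, `totals` and `error_message` are illustrative.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
sdp = pd.DataFrame(
    {
        "Vendor": [204177] * 6 + [201645] * 4,
        "GRDate": ["2022-22"] * 10,
        "Pass/Fail": [1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
    }
)

group_keys = ["GRDate", "Vendor"]
# Numerator: rows with Pass/Fail == 1 per group (a Series indexed by the keys)
passes = sdp.groupby(group_keys)["Pass/Fail"].apply(lambda x: x[x == 1].count())
# Denominator: non-null rows per group
totals = sdp.groupby(group_keys)["Pass/Fail"].count()

# Resetting the index *before* dividing turns both sides into DataFrames,
# so "/" is also applied to the string GRDate column and raises TypeError.
error_message = None
try:
    passes.reset_index() / totals.reset_index()
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Divide while the keys are still in the index, then turn them back into columns
sdp_percent = (passes / totals).reset_index(name="Performance")
print(sdp_percent)
```

Dividing first keeps the alignment on (GRDate, Vendor) that the original expression relied on; only the final result needs its index reset.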

【Comments】:

Tags: python pandas group-by

【Answer 1】:

Try:

x = (
    df.groupby(["GRDate", "Vendor"])["Pass/Fail"]
    .mean()
    .reset_index()
    .rename(columns={"Pass/Fail": "Performance"})
)
print(x)

Prints:

    GRDate  Vendor  Performance
0  2022-22  201645     0.500000
1  2022-22  204177     0.666667
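Taking the mean works here because the column only holds 0.0 and 1.0, so the mean equals passes divided by rows. The rows come out with 201645 before 204177 because `groupby` sorts the group keys by default. A minimal sketch, assuming the sample data is rebuilt by hand and using a hypothetical helper `performance`, showing that `sort=False` keeps the order in which vendors first appear:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame(
    {
        "Vendor": [204177] * 6 + [201645] * 4,
        "GRDate": ["2022-22"] * 10,
        "Pass/Fail": [1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
    }
)


def performance(frame: pd.DataFrame, sort: bool) -> pd.DataFrame:
    """Share of 1s in Pass/Fail per (GRDate, Vendor), as in Answer 1."""
    return (
        frame.groupby(["GRDate", "Vendor"], sort=sort)["Pass/Fail"]
        .mean()  # mean of 0/1 values = passes / rows
        .reset_index()
        .rename(columns={"Pass/Fail": "Performance"})
    )


sorted_result = performance(df, sort=True)       # default: keys sorted
appearance_result = performance(df, sort=False)  # keys in first-appearance order
print(sorted_result)
print(appearance_result)
```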

【Comments】:

【Answer 2】:

Since your values are 0/1, you can use groupby.mean:

(df.groupby(['Vendor', 'GRDate'], as_index=False, sort=False)
   .agg(Performance=('Pass/Fail', 'mean'))
)

If you want the share of rows equal to some arbitrary value X:

(df.assign(val=df['Pass/Fail'].eq(X))
   .groupby(['Vendor', 'GRDate'], as_index=False, sort=False)
   .agg(Performance=('val', 'mean'))
)

Output:

   Vendor   GRDate  Performance
0  204177  2022-22     0.666667
1  201645  2022-22     0.500000
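To illustrate the arbitrary-value variant, here is a minimal sketch assuming, hypothetically, that Pass/Fail held the text labels 'Pass' and 'Fail' instead of 1.0/0.0, with X set to 'Pass':

```python
import pandas as pd

# Hypothetical variant of the sample data with text labels instead of 1.0/0.0
df = pd.DataFrame(
    {
        "Vendor": [204177] * 6 + [201645] * 4,
        "GRDate": ["2022-22"] * 10,
        "Pass/Fail": ["Pass", "Fail", "Fail", "Pass", "Pass", "Pass",
                      "Fail", "Fail", "Pass", "Pass"],
    }
)

X = "Pass"  # the value whose share we want per group

result = (
    df.assign(val=df["Pass/Fail"].eq(X))  # True where the row equals X
    .groupby(["Vendor", "GRDate"], as_index=False, sort=False)
    .agg(Performance=("val", "mean"))     # mean of booleans = share of True
)
print(result)
```

Comparing with `.eq(X)` turns the column into booleans, and the mean of a boolean column is the fraction of True values, so the same pattern works whatever X is.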

【Comments】: