【Question title】: Grouping and counting to get ratios in pandas
【Posted】: 2018-02-15 17:23:51
【Question】:

This is another great question about data frames, originally asked elsewhere, that would benefit from a pandas solution. Here is the question.

I want to count, per country, the number of times status is open and the number of times status is closed. Then I want to calculate the close rate per country.

Data:

  customer country   closeday status
1        1      BE 2017-08-23 closed
2        2      NL 2017-08-05   open
3        3      NL 2017-08-22 closed
4        4      NL 2017-08-26 closed
5        5      BE 2017-08-25 closed
6        6      NL 2017-08-13   open
7        7      BE 2017-08-30 closed
8        8      BE 2017-08-05   open
9        9      NL 2017-08-23 closed
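
For anyone who wants to reproduce the answers, the sample data can be rebuilt roughly as follows. This is only a sketch: the name df matches the answers below, while parsing closeday as a datetime and starting the index at 1 are assumptions based on how the table is printed.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame(
    {
        "customer": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
        "closeday": [
            "2017-08-23", "2017-08-05", "2017-08-22",
            "2017-08-26", "2017-08-25", "2017-08-13",
            "2017-08-30", "2017-08-05", "2017-08-23",
        ],
        "status": [
            "closed", "open", "closed",
            "closed", "closed", "open",
            "closed", "open", "closed",
        ],
    },
    index=range(1, 10),  # assumption: the printed index starts at 1
)

# Assumption: closeday is meant to be a date, so parse it as datetime.
df["closeday"] = pd.to_datetime(df["closeday"])

print(df)
```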

The idea is to get an output that shows the number of open and closed statuses and the closed_ratio. Here is the desired output:

country   closed  open  closed_ratio                         
BE            3     1          0.75
NL            3     2          0.60

Looking forward to your suggestions.

Solutions are included in the answers below. Other solutions are welcome.

【Comments on the question】:

    Tags: r python pandas dataframe group-by


    【Solution 1】:

    Here are a few ways to do it.

    1)

    In [420]: (df.groupby(['country', 'status']).size().unstack()
                 .assign(closed_ratio=lambda x: x.closed / x.sum(1)))
    Out[420]:
    status   closed  open  closed_ratio
    country
    BE            3     1          0.75
    NL            3     2          0.60
    

    2)

    In [422]: (pd.crosstab(df.country, df.status)
                 .assign(closed_ratio=lambda x: x.closed/x.sum(1)))
    Out[422]:
    status   closed  open  closed_ratio
    country
    BE            3     1          0.75
    NL            3     2          0.60
    

    3)

    In [424]: (df.pivot_table(index='country', columns='status', aggfunc='size')
                 .assign(closed_ratio=lambda x: x.closed/x.sum(1)))
    Out[424]:
    status   closed  open  closed_ratio
    country
    BE            3     1          0.75
    NL            3     2          0.60
    

    4) Borrowed from piRSquared

    In [430]: (df.set_index('country').status.str.get_dummies().groupby(level=0).sum()
                 .assign(closed_ratio=lambda x: x.closed/x.sum(1)))
    Out[430]:
             closed  open  closed_ratio
    country
    BE            3     1          0.75
    NL            3     2          0.60
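
If only the ratio is needed and not the counts, two shorter variants may be worth considering. This is a sketch rather than part of the original answer: it relies on the mean of a boolean Series being the fraction of True values, and on the normalize='index' option of pd.crosstab. The small df here repeats only the two relevant columns of the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
    "status": ["closed", "open", "closed", "closed", "closed",
               "open", "closed", "open", "closed"],
})

# Flag closed rows, then average the flag within each country.
closed_ratio = df["status"].eq("closed").groupby(df["country"]).mean()
print(closed_ratio)

# Row-normalised crosstab: each row sums to 1, so the "closed"
# column holds the same ratio.
shares = pd.crosstab(df["country"], df["status"], normalize="index")
print(shares)
```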
    

    【Comments】:

      【Solution 2】:
      df
      
         customer country    closeday  status
      1         1      BE  2017-08-23  closed
      2         2      NL  2017-08-05    open
      3         3      NL  2017-08-22  closed
      4         4      NL  2017-08-26  closed
      5         5      BE  2017-08-25  closed
      6         6      NL  2017-08-13    open
      7         7      BE  2017-08-30  closed
      8         8      BE  2017-08-05    open
      9         9      NL  2017-08-23  closed
      

      Apply groupby, count the rows in each group with size, then unstack the status level (level=1) so that each status becomes a column.

      df2 = df.groupby(['country', 'status']).status.size().unstack(level=1)
      df2
      
      status   closed  open
      country              
      BE            3     1
      NL            3     2
      

      Now, compute closed_ratio:

      df2['closed_ratio'] = df2.closed / df2.sum(1)     
      df2
      
      status   closed  open  closed_ratio
      country                            
      BE            3     1          0.75
      NL            3     2          0.60
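
One edge case the sample data does not exercise: if a country has no rows for one of the statuses, unstack leaves a NaN in that cell and the count columns become floats. A possible way around this, sketched below on hypothetical data (the country "FR" is not in the question), is to pass fill_value=0 to unstack.

```python
import pandas as pd

# Hypothetical data: "FR" has a closed row but no open rows.
df = pd.DataFrame({
    "country": ["BE", "BE", "FR"],
    "status": ["closed", "open", "closed"],
})

# fill_value=0 puts an explicit zero where a country/status pair is absent.
counts = df.groupby(["country", "status"]).size().unstack(fill_value=0)

# Sum only the count columns so the ratio does not depend on column order.
counts["closed_ratio"] = counts["closed"] / counts[["closed", "open"]].sum(axis=1)
print(counts)
```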
      

      【Comments】:
