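【Question title】: Bin data into ranges
【Posted】: 2018-11-07 19:34:26
【Question description】:

I have a DataFrame like the one shown below, and I want to create 4 columns that show the accuracy distribution.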

Company Error_Rate
A       9
B      10
c      20
GK     17
GK     18
GK     30
GK     35
GK     25
GK     32
GK     40
GK     50
MB     60
MB     70
MB     70

I would like to end up with a table like this:

Company Error_Rate  Above 90%   80% - 90%   65% - 80%   Below 65%
A              9    1           0           0           0
B             10    1           0           0           0
c             20    0           1           0           0
GK            17    0           1           0           0
GK            18    0           1           0           0
GK            30    0           0           1           0
GK            35    0           0           1           0
GK            40    0           0           0           1

I tried:

df['Above 90%'] = np.where(df['Error_Rate']<=10,1,0)
df['80% - 90%'] = np.where(df['Error_Rate'] <= 20,(np.where(df['Error_Rate'] > 10, 1, 0)),0)
df['65% - 80%'] = np.where(df['Error_Rate'] <= 35,(np.where(df['Error_Rate'] > 20, 1, 0)),0)
df['Below 65%'] = np.where(df['Error_Rate']>35,1,0)

It does not give me the result I want. Am I doing something wrong?

【Question comments】:

    Tags: python pandas dataframe conditional


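    【Solution 1】:

    If you have to write 4 np.where conditions to compute a single column, you are doing it wrong. It is worth considering a different approach.

    A concise option uses pd.cut + pd.get_dummies: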

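    import pandas as pd

    bins = [0, 65, 80, 90, 100]
    labels = ['Below 65%', '65% - 80%', '80% - 90%', 'Above 90%']

    # Bin accuracy (100 - Error_Rate). right=False makes each bin [low, high),
    # so accuracy 90 (Error_Rate 10) lands in 'Above 90%', as in the output below.
    # Note: an accuracy of exactly 100 (Error_Rate 0) falls outside the last bin.
    pd.concat([
        df, pd.get_dummies(pd.cut(100 - df.Error_Rate, bins=bins, labels=labels, right=False))
    ], axis=1)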
    
       Company  Error_Rate  Below 65%  65% - 80%  80% - 90%  Above 90%
    0        A           9          0          0          0          1
    1        B          10          0          0          0          1
    2        c          20          0          0          1          0
    3       GK          17          0          0          1          0
    4       GK          18          0          0          1          0
    5       GK          30          0          1          0          0
    6       GK          35          0          1          0          0
    7       GK          25          0          1          0          0
    8       GK          32          0          1          0          0
    9       GK          40          1          0          0          0
    10      GK          50          1          0          0          0
    11      MB          60          1          0          0          0
    12      MB          70          1          0          0          0
    13      MB          70          1          0          0          0
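
The same pd.cut + pd.get_dummies idea can also be applied to Error_Rate directly, which avoids the 100 - Error_Rate step. The following is a minimal sketch, not part of the original answer. It assumes the thresholds from the question (10, 20, 35), rebuilds the sample data by hand, and uses infinite outer edges so that no value falls outside the bins. The names `binned` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Company": ["A", "B", "c"] + ["GK"] * 8 + ["MB"] * 3,
    "Error_Rate": [9, 10, 20, 17, 18, 30, 35, 25, 32, 40, 50, 60, 70, 70],
})

# Edges in error-rate units. With right=True each bin is (low, high],
# so 10 lands in "Above 90%", 20 in "80% - 90%", and 35 in "65% - 80%".
bins = [-np.inf, 10, 20, 35, np.inf]
labels = ["Above 90%", "80% - 90%", "65% - 80%", "Below 65%"]

binned = pd.cut(df["Error_Rate"], bins=bins, labels=labels, right=True)

# dtype=int requests 0/1 indicators rather than booleans
result = pd.concat([df, pd.get_dummies(binned, dtype=int)], axis=1)
print(result)
```

Because the labels are listed in the same order as the requested columns, the indicator columns come out in the order the question asks for.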
    

    【Comments】:

      【Solution 2】:

      Use:

      df['Above 90%'] = np.where(df['Error_Rate']<=10,1,0)
      df['80% - 90%'] = np.where((df['Error_Rate'] <= 20) & (df['Error_Rate'] > 10),1,0)
      df['65% - 80%'] = np.where((df['Error_Rate'] <= 35) & (df['Error_Rate'] > 20),1,0)
      df['Below 65%'] = np.where(df['Error_Rate']>35,1,0)
      
      print (df)
         Company  Error_Rate  Above 90%  80% - 90%  65% - 80%  Below 65%
      0        A           9          1          0          0          0
      1        B          10          1          0          0          0
      2        c          20          0          1          0          0
      3       GK          17          0          1          0          0
      4       GK          18          0          1          0          0
      5       GK          30          0          0          1          0
      6       GK          35          0          0          1          0
      7       GK          25          0          0          1          0
      8       GK          32          0          0          1          0
      9       GK          40          0          0          0          1
      10      GK          50          0          0          0          1
      11      MB          60          0          0          0          1
      12      MB          70          0          0          0          1
      13      MB          70          0          0          0          1
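
Since each comparison already yields a boolean Series, the np.where wrapper is optional: casting the mask to int produces the same 0/1 columns. The sketch below is one possible variation, not part of the original answer. It assumes the same thresholds as above, rebuilds the sample data by hand, and the `masks` dictionary is a hypothetical helper introduced only for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Company": ["A", "B", "c"] + ["GK"] * 8 + ["MB"] * 3,
    "Error_Rate": [9, 10, 20, 17, 18, 30, 35, 25, 32, 40, 50, 60, 70, 70],
})

rate = df["Error_Rate"]

# One boolean mask per output column, using the same bounds as the np.where version
masks = {
    "Above 90%": rate <= 10,
    "80% - 90%": (rate > 10) & (rate <= 20),
    "65% - 80%": (rate > 20) & (rate <= 35),
    "Below 65%": rate > 35,
}

# astype(int) converts True/False to 1/0
for name, mask in masks.items():
    df[name] = mask.astype(int)

print(df)
```

Keeping the bounds in one dictionary makes it easier to check that the ranges neither overlap nor leave gaps.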
      

      【Comments】:
