【Question title】:How to calculate Imbalance Accuracy Metric in multi-class classification
【Posted】:2021-09-03 04:52:49
【Question】:

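Sorry to bother you, but I came across an interesting paper, "Mortaz, E. (2020). Imbalance accuracy metric for model selection in multi-class imbalance classification problems. Knowledge-Based Systems, 210, 106490", in which this metric (IAM) is computed. The formula is in the paper and I understand it, but I would like to ask: how can I reproduce it in R?

I apologize in advance for the silly question. Thank you for your attention!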

【Comments】:

    Tags: classification metrics multilabel-classification multiclass-classification


    【Solution 1】:

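    The IAM formula given in the paper is:

    $$\mathrm{IAM} = \frac{1}{k} \sum_{i=1}^{k} \frac{c_{ii} - \max\left(\sum_{j \neq i} c_{ij},\ \sum_{j \neq i} c_{ji}\right)}{\max\left(\sum_{j=1}^{k} c_{ij},\ \sum_{j=1}^{k} c_{ji}\right)}$$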

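    where c_ij is element (i, j) of the classifier's confusion matrix c, and k is the number of classes (k >= 2). The paper shows that this metric can be used on its own as a model selection metric for multi-class problems.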
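
As a quick sanity check of the formula, here is a hand-worked two-class case. The matrix is made up for illustration (it is not from the paper), with rows taken as true classes and columns as predictions:

```latex
c = \begin{pmatrix} 8 & 2 \\ 1 & 9 \end{pmatrix}, \qquad
\text{class 1: } \frac{8 - \max(2, 1)}{\max(10, 9)} = \frac{6}{10} = 0.6, \qquad
\text{class 2: } \frac{9 - \max(1, 2)}{\max(10, 11)} = \frac{7}{11} \approx 0.636

\mathrm{IAM} = \frac{0.6 + 0.636}{2} \approx 0.618
```

Plain accuracy on the same matrix is 17/20 = 0.85, so IAM penalizes the off-diagonal counts more heavily than accuracy does.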

    A Python implementation of IAM (Imbalance Accuracy Metric):

    def IAM(c):
        '''
        c is a nested list representing the confusion matrix of the classifier (len(c) >= 2)
        '''
        l = len(c)
        iam = 0
    
        for i in range(l):
            sum_row = 0       # total of row i
            sum_col = 0       # total of column i
            sum_row_no_i = 0  # row i without the diagonal entry
            sum_col_no_i = 0  # column i without the diagonal entry
            for j in range(l):
                sum_row += c[i][j]
                sum_col += c[j][i]
                if j != i:  # compare by value; `is not` tests object identity
                    sum_row_no_i += c[i][j]
                    sum_col_no_i += c[j][i]
            iam += (c[i][i] - max(sum_row_no_i, sum_col_no_i)) / max(sum_row, sum_col)
        return iam / l
    
    c = [[2129,   52,    0,    1],
         [499,   70,    0,    2],
         [46,   16,    0,   1],
         [85,   18,    0,   7]]
    
    print(IAM(c))  # -0.5210576475801445
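
To see which classes drive the negative score, it can help to look at the per-class terms before they are averaged. The sketch below is not from the paper or the answer: `iam_terms` is a hypothetical helper name, and it assumes no class has both an empty row and an empty column (otherwise a denominator would be zero). Plain accuracy is printed for contrast.

```python
def iam_terms(c):
    """Return the per-class IAM terms for a square confusion matrix c.

    Hypothetical helper for illustration; assumes no class has both an
    empty row and an empty column (the denominator would be zero).
    """
    k = len(c)
    terms = []
    for i in range(k):
        row_sum = sum(c[i])
        col_sum = sum(c[j][i] for j in range(k))
        # Larger of the two off-diagonal totals (row i or column i).
        worst_off = max(row_sum - c[i][i], col_sum - c[i][i])
        terms.append((c[i][i] - worst_off) / max(row_sum, col_sum))
    return terms


c = [[2129, 52, 0, 1],
     [499, 70, 0, 2],
     [46, 16, 0, 1],
     [85, 18, 0, 7]]

terms = iam_terms(c)
accuracy = sum(c[i][i] for i in range(len(c))) / sum(map(sum, c))

print([round(t, 4) for t in terms])       # [0.5433, -0.7548, -1.0, -0.8727]
print(round(sum(terms) / len(terms), 4))  # -0.5211
print(round(accuracy, 4))                 # 0.7539
```

Only the majority class contributes a positive term; the three smaller classes are all strongly negative, which pulls IAM below zero even though accuracy is about 0.75.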
    

    An R implementation of IAM (Imbalance Accuracy Metric):

    IAM <- function(c) {
      # c is a matrix representing the confusion matrix of the classifier.
      l <- nrow(c)
      result <- 0
    
      for (i in 1:l) {
        sum_row <- 0
        sum_col <- 0
        sum_row_no_i <- 0
        sum_col_no_i <- 0
    
        for (j in 1:l) {
          sum_row <- sum_row + c[i, j]
          sum_col <- sum_col + c[j, i]
          if (i != j) {
            sum_row_no_i <- sum_row_no_i + c[i, j]
            sum_col_no_i <- sum_col_no_i + c[j, i]
          }
        }
        result <- result + (c[i, i] - max(sum_row_no_i, sum_col_no_i)) / max(sum_row, sum_col)
      }
      return(result / l)
    }
    
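    # matrix() fills column by column by default; byrow = TRUE keeps the
    # same row layout as the Python example.
    c <- matrix(c(2129, 52, 0, 1,
                  499, 70, 0, 2,
                  46, 16, 0, 1,
                  85, 18, 0, 7), nrow = 4, ncol = 4, byrow = TRUE)
    
    IAM(c)  # -0.5210576 (same value as the Python version)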
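
One detail worth checking when porting between languages: R's `matrix()` fills column by column unless `byrow = TRUE` is given, so a row-wise literal can end up transposed. For IAM this does not change the score, because each term takes the maximum over the row and column totals and is therefore symmetric in rows and columns. A small Python check, where `iam` is a compact restatement of the `IAM` function above and not a separate method:

```python
def iam(c):
    """Compact IAM for a square confusion matrix given as nested lists."""
    k = len(c)
    rows = [sum(r) for r in c]
    cols = [sum(col) for col in zip(*c)]
    return sum(
        (c[i][i] - max(rows[i] - c[i][i], cols[i] - c[i][i])) / max(rows[i], cols[i])
        for i in range(k)
    ) / k


c = [[2129, 52, 0, 1],
     [499, 70, 0, 2],
     [46, 16, 0, 1],
     [85, 18, 0, 7]]
c_t = [list(col) for col in zip(*c)]  # transpose: what a column-wise fill produces

print(abs(iam(c) - iam(c_t)) < 1e-12)  # True
```

The orientation still matters for any per-row or per-column metric (recall, precision), so keeping the layout explicit is the safer habit.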
    

    Another example, using the iris dataset (a 3-class problem) and sklearn:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix
    
    X, y = load_iris(return_X_y=True)
    clf = LogisticRegression(max_iter = 1000).fit(X, y)
    pred = clf.predict(X)
    c = confusion_matrix(y, pred)
    print('confusion matrix:')
    print(c)
    print(f'accuracy : {clf.score(X, y):.2f}')
    print(f'IAM : {IAM(c):.2f}')
    
    confusion matrix:
    [[50  0  0]
     [ 0 47  3]
     [ 0  1 49]]
    accuracy : 0.97
    IAM : 0.92
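
If sklearn is not available, the confusion matrix can also be built by hand. The sketch below assumes the same orientation as `sklearn.metrics.confusion_matrix` (rows are true labels, columns are predictions). The label lists are made up for illustration, and `confusion` and `iam` are hypothetical helper names.

```python
def confusion(y_true, y_pred, labels):
    """Count matrix with rows = true class, columns = predicted class."""
    index = {label: n for n, label in enumerate(labels)}
    c = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        c[index[t]][index[p]] += 1
    return c


def iam(c):
    """IAM for a square confusion matrix given as nested lists."""
    k = len(c)
    total = 0.0
    for i in range(k):
        row_sum = sum(c[i])
        col_sum = sum(c[j][i] for j in range(k))
        worst_off = max(row_sum - c[i][i], col_sum - c[i][i])
        total += (c[i][i] - worst_off) / max(row_sum, col_sum)
    return total / k


# Made-up labels for illustration only.
y_true = ["a", "a", "a", "a", "b", "b", "c", "c"]
y_pred = ["a", "a", "a", "b", "b", "a", "c", "c"]

c = confusion(y_true, y_pred, labels=["a", "b", "c"])
print(c)                 # [[3, 1, 0], [1, 1, 0], [0, 0, 2]]
print(round(iam(c), 4))  # 0.5
```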
    

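    【Comments】:

    • As it is currently written, your answer is unclear. Please edit it to add details that help others understand how it addresses the question asked. More information on how to write good answers is available in the help center.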