【Question title】: How to compute Spearman correlation in Tensorflow
【Posted】: 2018-11-21 01:58:34
【Question】:

Problem

I need to compute the Pearson and Spearman correlations and use them as metrics in TensorFlow.

For Pearson it is easy:

tf.contrib.metrics.streaming_pearson_correlation(y_pred, y_true)

But for Spearman, I have no idea!

What I tried:

From this answer:

    samples = 1
    predictions_rank = tf.nn.top_k(y_pred, k=samples, sorted=True, name='prediction_rank').indices
    real_rank = tf.nn.top_k(y_true, k=samples, sorted=True, name='real_rank').indices
    rank_diffs = predictions_rank - real_rank
    rank_diffs_squared_sum = tf.reduce_sum(rank_diffs * rank_diffs)
    six = tf.constant(6)
    one = tf.constant(1.0)
    numerator = tf.cast(six * rank_diffs_squared_sum, dtype=tf.float32)
    divider = tf.cast(samples * samples * samples - samples, dtype=tf.float32)
    spearman_batch = one - numerator / divider

But this returns NaN...


Then, following the definition on Wikipedia, I tried:

size = tf.size(y_pred)
indice_of_ranks_pred = tf.nn.top_k(y_pred, k=size)[1]
indice_of_ranks_label = tf.nn.top_k(y_true, k=size)[1]
rank_pred = tf.nn.top_k(-indice_of_ranks_pred, k=size)[1]
rank_label = tf.nn.top_k(-indice_of_ranks_label, k=size)[1]
rank_pred = tf.to_float(rank_pred)
rank_label = tf.to_float(rank_label)
spearman = tf.contrib.metrics.streaming_pearson_correlation(rank_pred, rank_label)

But running this, I get the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: input must have at least k columns. Had 1, needed 32

[[{{node metrics/spearman/TopKV2}} = TopKV2[T=DT_FLOAT, sorted=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lambda_1/add, metrics/pearson/pearson_r/variance_predictions/Size)]]
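The error comes from tf.nn.top_k working along the last axis: if y_pred has shape (32, 1), as the message "Had 1, needed 32" suggests, that axis has length 1, while tf.size(y_pred) counts all 32 elements. A minimal sketch of the same top_k-of-negated-indices trick, assuming the inputs are first flattened to 1-D (ranks_via_top_k is a hypothetical helper name):

```python
import tensorflow as tf

def ranks_via_top_k(x):
    """Hypothetical helper: 0-based rank of each element, largest value first."""
    x = tf.reshape(x, [-1])  # flatten e.g. (32, 1) -> (32,) so top_k sees every value
    n = tf.size(x)
    order = tf.math.top_k(x, k=n).indices  # positions sorted from largest to smallest value
    # top_k of the negated order sorts the positions ascending, i.e. inverts the permutation
    return tf.math.top_k(-order, k=n).indices

ranks = ranks_via_top_k(tf.constant([[3.0], [1.0], [2.0]]))
print(ranks.numpy())  # [0 2 1]
```

Computing the Pearson correlation of the float-cast ranks of predictions and labels then gives Spearman's coefficient, as long as there are no ties.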

【Question discussion】:

    Tags: python python-3.x tensorflow metrics


    【Answer 1】:

    One thing you can do is to use TensorFlow's tf.py_function together with scipy.stats.spearmanr, defining the inputs and outputs like this:

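    import numpy as np
    import tensorflow as tf
    from scipy.stats import spearmanr
    def get_spearman_rankcor(y_true, y_pred):
         # spearmanr returns (correlation, p-value); keep only the correlation, as float32
         fn = lambda a, b: np.float32(spearmanr(a.numpy().ravel(), b.numpy().ravel())[0])
         return tf.py_function(fn, [tf.cast(y_pred, tf.float32),
                           tf.cast(y_true, tf.float32)], Tout=tf.float32)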
    
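scipy.stats.spearmanr returns a pair (correlation, p-value) rather than a single number, so a tf.py_function wrapper that declares a single Tout=tf.float32 output should pass on only the first element. A quick check in plain SciPy (the input values are arbitrary):

```python
from scipy.stats import spearmanr

# Rank differences d = [0, -1, 1, 0], so rho = 1 - 6 * 2 / (4**3 - 4) = 0.8
correlation, pvalue = spearmanr([1, 2, 3, 4], [1, 3, 2, 4])
print(round(correlation, 6))  # 0.8
```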

    【Discussion】:

      【Answer 2】:

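      I have been trying to implement the Spearman rank correlation coefficient directly in TensorFlow, following the definition on this website (https://rpubs.com/aaronsc32/spearman-rank-correlation), and I have arrived at the following code (I'm sharing it in case someone finds it useful).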

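      import tensorflow as tf
      import tensorflow_probability as tfp

      @tf.function
      def get_rank(y_pred):
        # argsort of argsort gives each element's rank along the last axis; +1 so ranks start at 1 instead of 0
        rank = tf.argsort(tf.argsort(y_pred, axis=-1, direction="ASCENDING"), axis=-1) + 1
        return tf.cast(rank, tf.float32)  # argsort returns int32; cast so the ranks can feed the float statistics

      @tf.function
      def sp_rank(x, y):
        cov = tfp.stats.covariance(x, y, sample_axis=0, event_axis=None)
        sd_x = tfp.stats.stddev(x, sample_axis=0, keepdims=False, name=None)
        sd_y = tfp.stats.stddev(y, sample_axis=0, keepdims=False, name=None)
        return 1 - cov / (sd_x * sd_y)  # 1 - because we want to minimize the loss

      @tf.function
      def spearman_correlation(y_true, y_pred):
          # Spearman is the Pearson correlation of the ranks, so rank both the true and the predicted values
          # (get_rank already works row-wise along the last axis, so no map_fn is needed here).
          # Note: no gradient flows through argsort, so this works as a metric but cannot train a model on its own.
          y_true_rank = get_rank(tf.cast(y_true, tf.float32))
          y_pred_rank = get_rank(y_pred)

          # Spearman rank correlation between each pair of samples:
          # Sample dim: (1, 8)
          # Batch of samples dim: (None, 8) None=batch_size=64
          # Output dim: (batch_size, ) = (64, )
          sp = tf.map_fn(lambda x: sp_rank(x[0], x[1]), (y_true_rank, y_pred_rank),
                         fn_output_signature=tf.float32)
          # Reduce to a single value
          loss = tf.reduce_mean(sp)
          return loss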
      
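Because tf.argsort already works row-wise along the last axis, the per-sample correlation can also be computed without tf.map_fn or TensorFlow Probability. A minimal sketch, assuming inputs of shape (batch, features) with no tied values, since argsort breaks ties arbitrarily (rowwise_spearman is a hypothetical name):

```python
import tensorflow as tf

def rowwise_spearman(y_true, y_pred):
    """Hypothetical helper: Spearman correlation of each row of two (batch, features) tensors."""
    def rank(x):
        # argsort of argsort = 0-based rank of every element along the last axis
        return tf.cast(tf.argsort(tf.argsort(x, axis=-1), axis=-1), tf.float32)

    rt = rank(y_true)
    rp = rank(y_pred)
    # Pearson correlation of the ranks, computed row by row
    rt = rt - tf.reduce_mean(rt, axis=-1, keepdims=True)
    rp = rp - tf.reduce_mean(rp, axis=-1, keepdims=True)
    cov = tf.reduce_sum(rt * rp, axis=-1)
    return cov / tf.sqrt(tf.reduce_sum(rt * rt, axis=-1) * tf.reduce_sum(rp * rp, axis=-1))

y_true = tf.constant([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
y_pred = tf.constant([[10.0, 20.0, 30.0, 40.0], [4.0, 3.0, 2.0, 1.0]])
print(rowwise_spearman(y_true, y_pred).numpy())  # [ 1. -1.]
```

Taking tf.reduce_mean of the result gives a single value per batch, and 1 minus that value matches the loss-style output of the answer above.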

      【Discussion】:

      • Very nice! Have you tried turning it into a class so it can be used to monitor the rank correlation during training?
      【Answer 3】:

      top_k().indices returns the indices of the largest elements. Spearman needs ranks. They are different.

      For example, for the array [3, 1, 2]:

      • top_k().indices returns [0, 2, 1] (and tf.argsort() returns [1, 2, 0]); both are sorting indices

      • Spearman needs the ranks [2, 0, 1]
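The difference can be checked directly on the array from the example: tf.argsort returns sorting indices, and applying tf.argsort a second time turns those indices into 0-based ranks (a small sketch, not part of the original answer).

```python
import tensorflow as tf

x = tf.constant([3.0, 1.0, 2.0])
top_k_indices = tf.math.top_k(x, k=3).indices  # positions of the largest values: [0 2 1]
sort_indices = tf.argsort(x)                    # positions that sort x ascending: [1 2 0]
ranks = tf.argsort(tf.argsort(x))               # 0-based rank of each element:   [2 0 1]
print(top_k_indices.numpy(), sort_indices.numpy(), ranks.numpy())
```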

      You can get the ranks with the following calls (using tf.scatter_nd()):

      import tensorflow as tf

      def my_spearman(y_pred, labels):
        # Flatten both inputs so argsort runs over the whole batch
        predictions_rank = tf.argsort(tf.reshape(y_pred, [-1]))
        real_rank = tf.argsort(tf.reshape(labels, [-1]))
        r = tf.range(tf.shape(real_rank)[0])  # tf.range needs a scalar limit, not a shape vector
        # Invert the argsort permutations: rank[argsort[i]] = i
        real_rank = tf.scatter_nd(tf.expand_dims(real_rank, -1), r, tf.shape(real_rank))
        predictions_rank = tf.scatter_nd(tf.expand_dims(predictions_rank, -1), r, tf.shape(predictions_rank))
        # Work in float32 so 6 * sum(d^2) and n^3 cannot overflow int32 on large batches
        rank_diffs = tf.cast(predictions_rank - real_rank, tf.float32)
        rank_diffs_squared_sum = tf.reduce_sum(rank_diffs * rank_diffs)
        numerator = 6.0 * rank_diffs_squared_sum
        samples = tf.cast(tf.shape(rank_diffs)[0], tf.float32)
        divider = samples * samples * samples - samples
        spearman = 1.0 - numerator / divider
        return spearman
      

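The closed-form formula rho = 1 - 6 * sum(d^2) / (n^3 - n) also suggests where the NaN in the question's first attempt comes from: with samples = 1 the divider 1 - 1 is 0, and if every rank difference is 0 as well (as when top_k with k = 1 returns only index 0), TensorFlow evaluates 0 / 0. A minimal sketch of the formula on precomputed rank differences, assuming no ties (spearman_from_rank_diffs is a hypothetical name):

```python
import tensorflow as tf

def spearman_from_rank_diffs(d):
    """Hypothetical helper: rho = 1 - 6 * sum(d^2) / (n^3 - n); valid only without ties."""
    d = tf.cast(tf.reshape(d, [-1]), tf.float32)
    n = tf.cast(tf.size(d), tf.float32)
    return 1.0 - 6.0 * tf.reduce_sum(d * d) / (n * n * n - n)

print(spearman_from_rank_diffs([2, 0, -2]).numpy())  # -1.0 (ranks fully reversed)
print(spearman_from_rank_diffs([0]).numpy())         # nan (0 / 0 when n = 1)
```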
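      Note that this formula only holds if all elements are unique (no ties). In general, the Pearson correlation coefficient should be computed on the ranks instead: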

      import tensorflow as tf

      def correlationMetric(x, y):
        x = tf.cast(x, tf.float32)
        y = tf.cast(y, tf.float32)
        n = tf.cast(tf.shape(x)[0], x.dtype)
        xsum = tf.reduce_sum(x, axis=0)
        ysum = tf.reduce_sum(y, axis=0)
        xmean = xsum / n
        ymean = ysum / n
        xvar = tf.reduce_sum(tf.math.squared_difference(x, xmean), axis=0)
        yvar = tf.reduce_sum(tf.math.squared_difference(y, ymean), axis=0)
        cov = tf.reduce_sum((x - xmean) * (y - ymean), axis=0)
        corr = cov / tf.sqrt(xvar * yvar)
        return corr
      
      def my_spearman(y_pred, labels):
        # Same argsort + scatter_nd ranking as above. Note that argsort breaks ties
        # arbitrarily instead of giving tied values their average rank.
        predictions_rank = tf.argsort(tf.reshape(y_pred, [-1]))
        real_rank = tf.argsort(tf.reshape(labels, [-1]))
        r = tf.range(tf.shape(real_rank)[0])  # tf.range needs a scalar limit
        real_rank = tf.scatter_nd(tf.expand_dims(real_rank, -1), r, tf.shape(real_rank))
        predictions_rank = tf.scatter_nd(tf.expand_dims(predictions_rank, -1), r, tf.shape(predictions_rank))
        spearman = correlationMetric(real_rank, predictions_rank)
        return spearman
      

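Ranks built with argsort/scatter_nd give tied values distinct, arbitrarily ordered ranks, so even the Pearson-on-ranks version can differ from scipy.stats.spearmanr when there are ties; SciPy gives tied values their average rank. A minimal sketch of average ranking in TensorFlow, assuming 1-D inputs small enough for an O(n^2) pairwise comparison (average_ranks and spearman_with_ties are hypothetical names):

```python
import tensorflow as tf

def average_ranks(x):
    """Hypothetical helper: 1-based ranks where tied values share their average rank."""
    x = tf.reshape(tf.cast(x, tf.float32), [-1])
    # less[i] = number of elements strictly smaller than x[i]; equal[i] also counts x[i] itself
    less = tf.reduce_sum(tf.cast(x[None, :] < x[:, None], tf.float32), axis=1)
    equal = tf.reduce_sum(tf.cast(x[None, :] == x[:, None], tf.float32), axis=1)
    # A tie group of size e after `less` smaller values occupies ranks less+1 .. less+e
    return less + (equal + 1.0) / 2.0

def spearman_with_ties(x, y):
    """Hypothetical helper: Pearson correlation of the average ranks."""
    rx = average_ranks(x)
    ry = average_ranks(y)
    rx = rx - tf.reduce_mean(rx)
    ry = ry - tf.reduce_mean(ry)
    return tf.reduce_sum(rx * ry) / tf.sqrt(tf.reduce_sum(rx * rx) * tf.reduce_sum(ry * ry))

print(average_ranks([10, 20, 20, 30]).numpy().tolist())  # [1.0, 2.5, 2.5, 4.0]
```

The pairwise comparison matrix keeps the code short and fully vectorized; for large batches a sort-based approach would use less memory.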
      【Discussion】:
