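Computing the Gini index in TensorFlow: answers

【Question title】: Computing Gini index in tensorflow
【Posted】: 2018-03-21 07:57:20
【Question】:

I am trying to write the Gini index computation as a TensorFlow cost function. The Gini index is described here: https://en.wikipedia.org/wiki/Gini_coefficient

A numpy solution is: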

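import numpy as np

def ginic(actual, pred):
    n = len(actual)
    # Reorder the actual values by ascending prediction.
    a_s = actual[np.argsort(pred)]
    a_c = a_s.cumsum()
    # Sum of the cumulative shares, centred by (n + 1) / 2.
    giniSum = a_c.sum() / a_s.sum() - (n + 1) / 2.0
    return giniSum / n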

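Can someone help me figure out how to do this in tf? (For example, as far as I know there is no argsort in tf that can be part of a differentiable function.)

【Comments on the question】:

There is an implementation of differentiable sorting operators from recent research that is also available for TF: github.com/google-research/fast-soft-sort

Tags: python-3.x numpy tensorflow gini

【Solution 1】:

You can do the argsort with tf.nn.top_k(). This function returns a tuple whose second element holds the indices. Because top_k sorts in descending order, the indices have to be reversed to get an ascending order.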

import tensorflow as tf

def ginicTF(actual: tf.Tensor, pred: tf.Tensor):
    # Needs a statically known length; int() fails if the last dimension is None.
    n = int(actual.get_shape()[-1])
    # top_k sorts in descending order; reversing gives the equivalent of np.argsort.
    inds = tf.reverse(tf.nn.top_k(pred, n)[1], axis=[0])
    a_s = tf.gather(actual, inds)  # the equivalent of numpy indexing
    a_c = tf.cumsum(a_s)
    giniSum = tf.reduce_sum(a_c) / tf.reduce_sum(a_s) - (n + 1) / 2.0
    return giniSum / n

You can use the following code to verify that this function returns the same numeric value as your numpy function ginic:

# Uses the TensorFlow 1.x session and placeholder API.
sess = tf.InteractiveSession()
ac = tf.placeholder(shape=(50,), dtype=tf.float32)
pr = tf.placeholder(shape=(50,), dtype=tf.float32)
actual = np.random.normal(size=(50,))
pred = np.random.normal(size=(50,))
print('numpy version: {:.4f}'.format(ginic(actual, pred)))
print('tensorflow version: {:.4f}'.format(ginicTF(ac, pr).eval(feed_dict={ac: actual, pr: pred})))
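
As a quick check of the argsort claim in this answer, the following sketch compares the reversed tf.nn.top_k indices with np.argsort. It is not part of the original answer: it assumes TensorFlow 2.x with eager execution, and the input vector is made up. The values are distinct on purpose, since with ties the two orderings may differ.

```python
import numpy as np
import tensorflow as tf

# Made-up predictions with distinct values, so the sort order is unambiguous.
pred = np.array([0.3, -1.2, 2.5, 0.9], dtype=np.float32)

# top_k with k = len(pred) sorts the whole vector in descending order.
desc_idx = tf.nn.top_k(pred, k=len(pred)).indices

# Reversing the descending indices gives an ascending order.
asc_idx = tf.reverse(desc_idx, axis=[0]).numpy()

np_idx = np.argsort(pred)
print(asc_idx, np_idx)
```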

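【Comments】:

OK, this looks good, but when I pass it to a NN as the loss function it raises an error at the following line: ---> 14 n = int(actual.get_shape()[-1]) with TypeError: __int__ returned non-int (type NoneType). If I just run it in a session, it works as expected.
I think this is because the placeholder/tensor for actual has shape (None,), which means it has no predefined length, so n cannot be computed at graph construction time. In that case, you can simply pass n (the length of the array) to the function as an extra argument instead of computing it.
OK, I could not solve this (I tried giving n a default value, but that did not fix it). I have opened a new question for this specific problem: stackoverflow.com/questions/46674293/… Thanks again for writing the function in TF!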
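
For the (None,) shape problem described in these comments, one possible workaround is to read the length at run time with tf.shape rather than from the static shape. The sketch below is not from the original answer. It assumes TensorFlow 2.x and 1-D float inputs, and it uses tf.argsort in place of the reversed top_k. The names gini_dynamic and gini_graph are hypothetical. It only addresses the shape error: the sort indices are integers, so no gradient flows back to pred through them.

```python
import tensorflow as tf


def gini_dynamic(actual, pred):
    """Gini computation that reads the vector length at run time."""
    actual = tf.convert_to_tensor(actual)
    # tf.shape is evaluated when the graph runs, so a static shape of (None,) is fine.
    n = tf.cast(tf.shape(actual)[-1], actual.dtype)
    order = tf.argsort(pred)          # ascending, like np.argsort
    a_s = tf.gather(actual, order)    # actual values ordered by prediction
    a_c = tf.cumsum(a_s)
    gini_sum = tf.reduce_sum(a_c) / tf.reduce_sum(a_s) - (n + 1.0) / 2.0
    return gini_sum / n


# Trace the function with an unknown length to mimic the (None,) case.
@tf.function(input_signature=[
    tf.TensorSpec(shape=[None], dtype=tf.float32),
    tf.TensorSpec(shape=[None], dtype=tf.float32),
])
def gini_graph(actual, pred):
    return gini_dynamic(actual, pred)


# Made-up data, only to show the call.
a = tf.constant([1.0, 0.0, 1.0, 0.0, 1.0])
p = tf.constant([0.9, 0.1, 0.8, 0.3, 0.7])
print('{:.4f}'.format(float(gini_graph(a, p))))
```

Because the length comes from tf.shape, the same traced function accepts vectors of different lengths.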