Keras Custom Loss for One-Hot Encoded: Answers

【Question title】: Keras Custom Loss for One-Hot Encoded
【Posted】: 2021-10-07 09:59:25
【Question】:

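I currently have a DNN that I trained to predict a one-hot encoded classification of the state a game is in. Basically, assume there are three states: 0, 1, or 2.

Normally I would use categorical_cross_entropy as the loss function, but I realized that not all classifications matter equally for my states. For example:

If the model predicts state 1, a misclassification costs my system nothing, because state 1 basically does nothing, so the reward is 0x.
If the model correctly predicts state 0 or 2 (i.e. predicted = 2 and correct = 2), the reward should be 3x.
If the model incorrectly predicts state 0 or 2 (i.e. predicted = 2 and correct = 0), the reward should be -1x.

I know we can declare custom loss functions in Keras, but I keep getting stuck on how to write this one. Does anyone have a suggestion for how to convert the pseudocode below? I don't know how to express it with vector operations.

Additional question: I think what I'm really after is a reward function. Is that the same thing as a loss function? Thanks!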

def custom_expectancy(y_expected, y_pred):
    
    # Get 0, 1 or 2
    expected_norm = tf.argmax(y_expected, axis=-1)
    predicted_norm = tf.argmax(y_pred, axis=-1)
    
    # Some pseudo code....
    # Now, if predicted == 1
    #     loss += 0
    # elif predicted == expected
    #     loss -= 3
    # elif predicted != expected
    #     loss += 1
    #
    # return loss

Sources consulted:

https://datascience.stackexchange.com/questions/55215/how-do-i-create-a-keras-custom-loss-function-for-a-one-hot-encoded-binary-classi

Custom loss in Keras with softmax to one-hot

Code update

import tensorflow as tf
def custom_expectancy(y_expected, y_pred):
    
    # Get 0, 1 or 2
    expected_norm = tf.argmax(y_expected, axis=-1)
    predicted_norm = tf.argmax(y_pred, axis=-1)
    
    results = tf.unstack(expected_norm)
    
    # Some pseudo code....
    # Now, if predicted == 1
    #     loss += 0
    # elif predicted == expected
    #     loss += 3
    # elif predicted != expected
    #     loss -= 1
    
    for idx in range(0, len(expected_norm)):
        predicted = predicted_norm[idx]
        expected = expected_norm[idx]
        
        if predicted == 1: # do nothing
            results[idx] = 0.0
        elif predicted == expected: # reward
            results[idx] = 3.0
        else: # wrong, so we lost
            results[idx] = -1.0
    
    
    return tf.stack(results)

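I think this is what I'm after, but I haven't quite figured out how to build the correct tensor to return (it should have one entry per sample in the batch).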

【Comments】:

Tags: python tensorflow machine-learning keras deep-learning

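【Solution 1】:

The best way to build a conditional custom loss is to use tf.keras.backend.switch, without any loops.

In your case, you should combine two switch conditional expressions to get the desired result.

The desired loss function can be reproduced like this: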

def custom_expectancy(y_expected, y_pred):
    
    # Keep y_pred in the graph so Keras finds a gradient path;
    # note that the resulting gradient is identically zero.
    zeros = tf.cast(tf.reduce_sum(y_pred*0, axis=-1), tf.float32)
    y_expected = tf.cast(tf.reshape(y_expected, (-1,)), tf.float32)
    class_pred = tf.argmax(y_pred, axis=-1)
    class_pred = tf.cast(class_pred, tf.float32)
    
    cond1 = (class_pred != y_expected) & (class_pred != 1)
    cond2 = (class_pred == y_expected) & (class_pred != 1)
    
    res1 = tf.keras.backend.switch(cond1, zeros -1, zeros)
    res2 = tf.keras.backend.switch(cond2, zeros +3, zeros)
    
    return res1 + res2

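cond1 holds when the model incorrectly predicts state 0 or 2; cond2 holds when the model correctly predicts state 0 or 2. The default value is zero, which is returned when neither cond1 nor cond2 holds.

Note that y_expected can be passed as a simple tensor/array of integer-encoded states (there is no need to one-hot encode them).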
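If the labels are already one-hot encoded, as in the question, they would need to be converted back to integer states before being passed to this function. A minimal NumPy sketch of that conversion, where the array `y_onehot` is made-up illustration data:

```python
import numpy as np

# Hypothetical one-hot labels for four samples (states 1, 2, 1, 0).
y_onehot = np.array([[0, 1, 0],
                     [0, 0, 1],
                     [0, 1, 0],
                     [1, 0, 0]])

# argmax along the class axis recovers the integer state of each row.
y_int = np.argmax(y_onehot, axis=-1)

# Column vector with one integer state per row, shape (N, 1).
y_col = y_int.reshape(-1, 1)
print(y_col.ravel())
```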

The loss function works as follows:

true = tf.constant([[1],    [2],    [1],    [0]    ])  ## no need to one-hot
pred = tf.constant([[0,1,0],[0,0,1],[0,0,1],[0,1,0]])

custom_expectancy(true, pred)

<tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 0.,  3., -1.,  0.], dtype=float32)>

This seems to fit our needs.

To use the loss in a model:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X = np.random.uniform(0,1, (1000,10))
y = np.random.randint(0,3, (1000)) ## no need to one-hot

model = Sequential([Dense(3, activation='softmax')])
model.compile(optimizer='adam', loss=custom_expectancy)
model.fit(X,y, epochs=3)

Here is the running notebook.
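One caveat worth checking before training with an argmax-based function like this: argmax is piecewise constant, so the value it produces carries no useful gradient back to the model. The sketch below is an assumption-laden check rather than part of the original answer. It rewrites the same rule with tf.where (swapped in for tf.keras.backend.switch), and the names `argmax_reward`, `logits`, and `labels` are illustrative.

```python
import tensorflow as tf

def argmax_reward(y_expected, y_pred):
    # Same rule as above, expressed with tf.where (an assumption: this is a
    # stand-in for the tf.keras.backend.switch version, not the original code).
    zeros = tf.reduce_sum(y_pred * 0.0, axis=-1)
    y_expected = tf.cast(tf.reshape(y_expected, (-1,)), tf.int64)
    class_pred = tf.argmax(y_pred, axis=-1)
    acts = tf.not_equal(class_pred, 1)          # predicted state 0 or 2
    correct = tf.equal(class_pred, y_expected)  # prediction matches the label
    return tf.where(acts & correct, zeros + 3.0,
                    tf.where(acts, zeros - 1.0, zeros))

# Made-up logits whose argmax per row is 1, 2, 2, 1.
logits = tf.Variable([[0.1, 2.0, 0.3],
                      [0.2, 0.1, 1.5],
                      [0.0, 0.4, 2.2],
                      [0.3, 1.1, 0.2]])
labels = tf.constant([1, 2, 1, 0])

with tf.GradientTape() as tape:
    probs = tf.nn.softmax(logits, axis=-1)
    per_sample = argmax_reward(labels, probs)
    total = tf.reduce_sum(per_sample)

# The gradient exists (no "No gradients provided" error) but is all zeros.
grad = tape.gradient(total, logits)
print(per_sample.numpy(), float(tf.reduce_max(tf.abs(grad))))
```

If the check behaves as expected, the function is better suited to reporting the realized reward (for example as a metric) than to driving weight updates; a loss built on the predicted probabilities would be needed for learning.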

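【Comments】:

【Solution 2】:

Here there is a nice post explaining the concepts of the loss function and the cost function. Several of its answers show how different authors in the machine learning field think about them.

Regarding the loss function, you may find the following implementation useful. It implements a weighted cross-entropy loss, in which each class can be weighted proportionally during training. This can be adapted to the constraints specified above.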
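The linked implementation is not reproduced here, so the following is only a minimal sketch of what a class-weighted categorical cross-entropy could look like. The factory name `make_weighted_cce` and the weights `[3.0, 1.0, 3.0]` are hypothetical choices for illustration, and the function assumes one-hot `y_true` and softmax probabilities in `y_pred`.

```python
import tensorflow as tf

def make_weighted_cce(class_weights):
    """Return a Keras-style loss that scales cross-entropy per true class."""
    weights = tf.constant(class_weights, dtype=tf.float32)

    def weighted_cce(y_true, y_pred):
        y_true = tf.cast(y_true, tf.float32)
        # Clip to avoid log(0) on saturated predictions.
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0)
        # Only the true-class term survives; it is scaled by that class's weight.
        per_class = -y_true * tf.math.log(y_pred) * weights
        return tf.reduce_sum(per_class, axis=-1)

    return weighted_cce

# Hypothetical weights: states 0 and 2 count three times as much as state 1.
loss_fn = make_weighted_cce([3.0, 1.0, 3.0])

y_true = tf.constant([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_pred = tf.constant([[0.5, 0.25, 0.25], [0.2, 0.6, 0.2]])
losses = loss_fn(y_true, y_pred)
print(losses.numpy())
```

The returned function has the usual `(y_true, y_pred)` signature, so it could be passed as `loss=loss_fn` to `model.compile`. Note that it weights samples by their true class, which is not quite the same as the prediction-based reward rule in the question.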

【Comments】:

【Solution 3】:

This is how you could do it. If your ground truth y_true is dense (shape (N, 3)), you can use tf.reduce_all(y_true == [0.0, 0.0, 1.0], axis=-1, keepdims=True) and tf.reduce_all(y_true == [1.0, 0.0, 0.0], axis=-1, keepdims=True) to control the if/elif/else. You can optimize it further with tf.gather.

def sparse_loss(y_true, y_pred):
  """Calculate loss for game. Follows keras loss signature.
  
  Args:
    y_true: Sparse tensor of shape (N, 1), where the correct state
      is encoded as 0, 1, or 2.
    y_pred: Tensor of shape (N, 3). For each row, the three columns
      represent the predicted probability of each state.
      For example, [0.1, 0.3, 0.6] means, "There's a 10% chance the
      right state is 0; 30% chance the right state is 1,
      and 60% chance the right state is 2".
  """

  # This is the unvectorized implementation on individual rows which is more
  # intuitive. But TF requires vectorization. 
  # if y_true == 0:
  #   # Value matrix is shape 3. Broadcasting will occur. 
  #   return -tf.reduce_sum(y_pred * [3.0, 0.0, -1.0])
  # elif y_true == 2:
  #   return -tf.reduce_sum(y_pred * [-1.0, 0.0, 3.0])
  # else:
  #   # According to the rules, this is never the correct
  #   # state to predict so it should never show up.
  #   assert False, f'Impossible state reached. y_true: {y_true}, y_pred: {y_pred}.'


  # We vectorize by calculating the reward for all predictions for two cases:
  # if y_true is zero or if y_true is two. To eliminate this inefficiency, we
  # could use tf.gather to build an (N, 3) matrix to multiply against.
  y_true = tf.reshape(tf.cast(y_true, y_pred.dtype), (-1, 1))  # force (N, 1)
  reward_for_true_zero = tf.reduce_sum(y_pred * [3.0, 0.0, -1.0], axis=-1, keepdims=True)  # (N, 1)
  reward_for_true_two = tf.reduce_sum(y_pred * [-1.0, 0.0, 3.0], axis=-1, keepdims=True)  # (N, 1)

  # Rows with y_true == 0 take the first reward; all other rows take the second.
  reward = tf.where(y_true == 0.0, reward_for_true_zero, reward_for_true_two)  # (N, 1)
  return -tf.reduce_sum(reward)
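The tf.gather optimization mentioned in the comments can be sketched as follows. This is one possible reading, not the answerer's code: the names `REWARD_TABLE` and `expected_reward_loss` are illustrative, and the middle row of the table (true state 1) is an assumption derived from the question's rules, where predicting 1 earns 0 and wrongly predicting 0 or 2 earns -1.

```python
import tensorflow as tf

# Row i holds the reward for predicting state 0, 1, 2 when the true state is i.
# The row for true state 1 is an assumption based on the question's rules.
REWARD_TABLE = tf.constant([
    [3.0, 0.0, -1.0],   # true state 0
    [-1.0, 0.0, -1.0],  # true state 1 (assumed)
    [-1.0, 0.0, 3.0],   # true state 2
])

def expected_reward_loss(y_true, y_pred):
    # Integer labels of shape (N,) or (N, 1) are flattened to (N,).
    labels = tf.cast(tf.reshape(y_true, (-1,)), tf.int32)
    # Pick one reward row per sample: shape (N, 3).
    rewards = tf.gather(REWARD_TABLE, labels)
    # Expected reward under the predicted probabilities, one value per sample.
    expected_reward = tf.reduce_sum(tf.cast(y_pred, tf.float32) * rewards, axis=-1)
    # Minimizing the negative expected reward maximizes the reward.
    return -expected_reward

y_true = tf.constant([[0], [2], [1]])
y_pred = tf.constant([[0.7, 0.2, 0.1],
                      [0.1, 0.1, 0.8],
                      [0.2, 0.6, 0.2]])
loss = expected_reward_loss(y_true, y_pred)
print(loss.numpy())
```

Because the loss is a linear function of the predicted probabilities, it stays differentiable with respect to the model output. With one-hot labels, `tf.matmul(y_true, REWARD_TABLE)` would select the same reward rows without the gather step.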

【Comments】: