ValueError: No gradients provided for any variable when calculating loss

[Question title]: ValueError: No gradients provided for any variable when calculating loss
[Posted]: 2021-07-07 13:58:20
[Question]:

I have been trying to implement, in TensorFlow, the DQN training step described in this paper on various RL methods, but when I compute the gradients with GradientTape I get ValueError: No gradients provided for any variable. Here is the training step code:

def train_step(model, target, optimizer, observations, actions, rewards, next_observations):
    with tf.GradientTape() as tape:
        target_logits = tf.math.reduce_max(target(np.expand_dims(next_observations, -1)), 1)
        logits = model(np.expand_dims(observations, -1))

        act_logits = np.ndarray(EXPERIENCE_SAMPLE_SIZE)
        
        for i in range(EXPERIENCE_SAMPLE_SIZE):
            act_logits[i] = logits[i][actions[i]]

        act_logits = tf.convert_to_tensor(act_logits, dtype=tf.float32)

        y_T = tf.math.add(tf.convert_to_tensor(rewards, dtype=tf.float32), tf.math.scalar_mul(DISCOUNT_RATE, target_logits))

        loss = tf.math.squared_difference(act_logits, y_T)
        loss = tf.math.scalar_mul(1.0 / EXPERIENCE_SAMPLE_SIZE, loss)

    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

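Here model and target are tf.keras.Sequential models that output the expected value of taking each of 5 possible actions, the optimizer is SGD, and observations, actions, rewards and next_observations are numpy arrays sampled from an experience replay buffer.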

This is part of an implementation of the following pseudocode from the paper mentioned above:

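My best guess is that the error occurs because indexing logits makes the gradient non-differentiable, but I don't know how else to compute the Q*(s,a,theta) quantity.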
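A small check can help separate the two suspects here: indexing a tensor, versus copying the indexed values into a NumPy array. The sketch below is not the code from the question. The variable w, the 2x3 shape, and the factor 2.0 are made up for illustration. It assumes eager execution (TensorFlow 2.x) and suggests that indexing on its own stays differentiable, while the round trip through NumPy is what hides the computation from the tape.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the model: "logits" depend on a single variable w.
w = tf.Variable([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
actions = [2, 0]

with tf.GradientTape(persistent=True) as tape:
    logits = w * 2.0

    # Path 1: copy the selected entries into a NumPy array (as in the question).
    # NumPy stores plain floats, so the tape never sees this computation.
    picked_np = np.zeros(2, dtype=np.float32)
    for i in range(2):
        picked_np[i] = logits[i][actions[i]]
    loss_np = tf.reduce_sum(tf.convert_to_tensor(picked_np))

    # Path 2: same indexing, but the results stay TensorFlow tensors.
    picked_tf = tf.stack([logits[i][actions[i]] for i in range(2)])
    loss_tf = tf.reduce_sum(picked_tf)

grad_np = tape.gradient(loss_np, w)  # no path from w to loss_np
grad_tf = tape.gradient(loss_tf, w)  # gradient flows through the indexing

print(grad_np)
print(grad_tf.numpy())
```

If grad_np is None while grad_tf is not, the NumPy copy rather than the indexing is the likely cause of the error.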

[Comments]:

I found a "solution" by replacing part of my code with the following:

act_logits = np.zeros((EXPERIENCE_SAMPLE_SIZE, 5))
for i in range(EXPERIENCE_SAMPLE_SIZE):
    act_logits[i][actions[i]] = 1.0
act_logits = tf.math.reduce_max(tf.math.multiply(act_logits, logits), 1)

Tags: python tensorflow

[Solution 1]:

Adding the solution to the answer section for the benefit of the community.

From Comments:

The issue was resolved by replacing this code:

act_logits = np.ndarray(EXPERIENCE_SAMPLE_SIZE)
   
for i in range(EXPERIENCE_SAMPLE_SIZE):
    act_logits[i] = logits[i][actions[i]]

with this code:

# act_logits must first be rebuilt as the one-hot action mask shown in the comment above
act_logits = tf.math.reduce_max(tf.math.multiply(act_logits, logits), 1)
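The masking idea can also be written entirely with TensorFlow ops, so that nothing passes through NumPy inside the tape. The following is a minimal sketch under stated assumptions, not the original poster's code. The helper select_action_values, the single Dense layer, and the random batch are hypothetical. NUM_ACTIONS = 5 comes from the question. It uses reduce_sum rather than reduce_max, because with reduce_max a negative selected value would lose to the zeros left by the mask.

```python
import numpy as np
import tensorflow as tf

NUM_ACTIONS = 5  # the question mentions 5 possible actions


def select_action_values(logits, actions, num_actions=NUM_ACTIONS):
    """Pick logits[i, actions[i]] for every row using differentiable TF ops."""
    mask = tf.one_hot(actions, num_actions, dtype=logits.dtype)
    # Summing keeps the selected value even when it is negative.
    return tf.reduce_sum(mask * logits, axis=1)


# Hypothetical toy model and batch, only to exercise the helper.
model = tf.keras.Sequential([tf.keras.layers.Dense(NUM_ACTIONS)])
observations = np.random.rand(4, 3).astype(np.float32)
actions = np.array([0, 3, 1, 4])

with tf.GradientTape() as tape:
    logits = model(observations)
    act_values = select_action_values(logits, actions)
    loss = tf.reduce_mean(tf.square(act_values))  # placeholder loss

grads = tape.gradient(loss, model.trainable_variables)
print([g is not None for g in grads])
```

In the training step from the question, act_values would take the place of act_logits when computing the squared difference against y_T. tf.gather(logits, actions, batch_dims=1) should give the same selection without building a mask.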
