softmax dims and variable volatile in PyTorch

【Question title】: softmax dims and variable volatile in PyTorch
【Posted】: 2020-04-20 23:23:50
【Question description】:

I have code written for an earlier version of PyTorch, and I get two warnings about its third line:

import torch.nn.functional as F

def select_action(self, state):
        probabilities = F.softmax(self.model(Variable(state, volatile = True))*100) # T=100
        action = probs.multinomial(num_samples=1)
        return action.data[0,0]

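UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.

UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.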

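I found this:

"Volatile is recommended for purely inference mode, when you're sure you won't be even calling .backward(). It's more efficient than any other autograd setting - it will use the absolute minimal amount of memory to evaluate the model. volatile also determines that requires_grad is False."

Am I right that I should remove it? And since I want probabilities, should I use dim=1? Should the third line of my code look like this: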

    probabilities = F.softmax(self.model(Variable(state), dim=1)*100) # T=100

The state is created here:

def update(self, reward, new_signal):
   new_state = torch.Tensor(new_signal).float().unsqueeze(0)
   self.memory.push((self.last_state, new_state, torch.LongTensor([int(self.last_action)]), torch.Tensor([self.last_reward])))
   action = self.select_action(new_state)
   if len(self.memory.memory) > 100:
       batch_state, batch_next_state, batch_action, batch_reward = self.memory.sample(100)
       self.learn(batch_state, batch_next_state, batch_reward, batch_action)
   self.last_action = action
   self.last_state = new_state
   self.last_reward = reward
   self.reward_window.append(reward)
   if len(self.reward_window) > 1000:
       del self.reward_window[0]
   return action

【Question comments】:

Tags: python-3.x pytorch

【Solution 1】:

You are right, but not "entirely" right.

Besides the changes you mentioned, you should use torch.no_grad(), as shown below:

def select_action(self, state):
    with torch.no_grad():
        probabilities = F.softmax(self.model(state), dim=1)*100
        action = probs.multinomial(num_samples=1)
        return action.data[0,0]

This block turns off the autograd engine for the code inside it (so you save memory, just as you did with volatile).
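As a minimal sketch of that effect, assuming a small nn.Linear layer as a hypothetical stand-in for self.model (it is not the asker's network): a result computed inside torch.no_grad() is not tracked by autograd, and with dim=1 every row of a (batch, actions) tensor is normalized to sum to 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for self.model: 5 input signals -> 3 action scores.
model = nn.Linear(5, 3)
state = torch.rand(1, 5)  # shape (batch=1, features=5)

# Outside no_grad, the output is tracked by autograd.
tracked = model(state)

# Inside no_grad, no graph is recorded for the forward pass.
with torch.no_grad():
    probs = F.softmax(model(state), dim=1)  # normalize over the action dimension

print(tracked.requires_grad)  # True
print(probs.requires_grad)    # False
print(probs.sum(dim=1))       # each row sums to 1, up to float rounding
```

The input here is a plain tensor; no Variable wrapper is involved.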

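Also note that Variable has been deprecated as well (check here), and state should simply be a tensor created with torch.tensor and requires_grad=True.

By the way, you have both probs and probabilities, but I assume they are the same thing and it is just a typo.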
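For the point about creating state without Variable, here is a minimal sketch. It assumes new_signal is a plain Python list of floats; the values below are made up for illustration. The lowercase torch.tensor factory function accepts the dtype and requires_grad keywords directly.

```python
import torch

# Made-up sensor readings standing in for new_signal.
new_signal = [0.1, 0.0, 0.5, 0.3, -0.3]

# Build a float tensor, then add a leading batch dimension with unsqueeze(0).
new_state = torch.tensor(
    new_signal, dtype=torch.float32, requires_grad=True
).unsqueeze(0)

print(new_state.shape)          # torch.Size([1, 5])
print(new_state.dtype)          # torch.float32
print(new_state.requires_grad)  # True
```

Passing dtype=torch.float32 here plays the role of the .float() call in the original update method.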

【Comments】:

I have edited the original post and included the piece of code where state is created. Do you mean I should change it to new_state = torch.Tensor(new_signal, requires_grad=True) and remove .float().unsqueeze(0)?
@Kosh new_state = torch.Tensor(new_signal, requires_grad=True).float().unsqueeze(0). You shouldn't remove it, because it only casts the tensor to a specific type and adds an extra first dimension (probably the batch).
After adding requires_grad=True I started getting TypeError: new() received an invalid combination of arguments - got (list, requires_grad=bool), but expected one of: * (*, torch.device device) didn't match because some of the keywords were incorrect: requires_grad * (torch.Storage storage) * (Tensor other) * (tuple of ints size, *, torch.device device) * (object data, *, torch.device device)
@Kosh Which version of pytorch? And try torch.tensor as a quick fix.
Finally got it working. The root of my problem was in plain sight. You wrote probabilities = F.softmax(self.model(state), dim=1)*100, while it should be probabilities = F.softmax(self.model(state)*100, dim=1). I actually learned a lot while solving this :)
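The difference between the two placements of *100 can be sketched as follows, using made-up action scores rather than real model output. Multiplying after softmax only rescales the weights, and since multinomial samples in proportion to its input weights, the sampling behaviour is unchanged. Multiplying before softmax acts as the temperature (the T=100 in the question's comment) and sharpens the distribution.

```python
import torch
import torch.nn.functional as F

# Made-up action scores for one state (batch of 1, three actions).
scores = torch.tensor([[1.0, 1.1, 0.9]])

# Scaling after softmax: the weights sum to about 100, ratios are unchanged.
outside = F.softmax(scores, dim=1) * 100

# Scaling before softmax: almost all probability goes to the highest score.
inside = F.softmax(scores * 100, dim=1)

print(outside)               # three weights of similar size
print(inside)                # nearly one-hot on index 1
print(inside.argmax(dim=1))  # tensor([1])
```

With the scaling outside, the three actions are sampled almost uniformly; with the scaling inside, the agent nearly always picks the highest-scoring action.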

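【Solution 2】:

I found the same source code, written for python 2.7, in the "Self-driving car" application. I could not install pytorch/pytorch-cpu for python 2.7 (CUDA driver problems...), so I had to fix the code to run on python 3.*.

Here are the changes I made to get it working (including the ones suggested by others above). Update the select_action and learn functions of the Dqn class like this: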

    def select_action(self, state):
        with torch.no_grad():
            probs = F.softmax(self.model(state) * 100, dim=1)  # T=100
            action = probs.multinomial(num_samples=1)
            return action.data[0, 0]

    def learn(self, batch_state, batch_next_state, batch_reward, batch_action):
        outputs = self.model(batch_state).gather(1, batch_action.unsqueeze(1)).squeeze(1)
        next_outputs = self.model(batch_next_state).detach().max(1)[0]
        target = self.gamma * next_outputs + batch_reward
        td_loss = F.smooth_l1_loss(outputs, target)
        self.optimizer.zero_grad()
        td_loss.backward()
        self.optimizer.step()
