【Question Title】: PyTorch Boolean - Stop Backpropagation?
【Posted】: 2021-05-13 17:36:37
【Question】:

I need to create a neural network in which I use binary gates to zero out certain tensors, which are the outputs of disabled circuits.

To speed things up, I was looking forward to using torch.bool binary gates to stop backpropagation along the disabled circuits in the network. However, I created a small experiment using the official PyTorch example for the CIFAR-10 dataset, and the runtime is exactly the same for any values of gate_A and gate_B (which means the idea does not work):

from random import randint

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.pool = nn.MaxPool2d(2, 2)
        self.conv1a = nn.Conv2d(3, 6, 5)
        self.conv2a = nn.Conv2d(6, 16, 5)
        self.conv1b = nn.Conv2d(3, 6, 5)
        self.conv2b = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(32 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Only one gate is supposed to be enabled at random
        # However, for the experiment, I fixed the values to [1,0] and [1,1]
        choice = randint(0, 1)
        gate_A = torch.tensor(choice, dtype=torch.bool)
        gate_B = torch.tensor(1 - choice, dtype=torch.bool)
        
        a = self.pool(F.relu(self.conv1a(x)))
        a = self.pool(F.relu(self.conv2a(a)))
        
        b = self.pool(F.relu(self.conv1b(x)))
        b = self.pool(F.relu(self.conv2b(b)))
        
        a *= gate_A
        b *= gate_B
        x = torch.cat([a, b], dim=1)
        
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

How can I define gate_A and gate_B so that backpropagation effectively stops through them when they are zero?

PS. Changing the concatenation dynamically at runtime would also change which weights are assigned to each module. (For example, the weights associated with a could end up assigned to b on another pass, disrupting how the network operates.)
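
For reference, a minimal sketch (assuming the Net class above and a dummy CIFAR-10 sized batch) of why the bool gates alone do not help: autograd still records the zeroed branch, so its weights receive (all-zero) gradient tensors and its backward pass still runs, which is why the runtime does not change:

net = Net()
x = torch.randn(4, 3, 32, 32)  # dummy CIFAR-10 sized batch
net(x).sum().backward()

# Both branches end up with populated .grad tensors: the disabled
# branch receives all-zero gradients, but autograd still traversed it.
print(net.conv1a.weight.grad is not None)  # True
print(net.conv1b.weight.grad is not None)  # True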

【Question Discussion】:

  • Does a always mean enabled and b always disabled, as in your example? If not, which part of the code decides that?
  • No, they are actually supposed to alternate at random :)
  • So you have two gates, and one of them is enabled at random?
  • Yes, that's correct. This technique is essential for neural architecture search. Without somehow stopping backpropagation along the disabled gates, the runtime increases severalfold.

Tags: machine-learning deep-learning neural-network pytorch backpropagation


【Solution 1】:

You can use torch.no_grad (the code below could probably be cleaner):

def forward(self, x):
    # Only one gate is supposed to be enabled at random
    # However, for the experiment, I fixed the values to [1,0] and [1,1]
    choice = randint(0, 1)
    gate_A = torch.tensor(choice, dtype=torch.bool)
    gate_B = torch.tensor(1 - choice, dtype=torch.bool)

    if choice:
        a = self.pool(F.relu(self.conv1a(x)))
        a = self.pool(F.relu(self.conv2a(a)))
        a *= gate_A

        with torch.no_grad():  # disable gradient computation
            b = self.pool(F.relu(self.conv1b(x)))
            b = self.pool(F.relu(self.conv2b(b)))
            b *= gate_B
    else:
        with torch.no_grad():  # disable gradient computation
            a = self.pool(F.relu(self.conv1a(x)))
            a = self.pool(F.relu(self.conv2a(a)))
            a *= gate_A

        b = self.pool(F.relu(self.conv1b(x)))
        b = self.pool(F.relu(self.conv2b(b)))
        b *= gate_B

    x = torch.cat([a, b], dim=1)

    x = torch.flatten(x, 1)  # flatten all dimensions except batch
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x
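
To see why this stops backpropagation: anything computed inside torch.no_grad() is recorded with no autograd history, so backward() can never reach the disabled branch's weights. A minimal sketch illustrating this:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 6, 5)
x = torch.randn(1, 3, 32, 32)

with torch.no_grad():
    y = conv(x)

# y carries no autograd history, so a later backward() can never
# propagate into conv.weight through y.
print(y.requires_grad)  # False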

On second look, I think the following is a simpler solution to your specific problem:

def forward(self, x):
    # Only one gate is supposed to be enabled at random
    # However, for the experiment, I fixed the values to [1,0] and [1,1]
    choice = randint(0, 1)

    if choice:
        a = self.pool(F.relu(self.conv1a(x)))
        a = self.pool(F.relu(self.conv2a(a)))
        b = torch.zeros(shape_of_conv_output)  # replace shape of conv output here
    else:
        b = self.pool(F.relu(self.conv1b(x)))
        b = self.pool(F.relu(self.conv2b(b)))
        a = torch.zeros(shape_of_conv_output)  # replace shape of conv output here

    x = torch.cat([a, b], dim=1)

    x = torch.flatten(x, 1)  # flatten all dimensions except batch
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x
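
The shape_of_conv_output placeholder above is left for you to fill in. For the CIFAR-10 dimensions in the question (32x32 inputs through two 5x5 convs, each followed by 2x2 max-pooling), each branch outputs (batch, 16, 5, 5), so a hypothetical concrete fill-in might be:

# Hypothetical fill-in, assuming CIFAR-10 inputs of shape (N, 3, 32, 32):
# each branch produces (N, 16, 5, 5) after conv(5) -> pool -> conv(5) -> pool.
# device=x.device keeps the zeros on the same device as the input for torch.cat.
b = torch.zeros(x.size(0), 16, 5, 5, device=x.device)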

【Discussion】:

  • @C-3PO Looking again, I'm not sure this is even necessary in your particular case. Why can't you just skip the forward-pass computation for the disabled gate? Depending on the choice, you forward one half, concatenate the other half with zeros, and continue as before. (I think you can, but I'm asking because of the PS in your question.)
  • Yes, that's what I thought too. I actually invented the code in the post below while having coffee. Thanks for the feedback.
  • @C-3PO Nice, it came out exactly the same :)
【Solution 2】:

Simple solution: when a or b is disabled, just define a tensor of zeros in its place :)

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.pool = nn.MaxPool2d(2, 2)
        self.conv1a = nn.Conv2d(3, 6, 5)
        self.conv2a = nn.Conv2d(6, 16, 5)
        self.conv1b = nn.Conv2d(3, 6, 5)
        self.conv2b = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(32 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        
    def forward(self, x):
        
        if randint(0, 1):
            a = self.pool(F.relu(self.conv1a(x)))
            a = self.pool(F.relu(self.conv2a(a)))
            b = torch.zeros_like(a)
        else:
            b = self.pool(F.relu(self.conv1b(x)))
            b = self.pool(F.relu(self.conv2b(b)))
            a = torch.zeros_like(b)
        
        x = torch.cat([a, b], dim=1)
        
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

PS. I came up with this while having coffee.
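
A quick, hypothetical way to confirm the disabled branch is truly skipped (assuming the class above): torch.zeros_like produces a tensor with no autograd history (and with matching dtype/device, which keeps torch.cat happy), so the unused convolutions never run and never receive gradients:

net = Net()
net(torch.randn(4, 3, 32, 32)).sum().backward()

# Exactly one branch executed, so exactly one branch has gradients;
# the other branch's .grad stays None because its convs never ran.
print(net.conv1a.weight.grad is None)
print(net.conv1b.weight.grad is None)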

【Discussion】:
