Creating custom-connected / non-fully connected layers in PyTorch: answers

【Question title】: Create custom connection / non-fully connected layers in PyTorch
【Posted】: 2023-01-25 10:51:36
【Question】:

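As shown in the figure, this is a three-layer NN with an input layer, a hidden layer, and an output layer. I want to design the NN (in PyTorch, just the architecture) so that the input-to-hidden connections are fully connected. From the hidden layer to the output, however, the first two hidden neurons should connect to the first output neuron, the next two to the second output neuron, and so on. How should this be designed?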

from torch import nn
layer1 = nn.Linear(input_size, hidden_size)
layer2 = ??????

【Comments】:

Tags: python machine-learning deep-learning neural-network pytorch

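【Solution 1】:

As @Jan said here, you can subclass nn.Linear and supply an elementwise mask that removes the interactions you want to avoid. Keep in mind that a fully connected layer is just a matrix multiplication with an optional additive bias.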
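To make the "matrix multiplication plus bias" point concrete, here is a minimal check, with arbitrary illustrative sizes, that nn.Linear computes x @ W.T + b. Since each entry of W corresponds to one input-output connection, zeroing an entry removes that connection.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(6, 3)   # weight is stored as (out_features, in_features) = (3, 6)
x = torch.randn(4, 6)     # a batch of 4 inputs (illustrative size)

# Recompute the layer by hand: matrix multiplication plus additive bias.
manual = x @ layer.weight.T + layer.bias
same = torch.allclose(layer(x), manual, atol=1e-6)
print(tuple(layer.weight.shape), same)
```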

Looking at its source code, we can do the following:

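import torch
import torch.nn.functional as F
from torch import nn

class MaskedLinear(nn.Linear):
    def __init__(self, *args, mask, **kwargs):
        super().__init__(*args, **kwargs)
        # mask has shape (in_features, out_features); nn.Linear stores its
        # weight as (out_features, in_features), hence the transpose.
        self.register_buffer("mask", mask.T)

    def forward(self, input):
        # Mask the weights (not the output) so masked connections contribute nothing.
        return F.linear(input, self.weight * self.mask, self.bias)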

Here F refers to torch.nn.functional.

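Given your constraint on the second layer:

the first two neurons of the hidden layer should connect to the first neuron of the output layer

it looks like you are after this pattern (rows are hidden neurons, columns are output neurons):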

tensor([[1., 0., 0.],
        [1., 0., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 1.]])

It can be built with torch.block_diag:

mask = torch.block_diag(*[torch.ones(2,1),]*output_size)
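As a quick sanity check of the line above, the sketch below builds the mask for output_size = 3 (an illustrative value) and inspects its shape. The transposed copy is an addition of mine: it matches the (out_features, in_features) layout of nn.Linear.weight, which is the shape needed if the mask is multiplied into the weights.

```python
import torch

output_size = 3  # illustrative value; the hidden layer then has 2 * 3 = 6 neurons

# One (2, 1) block of ones per output neuron, placed along the diagonal.
mask = torch.block_diag(*[torch.ones(2, 1)] * output_size)
print(mask.shape)  # torch.Size([6, 3]), i.e. (in_features, out_features)

# Same pattern in the layout of nn.Linear.weight: (out_features, in_features).
weight_mask = mask.T
print(weight_mask.shape)  # torch.Size([3, 6])
```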

With that, you can define your network as:

# mask is a keyword-only argument of MaskedLinear, so it must be passed by name
net = nn.Sequential(nn.Linear(input_size, hidden_size),
                    MaskedLinear(hidden_size, output_size, mask=mask))

If you like, you can even build the mask inside the custom layer itself:

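class LocalLinear(nn.Linear):
    def __init__(self, *args, kernel_size=2, **kwargs):
        super().__init__(*args, **kwargs)

        assert self.in_features == kernel_size*self.out_features
        # Block-diagonal mask in the same (out_features, in_features) layout as self.weight.
        mask = torch.block_diag(*[torch.ones(1, kernel_size),]*self.out_features)
        self.register_buffer("mask", mask)

    def forward(self, input):
        # Mask the weights so each output only sees its own kernel_size inputs.
        return F.linear(input, self.weight * self.mask, self.bias)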

and define the network like this:

net = nn.Sequential(nn.Linear(input_size, hidden_size),
                    LocalLinear(hidden_size, output_size))
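One way to confirm that a masked layer has the intended connectivity is to perturb a single hidden neuron and see which outputs move. The sketch below works on plain tensors and assumes the mask is multiplied into the weight matrix (transposed to the (out, in) layout of nn.Linear.weight); the sizes and the seed are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
hidden_size, output_size = 6, 3                              # illustrative sizes
mask = torch.block_diag(*[torch.ones(2, 1)] * output_size)  # (hidden, output)
weight = torch.randn(output_size, hidden_size)              # same layout as nn.Linear.weight

h = torch.randn(1, hidden_size)
h_perturbed = h.clone()
h_perturbed[0, 0] += 1.0   # change only the first hidden neuron

# Difference between the masked layer's outputs before and after the perturbation.
delta = F.linear(h_perturbed, weight * mask.T) - F.linear(h, weight * mask.T)
changed = delta.abs().squeeze(0) > 1e-6
print(changed)   # only the first output should react
```

If the mask were applied to the layer's output instead, every hidden neuron would still influence each unmasked output, which is the concern raised in the comments below.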

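【Comments】:

My input has size (batch_size, 100) and my mask is (100, 10). The line out = F.linear(input*self.mask, self.weight, self.bias) throws an error: RuntimeError: The size of tensor a (100) must match the size of tensor b (10) at non-singleton dimension 1
You are right, there was a problem. The mask should be applied after the linear layer is evaluated, not before. See the edit above.
This doesn't seem right. It is the weight matrix that needs to be masked, not the output of weight*input + bias. Once the multiplication has happened, the unwanted interactions can no longer be removed.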

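【Solution 2】:

Instead of using nn.Linear directly, create a weight tensor weight and a mask tensor mask that zeroes out the weights you do not intend to use. Then compute the second layer's forward pass with torch.nn.functional.linear(input, weight * mask) (https://pytorch.org/docs/stable/generated/torch.nn.functional.linear.html). Note that this goes in the forward function of your torch.nn.Module. The weight needs to be registered as a parameter of your nn.Module so that nn.Module.parameters() picks it up. See https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_parameter.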
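A minimal sketch of what this answer describes might look as follows. The class name MaskedWeightLayer, the 0.1 initialisation scale, and the sizes are my own assumptions; storing the mask with register_buffer is also my choice, so that it is not trained but still moves with the module across devices.

```python
import torch
import torch.nn.functional as F
from torch import nn

class MaskedWeightLayer(nn.Module):  # hypothetical name, not from the answer
    def __init__(self, in_features, out_features, mask):
        super().__init__()
        # mask must have shape (out_features, in_features), like the weight.
        weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.register_parameter("weight", weight)  # visible to .parameters()
        self.register_buffer("mask", mask)         # fixed, not trained

    def forward(self, input):
        # Masked weights are zero, so those connections carry no signal.
        return F.linear(input, self.weight * self.mask)

mask = torch.block_diag(*[torch.ones(1, 2)] * 3)   # shape (3, 6)
layer = MaskedWeightLayer(6, 3, mask)
out = layer(torch.randn(4, 6))
out.sum().backward()
# Gradients of masked weights are zero, so an optimizer step leaves them ineffective.
print(out.shape, layer.weight.grad[mask == 0].abs().max())
```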

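【Comments】:

【Solution 3】:

Ivan's general approach (masking a fully connected layer) may work with the modification from my comment, but it adds a lot of useless computation!

It is better to write a custom layer here whose weight matrix has shape (2, hidden_size//2). Then reshape the hidden layer's output from (hidden_size) to (hidden_size//2, 2) and multiply each pair by its own weights.

Something like this (untested):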

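import torch
from torch import nn

class MyLayer(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        assert in_channels == 2 * out_channels
        # Wrapped in nn.Parameter so the optimizer sees them; random init is an assumption.
        self.weight = nn.Parameter(torch.randn(2, in_channels // 2))
        self.bias = nn.Parameter(torch.zeros(in_channels // 2))

    def forward(self, inp):
        # (batch, in_channels) -> (batch, in_channels // 2, 2): consecutive pairs of hidden units.
        pairs = inp.reshape(-1, inp.shape[-1]//2, 2)
        # Weighted sum within each pair; a plain matmul here would mix the groups.
        return torch.einsum("bgk,kg->bg", pairs, self.weight) + self.bias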
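To see that the pairwise computation really matches a masked fully connected layer, the sketch below compares the two on random data. The sizes, the seed, and the use of torch.einsum for the per-pair weighted sum are my own choices for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
hidden_size, output_size = 6, 3          # illustrative sizes
h = torch.randn(4, hidden_size)
w_pairs = torch.randn(2, output_size)    # w_pairs[k, g]: weight of the k-th unit in pair g

# Grouped computation: only 2 * output_size multiplications per sample.
grouped = torch.einsum("bgk,kg->bg", h.reshape(-1, output_size, 2), w_pairs)

# Equivalent dense weight of shape (output_size, hidden_size), zero outside the blocks.
dense = torch.block_diag(*[w_pairs[:, g].unsqueeze(0) for g in range(output_size)])
masked = F.linear(h, dense)

print(torch.allclose(grouped, masked, atol=1e-6))
```

The dense version stores and multiplies hidden_size * output_size weights, of which only hidden_size are nonzero, which is the wasted work this answer refers to.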

【Comments】: