Pytorch Linear Layer now automatically reshapes the input? Answer

【Question Title】: Pytorch Linear Layer now automatically reshape the input?
【Posted】: 2019-07-22 17:30:14
【Question】:

I remember that in the past, nn.Linear only accepted 2D tensors.

But today I found that nn.Linear now accepts 3D tensors, and even tensors with an arbitrary number of dimensions.

import torch
import torch.nn as nn

X = torch.randn((20, 20, 20, 20, 10))
linear_layer = nn.Linear(10, 5)
output = linear_layer(X)
print(output.shape)
# torch.Size([20, 20, 20, 20, 5])

When I look at the Pytorch documentation, it does indeed say that the input is now expected to be

Input: (N, *, H_in), where * means any number of additional dimensions and H_in = in_features
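To check the documented (N, *, H_in) contract, here is a small sketch (assuming a reasonably recent PyTorch version; the shapes are arbitrary choices for illustration) that feeds inputs with zero to three extra dimensions into the same layer and prints how the shape changes:

```python
import torch
import torch.nn as nn

layer = nn.Linear(10, 5)  # in_features=10, out_features=5

# N=4 batch, then 0..3 arbitrary extra dimensions, then H_in=10
shapes = [(4, 10), (4, 3, 10), (4, 3, 2, 10), (4, 3, 2, 7, 10)]
for shape in shapes:
    out = layer(torch.randn(shape))
    # All leading dimensions are kept; only the last one goes from 10 to 5
    print(shape, "->", tuple(out.shape))
```

Every leading dimension passes through unchanged; only the last one is mapped from in_features to out_features.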

So it seems to me that Pytorch's nn.Linear now automatically reshapes the input via x.view(-1, input_dim).

But I cannot find any x.shape or x.view in the source code:

class Linear(Module):
    __constants__ = ['bias']

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

    @weak_script_method
    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self):
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )
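Since forward only delegates to F.linear, one way to see what happens is to compare the layer's output with the explicit formula y = x W^T + b computed via matmul and broadcasting. A minimal sketch, using only the layer's own weight and bias attributes (input shape chosen arbitrarily):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(10, 5)
X = torch.randn(2, 3, 4, 10)

with torch.no_grad():
    direct = layer(X)
    # weight has shape (out_features, in_features), so transpose it;
    # matmul broadcasts over every leading dimension of X
    manual = X @ layer.weight.t() + layer.bias

print(direct.shape)  # torch.Size([2, 3, 4, 5])
print(torch.allclose(direct, manual, atol=1e-5))
```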

Can anyone confirm this?

【Question Comments】:

Tags: python-3.x pytorch

【Answer 1】:

torch.nn.Linear uses the torch.nn.functional.linear function under the hood, which is where the actual operation takes place (see the documentation).

It looks like this (docstrings and decorators removed for brevity):

import torch

def linear(input, weight, bias=None):
    if input.dim() == 2 and bias is not None:
        # fused op is marginally faster
        ret = torch.addmm(bias, input, weight.t())
    else:
        # matmul handles any number of leading (batch) dimensions
        output = input.matmul(weight.t())
        if bias is not None:
            output += bias
        ret = output
    return ret
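The two branches compute the same result; the addmm path exists only for speed. A quick sketch comparing them on a 2D input (shapes chosen arbitrarily for illustration):

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 10)       # 2D input: (batch, in_features)
weight = torch.randn(5, 10)  # (out_features, in_features)
bias = torch.randn(5)

fused = torch.addmm(bias, x, weight.t())  # branch 1: bias + x @ weight.T in one op
unfused = x.matmul(weight.t()) + bias     # branch 2: matmul, then add bias

print(fused.shape)  # torch.Size([8, 5])
print(torch.allclose(fused, unfused, atol=1e-5))
```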

The first case uses addmm, which implements beta*mat + alpha*(mat1 @ mat2) and is supposedly faster (see, e.g., here).
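To make the formula concrete, here is a sketch using torch.addmm's beta and alpha keyword arguments with small integer-valued matrices, so the result can be checked by hand:

```python
import torch

mat = torch.ones(2, 2)
mat1 = torch.tensor([[1., 2.], [3., 4.]])
mat2 = torch.eye(2)  # identity, so mat1 @ mat2 == mat1

# beta * mat + alpha * (mat1 @ mat2)
out = torch.addmm(mat, mat1, mat2, beta=2, alpha=3)
print(out)  # expected values: [[5, 8], [11, 14]]
```

Each entry is 2 * 1 + 3 * mat1[i, j]; nn.Linear's fused path is the special case beta = alpha = 1 with mat = bias.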

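The second case uses matmul. As its docs explain, it performs different operations depending on the shapes of the tensors provided (there are five cases, which I won't copy here).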
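The shape rules referred to above can be summarized with a few torch.matmul calls; this sketch only prints result shapes (tensor sizes are arbitrary) and is not a substitute for the docs:

```python
import torch

v = torch.randn(4)         # 1D vector
M = torch.randn(3, 4)      # 2D matrix
B = torch.randn(10, 3, 4)  # batch of 10 matrices
W = torch.randn(4, 5)

print(torch.matmul(v, v).shape)  # 1D x 1D: dot product -> torch.Size([])
print(torch.matmul(M, W).shape)  # 2D x 2D: matrix product -> torch.Size([3, 5])
print(torch.matmul(v, W).shape)  # 1D x 2D: 1 prepended, then removed -> torch.Size([5])
print(torch.matmul(M, v).shape)  # 2D x 1D: matrix-vector product -> torch.Size([3])
print(torch.matmul(B, W).shape)  # N-D x 2D: W broadcast over the batch -> torch.Size([10, 3, 5])
```

The last case is the one nn.Linear hits for inputs with more than two dimensions, since weight.t() is always 2D.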

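All in all, it preserves the batch dimension and every dimension between it and the last (features) dimension; only the last dimension changes. No view() is used anywhere, and certainly not x.view(-1, input_dim). Check the following code: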

import torch

tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(10, 4, 5)

print(torch.matmul(tensor1, tensor2).shape)
print(torch.matmul(tensor1, tensor2).view(-1, tensor1.shape[1]).shape)

Output:

torch.Size([10, 3, 5]) # preserves the input's 3
torch.Size([50, 3]) # even the batch dimension is destroyed
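Although the code does not reshape, the reshape suggested in the question would give the same numbers if the leading dimensions were restored afterwards: flatten them, apply the layer, then reshape back. A sketch (using reshape rather than view so it also works on non-contiguous inputs; shapes chosen to match the example above):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 5)
x = torch.randn(10, 3, 4)

with torch.no_grad():
    direct = layer(x)                          # matmul path: (10, 3, 5)
    flat = layer(x.reshape(-1, 4))             # the question's x.view(-1, input_dim) idea: (30, 5)
    restored = flat.reshape(*x.shape[:-1], 5)  # put the leading dimensions back

print(direct.shape, restored.shape)
print(torch.allclose(direct, restored, atol=1e-5))
```

The flattened result only matches once the leading dimensions are restored; without that step the batch structure is lost, as in the output above.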

【Comments】: