【Title】: How can I change my Pytorch module (with in-place operations) to be differentiable?
【Posted】: 2020-06-21 10:22:24
【Question】:

My layer looks like this (I am building an LSTM layer that applies dropout at every time step; the input is passed through 10 times and the mean of the outputs is returned):

import torch
from torch import nn


class StochasticLSTM(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, dropout_rate: float):
        """
        Args:
        - dropout_rate: should be between 0 and 1
        """
        super(StochasticLSTM, self).__init__()

        self.iter = 10
        self.input_size = input_size
        self.hidden_size = hidden_size

        if not 0 <= dropout_rate <= 1:
            raise Exception("Dropout rate should be between 0 and 1")
        self.dropout = dropout_rate
        self.bernoulli_x = torch.distributions.Bernoulli(
            torch.full((self.input_size,), 1 - self.dropout)
        )
        self.bernoulli_h = torch.distributions.Bernoulli(
            torch.full((hidden_size,), 1 - self.dropout)
        )

        self.Wi = nn.Linear(self.input_size, self.hidden_size)
        self.Ui = nn.Linear(self.hidden_size, self.hidden_size)

        self.Wf = nn.Linear(self.input_size, self.hidden_size)
        self.Uf = nn.Linear(self.hidden_size, self.hidden_size)

        self.Wo = nn.Linear(self.input_size, self.hidden_size)
        self.Uo = nn.Linear(self.hidden_size, self.hidden_size)

        self.Wg = nn.Linear(self.input_size, self.hidden_size)
        self.Ug = nn.Linear(self.hidden_size, self.hidden_size)

    def forward(self, input, hx=None):
        """
        input shape (sequence, batch, input dimension)
        output shape (sequence, batch, output dimension)
        return output, (hidden_state, cell_state)
        """

        T, B, _ = input.shape

        if hx is None:
            hx = torch.zeros((self.iter, T + 1, B, self.hidden_size), dtype=input.dtype)
        else:
            hx = hx.unsqueeze(0).repeat(self.iter, T + 1, B, self.hidden_size)

        c = torch.zeros((self.iter, T + 1, B, self.hidden_size), dtype=input.dtype)
        o = torch.zeros((self.iter, T, B, self.hidden_size), dtype=input.dtype)

        for it in range(self.iter):
            # Dropout
            zx = self.bernoulli_x.sample()
            zh = self.bernoulli_h.sample()

            for t in range(T):
                x = input[t] * zx
                h = hx[it, t] * zh

                i = torch.sigmoid(self.Ui(h) + self.Wi(x))
                f = torch.sigmoid(self.Uf(h) + self.Wf(x))

                o[it, t] = torch.sigmoid(self.Uo(h) + self.Wo(x))
                g = torch.tanh(self.Ug(h) + self.Wg(x))

                c[it, t + 1] = f * c[it, t] + i * g
                hx[it, t + 1] = o[it, t] * torch.tanh(c[it, t + 1])

        o = torch.mean(o, axis=0)
        c = torch.mean(c[:, 1:], axis=0)
        hx = torch.mean(hx[:, 1:], axis=0)

        return o, (hx, c)

When I optimize the network, I get the error `one of the variables needed for gradient computation has been modified by an inplace operation`. There are several in-place operations in the code, for example `o[it, t] = torch.sigmoid(self.Uo(h) + self.Wo(x))`.

How can I avoid these in-place operations while still being able to take the mean at the end?
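For reference, here is a stripped-down repro of the same error (hypothetical toy shapes, a single `tanh` step instead of my full layer): each step reads `h[t]`, which autograd saves for the `Linear` backward, and the write to `h[t + 1]` then mutates the same preallocated buffer.

```python
import torch

# Stripped-down repro (hypothetical toy shapes): every step reads h[t],
# which autograd saves for the Linear backward, and the in-place write
# to h[t + 1] then mutates the same underlying buffer.
lin = torch.nn.Linear(2, 2)
h = torch.zeros(3, 2)
h[0] = lin(torch.ones(2))              # h now participates in the graph

for t in range(2):
    h[t + 1] = torch.tanh(lin(h[t]))   # in-place write into the shared buffer

raised = False
try:
    h[2].sum().backward()
except RuntimeError as err:
    raised = True
    print("backward failed:", err)
```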

Thanks

【Discussion】:

    Tags: deep-learning pytorch lstm recurrent-neural-network dropout


    【Solution 1】:

    Instead, collect each tensor result in a Python list and stack the list into a single tensor at the end. For example, instead of

    t = torch.zeros(5, 5)
    for i in range(5):
        t[i,:] = ...
    

    do this:

    t = []
    for i in range(5):
        t.append(...)
    t = torch.stack(t)
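Applied to the loop in the question, the same pattern looks roughly like this (a minimal sketch with a single `tanh` cell standing in for the full `StochasticLSTM` step; the real layer would keep separate lists for `o`, `c`, and `hx` in the same way):

```python
import torch
from torch import nn

# Minimal sketch of the list-and-stack pattern on a recurrent loop
# (single tanh cell as a stand-in for the full LSTM step): each step
# appends a fresh tensor, so nothing already in the graph is overwritten.
lin = nn.Linear(2, 2)
T = 4
x = torch.randn(T, 2)

hs = [torch.zeros(2)]                         # h_0
for t in range(T):
    hs.append(torch.tanh(lin(x[t]) + hs[t]))  # new tensor each step

h = torch.stack(hs[1:])                       # shape (T, hidden_size)
h.sum().backward()                            # no in-place error
```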
    

    【Discussion】:
