【Question title】: Implementing self attention
【Posted】: 2019-10-24 05:05:24
【Question】:

I am trying to implement self-attention in PyTorch. I need to compute the following expressions.

Similarity function S (2-dimensional), P (2-dimensional), C'

S[i][j] = W1 * inp[i] + W2 * inp[j] + W3 * x1[i] * inp[j]

P[i][j] = e^(S[i][j]) / (sum over all j of e^(S[i][j]))

In other words, P is a softmax of S taken over j.

C'[i] = sum over all j of P[i][j] * x1[j]

I tried the following code, which uses for loops:

        for i in range(self.dim):
            for j in range(self.dim):
                S[i][j] = self.W1 * x1[i] + self.W2 * x1[j] + self.W3 * x1[i] * x1[j]

        for i in range(self.dim):
            for j in range(self.dim):
                P[i][j] = torch.exp(S[i][j]) / torch.sum( torch.exp(S[i]))

        # attend

        for i in range(self.dim):
            out[i] = 0
            for j in range(self.dim):
                out[i] += P[i][j] * x1[j]

Is there a faster way to implement this in PyTorch?

【Question comments】:

    Tags: pytorch attention-model


    【Answer 1】:

    Here is an example of self-attention that I implemented in Dual Attention for HSI Imagery:

    import torch
    from torch.nn import Conv2d, Module, Parameter, Softmax


    class PAM_Module(Module):
        """Position attention module. https://github.com/junfu1115/DANet/blob/master/encoding/nn/attention.py"""
        # Ref from SAGAN
        def __init__(self, in_dim):
            super(PAM_Module, self).__init__()
            self.chanel_in = in_dim

            # 1x1 convolutions project the input to query, key and value maps
            self.query_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1)
            self.key_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1)
            self.value_conv = Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)

            # Learnable weight of the attention branch, initialised to zero
            self.gamma = Parameter(torch.zeros(1))

            self.softmax = Softmax(dim=-1)

        def forward(self, x):
            """
                inputs :
                    x : input feature maps (B X C X H X W)
                returns :
                    out : gamma * attention value + input feature (B X C X H X W)
                intermediate :
                    attention : B X (HxW) X (HxW)
            """
            m_batchsize, C, height, width = x.size()
            # Flatten the spatial dimensions: query is B X (HxW) X C//8, key is B X C//8 X (HxW)
            proj_query = self.query_conv(x).view(m_batchsize, -1, width*height).permute(0, 2, 1)
            proj_key = self.key_conv(x).view(m_batchsize, -1, width*height)
            # Pairwise similarity between all spatial positions, normalised over the last dimension
            energy = torch.bmm(proj_query, proj_key)
            attention = self.softmax(energy)
            proj_value = self.value_conv(x).view(m_batchsize, -1, width*height)

            # Weighted sum of the values for every position, reshaped back to a feature map
            out = torch.bmm(proj_value, attention.permute(0, 2, 1))
            out = out.view(m_batchsize, C, height, width)

            # Residual connection
            out = self.gamma*out + x
            #out = F.avg_pool2d(out, out.size()[2:4])

            return out
    
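The tensor shapes in `forward` can be hard to follow, so here is a minimal shape trace of the same `bmm`/softmax steps. It is only a sketch: random tensors stand in for the outputs of `query_conv`, `key_conv` and `value_conv`, and the sizes `B`, `C`, `H`, `W` are hypothetical values chosen for illustration.

```python
import torch

# Hypothetical sizes, chosen only for illustration
B, C, H, W = 2, 16, 4, 5
N = H * W  # number of spatial positions

x = torch.randn(B, C, H, W)

# Random stand-ins for the outputs of query_conv, key_conv and value_conv
q = torch.randn(B, C // 8, H, W)
k = torch.randn(B, C // 8, H, W)
v = torch.randn(B, C, H, W)

proj_query = q.view(B, -1, N).permute(0, 2, 1)  # (B, N, C//8)
proj_key = k.view(B, -1, N)                     # (B, C//8, N)
energy = torch.bmm(proj_query, proj_key)        # (B, N, N)
attention = torch.softmax(energy, dim=-1)       # each row sums to 1
proj_value = v.view(B, -1, N)                   # (B, C, N)

# Weighted sum of values per position, reshaped back to (B, C, H, W)
out = torch.bmm(proj_value, attention.permute(0, 2, 1)).view(B, C, H, W)

# gamma starts at zero in the module, so the residual sum equals x at initialisation
gamma = torch.zeros(1)
result = gamma * out + x
```

Because `gamma` is initialised to zero, the module passes its input through unchanged until training moves `gamma` away from zero.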

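For the formulas in the question itself, the three nested loops can probably be replaced by broadcasting, `torch.softmax` and a matrix-vector product. The sketch below rests on assumptions the question does not confirm: `x1` is taken to be a 1-D tensor, `W1`, `W2`, `W3` are taken to be scalars, and the function name `self_attention_1d` is made up for this example.

```python
import torch

def self_attention_1d(x1, w1, w2, w3):
    """Vectorised form of the question's loops.

    Assumes x1 is a 1-D tensor of shape (n,) and w1, w2, w3 are scalars.
    """
    col = x1.unsqueeze(1)  # (n, 1), plays the role of x1[i]
    row = x1.unsqueeze(0)  # (1, n), plays the role of x1[j]
    # S[i][j] = w1*x1[i] + w2*x1[j] + w3*x1[i]*x1[j], built by broadcasting
    S = w1 * col + w2 * row + w3 * col * row  # (n, n)
    # P[i][j] = exp(S[i][j]) / sum_j exp(S[i][j])
    P = torch.softmax(S, dim=1)
    # out[i] = sum_j P[i][j] * x1[j]
    out = P @ x1  # (n,)
    return S, P, out

x1 = torch.tensor([0.5, -1.0, 2.0])
S, P, out = self_attention_1d(x1, 0.3, -0.2, 0.1)
```

`torch.softmax` computes the same quantity as `torch.exp(S[i][j]) / torch.sum(torch.exp(S[i]))` but is less prone to overflow for large entries of `S`. If the weights are learnable, they could be stored as `torch.nn.Parameter` objects and passed in the same way.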
    【Comments】:
