Attentive Convolution with keras: answers

【Question title】: Attentive Convolution with keras
【Posted】: 2018-07-15 06:17:10
【Question】:

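I have implemented an attentive convolution layer in keras, as described in the paper.

You can see the code in the gist.

I'm new to implementing custom layers, and the layer is still very slow. I use a lot of tf.map_fn, and I think that is why it is so slow, but I don't know another way to do it. It would be great if someone could offer tips on how to improve the layer, or general tips on implementing custom layers, for example how to avoid backend (tensorflow) functions.

I'm using keras 2.1.3 with tensorflow 1.5 as the backend.

Thanks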

【Comments】:

Why are you using tf.map_fn everywhere? They are absolutely unnecessary.

Tags: tensorflow machine-learning keras convolution keras-layer

【Solution 1】:

I don't understand why you're using tf.map_fn; you can avoid it everywhere.

Here are some hints (which may or may not make the code faster).

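Casting

Do you really need to cast the values to float? If (at least) x[0] is an embedding, it's already a float, right? (I'm not sure about the nature of "context".)

Lines 37 and 38: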

text = x[0]
context = x[1]

Why use map functions when the operation is already supported in keras?

For example, why do this (L42):

weighted_attentive_context = tf.map_fn(self._compute_attentive_context, (text, context), dtype=K.floatx())

When you could do this instead?

weighted_attentive_context = self._compute_attentive_context(text, context)

With:

def _compute_attentive_context(self, text, context):

Suggestion for _compute_attentive_context:

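def _compute_attentive_context(self, text, context):

    # computes the context score for every vector, like equation 2
    # K.dot multiplies the batched text (batch, words, emb) by the 2D weight self.We
    temp = K.dot(text, self.We)
    # batched matmul with the last two axes of context swapped;
    # K.transpose would reverse every axis, including the batch axis
    scores = tf.matmul(temp, context, transpose_b=True)

    # why not?
    scores_softmax = K.softmax(scores)

    # computes the context feature map, like equation 4
    res = tf.matmul(scores_softmax, context)

    # why not?
    res = self._weight_for_output(res)
    return res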
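To illustrate why the per-sample tf.map_fn is not needed, here is a minimal NumPy sketch (not the Keras layer itself) that compares a batched computation of equations 2 and 4 against a per-sample loop. The shapes, the random inputs and the helper names `attentive_context_batched` and `attentive_context_looped` are assumptions made for the example, and the output weighting step (`_weight_for_output`) is left out.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_context_batched(text, context, We):
    # text: (batch, words_t, emb), context: (batch, words_c, emb), We: (emb, emb)
    temp = text @ We                              # (batch, words_t, emb)
    scores = temp @ np.swapaxes(context, 1, 2)    # (batch, words_t, words_c)
    return softmax(scores, axis=-1) @ context     # (batch, words_t, emb)

def attentive_context_looped(text, context, We):
    # Per-sample version, analogous to mapping a function over the batch.
    return np.stack([softmax(t @ We @ c.T) @ c for t, c in zip(text, context)])

# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 7, 5))
context = rng.normal(size=(4, 9, 5))
We = rng.normal(size=(5, 5))

batched = attentive_context_batched(text, context, We)
looped = attentive_context_looped(text, context, We)
print(batched.shape)  # (4, 7, 5)
print(np.allclose(batched, looped))
```

Both versions produce the same values; the batched one simply lets the matrix multiplication run over the batch axis.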

Why not use K.conv1d instead of all these complicated repeats, concatenations, etc.?

def _conv(self, x):
    # if you have special reasons for what you're doing, please share them in the comments,
    # please also share the exact shapes of the inputs and desired outputs
    # here, you should make self.W1 with shape (filterLength, em_dim, desired_output_dim)
    return K.conv1d(x, self.W1, padding='same')
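As a rough illustration of the shapes involved, the following NumPy sketch mimics a stride-1 1D convolution with 'same' padding and a kernel of shape (filter_length, emb, out_dim). The function name `conv1d_same` and all sizes are hypothetical; this is a reference computation for checking shapes under those assumptions, not how the backend implements it.

```python
import numpy as np

def conv1d_same(x, kernel):
    # x: (batch, words, emb); kernel: (filter_length, emb, out_dim)
    filter_length = kernel.shape[0]
    words = x.shape[1]
    # Zero-pad the word axis so the output keeps the same length.
    left = (filter_length - 1) // 2
    right = filter_length - 1 - left
    padded = np.pad(x, ((0, 0), (left, right), (0, 0)), mode='constant')
    # Collect every window of filter_length consecutive words:
    # windows has shape (batch, words, filter_length, emb).
    windows = np.stack(
        [padded[:, i:i + filter_length, :] for i in range(words)], axis=1
    )
    # Contract the window and embedding axes against the kernel.
    return np.einsum('bwfe,feo->bwo', windows, kernel)

# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 6, 4))
W1 = rng.normal(size=(3, 4, 8))

out = conv1d_same(x, W1)
print(out.shape)  # (2, 6, 8)
```

The output keeps the (batch, words) axes of the input and replaces the embedding axis with the kernel's output dimension, which is what makes it possible to add the result to the weighted attentive context afterwards.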

Suggestion for call:

def call(self, x, mask=None):
    # x is a list of two tensors
    text = x[0]
    context = x[1]

    # applies the bilinear energy function (text * We * context)
    # and weights the computed feature map like in equation 6 (W2 * ci)
    weighted_attentive_context = self._compute_attentive_context(text, context)

    # does the actual convolution, this is still kind of hacky
    conv = K.conv1d(text, self.W1, padding='same')

    added = conv + weighted_attentive_context
    batch = K.bias_add(added, self.bias)
    return batch

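Batch matrix multiplication

For these multiplications, you can use K.dot(), like this:

If batch x weights: K.dot(x, self.W)
If weights x batch: K.permute_dimensions(K.dot(self.W, x), (1, 0, 2))

Considering you have these shapes:

If batch x weights -> x: (batch, words, emb) | W: (emb, any)
If weights x batch -> W: (any, words) | x: (batch, words, emb)

The results will be:

If batch x weights: (batch, words, any)
If weights x batch: (batch, any, emb)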
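These shapes can be checked quickly with NumPy. The sketch below assumes that K.dot follows the same rule as np.dot for N-D inputs (contracting the last axis of the first argument with the second-to-last axis of the second); the dimension sizes are arbitrary example values.

```python
import numpy as np

# Hypothetical sizes chosen only for the demonstration.
batch, words, emb, any_dim = 4, 7, 5, 3
x = np.ones((batch, words, emb))

# Case "batch x weights": W has shape (emb, any).
W = np.ones((emb, any_dim))
out1 = np.dot(x, W)
print(out1.shape)  # (4, 7, 3), i.e. (batch, words, any)

# Case "weights x batch": W2 has shape (any, words).
W2 = np.ones((any_dim, words))
raw = np.dot(W2, x)                   # batch axis ends up in the middle
out2 = np.transpose(raw, (1, 0, 2))   # move the batch axis back to the front
print(raw.shape)   # (3, 4, 5), i.e. (any, batch, emb)
print(out2.shape)  # (4, 3, 5), i.e. (batch, any, emb)
```

The transpose in the second case plays the role of K.permute_dimensions(..., (1, 0, 2)): without it, the batch axis would not be the first axis of the result.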

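【Comments】:

Thanks for your suggestions. I made some changes to the gist and added some comments about dimensions etc.
Please see the matrix multiplication hints at the end of the answer. (You can still remove all the tf.map_fn calls; you don't need them anywhere.)
Thanks for your answer. I implemented your suggestions (see the gist), but this led to another error (see question).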