Why does tfa.layers.GroupNormalization(groups=1) produce different output than LayerNormalization?

【Question title】: Why does tfa.layers.GroupNormalization(groups=1) produce different output than LayerNormalization?
【Posted】: 2021-08-04 14:05:11
【Question description】:

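According to the group normalization documentation in TensorFlow Addons, the group normalization layer should become layer normalization when the number of groups is set to 1.

However, when I try this by calling both layers on a test tensor, the results differ. The group norm layer seems to compute the mean and variance across the time and channel axes together, while layer norm computes them independently for each channel vector.

Is this a bug, or am I missing something? The current behavior of layer norm is actually the one I want for what I am doing.

This refers to the documentation for GroupNormalization. The test: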

In [5]: x = tf.constant([[[1, 2], [3, 40]], [[1 , -1], [2, 200]]], dtype = tf.float32)
In [6]: tf.keras.layers.LayerNormalization()(x)
Out[6]: 
<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[-0.99800587,  0.99800587],
        [-0.99999857,  0.99999857]],

       [[ 0.9995002 , -0.9995002 ],
        [-1.        ,  1.        ]]], dtype=float32)>

In [7]: tfa.layers.GroupNormalization(groups = 1)(x)
Out[7]: 
<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[-0.6375344 , -0.57681686],
        [-0.5160993 ,  1.7304504 ]],

       [[-0.5734435 , -0.5966129 ],
        [-0.5618587 ,  1.7319152 ]]], dtype=float32)>

【Question discussion】:

Tags: python tensorflow machine-learning deep-learning keras-layer

【Solution 1】:

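According to the documentation for tf.keras.layers.LayerNormalization (TF 2.4.1, source):

Note that other implementations of layer normalization may choose to define gamma and beta over a separate set of axes from the axes being normalized across. For example, Group Normalization (Wu et al. 2018) with group size of 1 corresponds to a Layer Normalization that normalizes across height, width, and channel and has gamma and beta span only the channel dimension. So, this Layer Normalization implementation will not match a Group Normalization layer with group size set to 1.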
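The difference comes down to which axes the statistics are taken over. The following is a minimal NumPy sketch of that idea, not the code of either library. The helper `normalize` is hypothetical, the epsilon of 1e-3 is an assumption chosen because it matches the outputs printed in the question, and gamma and beta are left at their initial values of 1 and 0.

```python
import numpy as np

EPS = 1e-3  # assumed epsilon; it reproduces the values printed in the question

# Same test tensor as in the question: shape (batch, time, channels) = (2, 2, 2).
x = np.array([[[1, 2], [3, 40]], [[1, -1], [2, 200]]], dtype=np.float32)


def normalize(x, axes, eps=EPS):
    """Standardize x with mean/variance taken over `axes` (gamma=1, beta=0)."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


# Statistics over the last axis only: one mean/variance per channel vector,
# which is what LayerNormalization() does by default.
layer_norm_like = normalize(x, axes=(-1,))

# Statistics over every non-batch axis: one mean/variance per sample,
# which is what group normalization with a single group does.
group_norm_like = normalize(x, axes=(1, 2))

print(layer_norm_like)
print(group_norm_like)
```

The first array agrees with Out[6] and the second with Out[7], so neither layer is buggy; they simply normalize over different axes. If the group-norm-style statistics are wanted from Keras, passing a list of axes such as `tf.keras.layers.LayerNormalization(axis=[1, 2])` should normalize over both axes, although gamma and beta would then also span both axes rather than only the channels.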

【Discussion】: