KL Divergence of Normal and Laplace isn't Implemented in TensorFlow Probability and PyTorch

【Question title】: KL Divergence of Normal and Laplace isn't Implemented in TensorFlow Probability and PyTorch
【Posted】: 2019-04-01 20:21:03
【Question】:

In both TensorFlow Probability (v0.4.0) and PyTorch (v0.4.1), the KL divergence between the Normal distribution (tfp, PyTorch) and the Laplace distribution (tfp, PyTorch) isn't implemented, so requesting it raises a NotImplementedError.

>>> import tensorflow as tf
>>> import tensorflow_probability as tfp
>>> tfd = tfp.distributions
>>> import torch
>>>
>>> tf.__version__
'1.11.0'
>>> tfp.__version__
'0.4.0'
>>> torch.__version__
'0.4.1'
>>> 
>>> p = tfd.Normal(loc=0., scale=1.)
>>> q = tfd.Laplace(loc=0., scale=1.)
>>> tfd.kl_divergence(p, q)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/miniconda/envs/example/lib/python3.6/site-packages/tensorflow/python/ops/distributions/kullback_leibler.py", line 95, in kl_divergence
    % (type(distribution_a).__name__, type(distribution_b).__name__))
NotImplementedError: No KL(distribution_a || distribution_b) registered for distribution_a type Normal and distribution_b type Laplace
>>> 
>>> a = torch.distributions.normal.Normal(loc=0., scale=1.)
>>> b = torch.distributions.laplace.Laplace(loc=0., scale=1.)
>>> torch.distributions.kl.kl_divergence(a,b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/miniconda/envs/example/lib/python3.6/site-packages/torch/distributions/kl.py", line 161, in kl_divergence
    raise NotImplementedError
NotImplementedError

I assume that both libraries lack this feature for a good reason, and that users are expected to implement it themselves with tfp.distributions.RegisterKL in TensorFlow Probability and torch.distributions.kl.register_kl in PyTorch.

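Is this the right assumption? If so, can someone explain why the KL divergence wouldn't be implemented for the given distribution classes? I think I'm missing something very basic.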

If my assumption is wrong, can someone explain how to properly get TensorFlow and PyTorch to implement these operations?

For further reference, this example uses an older version of TensorFlow that works with Edward:

pip install tensorflow==1.7
pip install edward

In the minimal example above, I am trying to implement in tfp (or torch) the equivalent of the following Edward toy example code.

import tensorflow as tf
import edward as ed

p = ed.models.Normal(loc=0., scale=1.)
s = tf.Variable(1.)
q = ed.models.Laplace(loc=0., scale=s)
inference = ed.KLqp({p: q})
inference.run(n_iter=5000)

【Comments】:

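I don't know about TF or PyTorch, but since KL(p, q) = cross-entropy(p, q) minus entropy(p), I suppose you could work it out from the definition. Maybe a symbolic computation system such as Maxima (maxima.sourceforge.net) could help with the integrals.
@RobertDodier Indeed, computing something like this is trivial. Maxima, while nice (so props for your work on it), isn't even needed. This is more of a minimal example demonstrating a larger problem.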
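For readers who want the decomposition from the comment spelled out, here is a hand derivation for p = Normal(μ, σ) and q = Laplace(m, b). It is not taken from either library or from the thread; the symbols μ, σ, m, b and δ are introduced here for illustration, and the result is worth checking independently (for example numerically) before relying on it.

```latex
\begin{aligned}
\mathrm{KL}(p \,\|\, q) &= H(p, q) - H(p), \\
H(p) &= \tfrac{1}{2}\log\!\left(2\pi e\,\sigma^{2}\right), \\
H(p, q) &= -\mathbb{E}_{p}\!\left[\log q(x)\right]
         = \log(2b) + \frac{\mathbb{E}_{p}\,|x - m|}{b}, \\
\mathbb{E}_{p}\,|x - m| &= \sigma\sqrt{\tfrac{2}{\pi}}\,
      \exp\!\left(-\frac{\delta^{2}}{2\sigma^{2}}\right)
      + \delta\,\operatorname{erf}\!\left(\frac{\delta}{\sigma\sqrt{2}}\right),
      \qquad \delta = \mu - m .
\end{aligned}
```

For the case in the question (μ = m = 0, σ = b = 1) this reduces to log 2 + sqrt(2/π) − ½ log(2πe), which is approximately 0.0721.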

Tags: python tensorflow statistics pytorch tensorflow-probability

【Solution 1】:

IIRC, Edward's KLqp tries to use the analytic form and, if there isn't one, switches to using the sample KL.

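For TFP, and I think PyTorch too, kl_divergence only works for pairs of distributions that have been registered and, unlike Edward, it only computes the analytic KL. As you mention, these aren't implemented in TFP, and I'd say that is mostly because the common cases (e.g. KL(MultivariateNormal || MultivariateNormal)) have been implemented.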

To register a KL divergence, you would do something like this: https://github.com/tensorflow/probability/blob/07878168731e0f6d3d0e7c878bdfd5780c16c8d4/tensorflow_probability/python/distributions/gamma.py#L275. (It would be great if you could file a PR at https://github.com/tensorflow/probability!)
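As a rough illustration of what registration looks like, the sketch below uses PyTorch's torch.distributions.kl.register_kl, the decorator named in the question; the tfp.distributions.RegisterKL decorator plays the analogous role in TFP. The function name kl_normal_laplace is hypothetical, and the closed form inside it is a hand derivation (cross-entropy minus entropy, using the mean of a folded normal), not something stated in this answer. If the installed PyTorch version already registers this pair, the decorator simply replaces that entry.

```python
import math

import torch
from torch.distributions import Laplace, Normal
from torch.distributions.kl import kl_divergence, register_kl


@register_kl(Normal, Laplace)
def kl_normal_laplace(p, q):
    """Hand-derived closed form for KL(Normal || Laplace) (an assumption, verify before use)."""
    delta = p.loc - q.loc
    z = delta / p.scale
    # E_p|x - q.loc| is the mean of a folded normal distribution.
    mean_abs = (p.scale * math.sqrt(2.0 / math.pi) * torch.exp(-0.5 * z ** 2)
                + delta * torch.erf(z / math.sqrt(2.0)))
    # Cross-entropy of the Laplace density under p, minus the entropy of p.
    cross_entropy = torch.log(2.0 * q.scale) + mean_abs / q.scale
    return cross_entropy - p.entropy()


p = Normal(loc=torch.tensor(0.0), scale=torch.tensor(1.0))
q = Laplace(loc=torch.tensor(0.0), scale=torch.tensor(1.0))
kl = kl_divergence(p, q)  # dispatches to kl_normal_laplace instead of raising
print(float(kl))
```

Once the function is registered, kl_divergence(p, q) resolves the (Normal, Laplace) pair through the registry, so calling code does not need to know which implementation is behind it.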

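If it turns out that there is no suitable analytic form (off the top of my head, I don't know whether there is one), you can form the sample KL and optimize with that. This can be done explicitly in TFP (by sampling and computing the sample KL). If you would like this to happen more automatically, please file a PR as well. This is something that some of us on TFP are interested in.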
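To make "sample KL" concrete, here is a minimal, library-free sketch of the estimator: draw x from p and average log p(x) − log q(x). It is hard-coded to the Normal(0, 1) and Laplace(0, 1) pair from the question, and the helper names (normal_logpdf, laplace_logpdf, sample_kl_normal_laplace) are made up for this illustration. In TFP or PyTorch the same idea would use the distributions' own sample and log_prob methods rather than hand-written densities.

```python
import math
import random


def normal_logpdf(x, loc=0.0, scale=1.0):
    """Log density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def laplace_logpdf(x, loc=0.0, scale=1.0):
    """Log density of Laplace(loc, scale) at x."""
    return -abs(x - loc) / scale - math.log(2.0 * scale)


def sample_kl_normal_laplace(n_samples=200_000, seed=0):
    """Monte Carlo estimate of KL(Normal(0, 1) || Laplace(0, 1))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)  # draw from p
        total += normal_logpdf(x) - laplace_logpdf(x)  # log p(x) - log q(x)
    return total / n_samples


estimate = sample_kl_normal_laplace()
print(estimate)
```

The estimate is noisy, so the sample size trades accuracy for cost. To optimize with it, the samples have to be differentiable with respect to the parameters being tuned, which is what reparameterized sampling in the libraries provides.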

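It would be interesting to see in which cases the analytic KL could be automated. For example, if q and p come from the same exponential family, then the KL divergence has a nice form in terms of the sufficient statistics and the normalizer. But for KLs across exponential families (or between distributions that aren't exponential families at all), I'm not aware of results on classes of distributions for which you can semi-automatically compute the KL within the class.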
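The "nice form" for a shared exponential family can be written down as the standard identity below. The notation (base measure h, sufficient statistic T, log-normalizer A, natural parameters θ₁ and θ₂) is introduced here for illustration and does not come from the answer.

```latex
\begin{aligned}
p_{\theta}(x) &= h(x)\,\exp\!\left(\theta^{\top} T(x) - A(\theta)\right), \\
\mathrm{KL}\!\left(p_{\theta_1} \,\|\, p_{\theta_2}\right)
  &= (\theta_1 - \theta_2)^{\top}\,\mathbb{E}_{\theta_1}\!\left[T(x)\right]
     - A(\theta_1) + A(\theta_2), \\
\mathbb{E}_{\theta_1}\!\left[T(x)\right] &= \nabla A(\theta_1).
\end{aligned}
```

Only the log-normalizer and its gradient are needed, which is why this case lends itself to automation. A Normal and a Laplace do not share a sufficient statistic, so the Normal/Laplace pair in the question falls outside it.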

【Comments】: