
【Question title】: Are these functions equivalent?
【Posted】: 2023-04-04 07:56:01
【Question】:

I am building a neural network that uses t-distributed noise. I am using the function np.random.standard_t from the numpy library and the class tf.distributions.StudentT from tensorflow. I am using them as shown below:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.distributions)

a = np.random.standard_t(df=3, size=10000)  # numpy's function

t_dist = tf.distributions.StudentT(df=3.0, loc=0.0, scale=1.0)
sess = tf.Session()
b = sess.run(t_dist.sample(10000))

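The documentation for the TensorFlow implementation lists a parameter called scale, described as follows:

The scaling factors of the distribution(s). Note that scale is not technically the standard deviation of this distribution but has semantics more similar to standard deviation than variance.

I have set scale to 1.0, but I have no way of knowing for sure whether the two calls refer to the same distribution.

Can someone help me verify this? Thanks.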

【Comments】:

Tags: python numpy tensorflow distribution

【Answer 1】:

I would say they are, because sampling is defined in almost exactly the same way in both cases. This is how sampling is defined for tf.distributions.StudentT:

def _sample_n(self, n, seed=None):
  # The sampling method comes from the fact that if:
  #   X ~ Normal(0, 1)
  #   Z ~ Chi2(df)
  #   Y = X / sqrt(Z / df)
  # then:
  #   Y ~ StudentT(df).
  seed = seed_stream.SeedStream(seed, "student_t")
  shape = tf.concat([[n], self.batch_shape_tensor()], 0)
  normal_sample = tf.random.normal(shape, dtype=self.dtype, seed=seed())
  df = self.df * tf.ones(self.batch_shape_tensor(), dtype=self.dtype)
  gamma_sample = tf.random.gamma([n],
                                 0.5 * df,
                                 beta=0.5,
                                 dtype=self.dtype,
                                 seed=seed())
  samples = normal_sample * tf.math.rsqrt(gamma_sample / df)
  return samples * self.scale + self.loc  # Abs(scale) not wanted.

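So it is a standard normal sample divided by the square root of (a chi-square sample with parameter df, divided by df). The chi-square sample is drawn as a gamma sample with shape 0.5 * df and rate 0.5, which is equivalent (chi-square is a special case of gamma). The scale value, like loc, only comes into play in the last line, as a way to rescale and shift the sample. When scale is 1 and loc is 0, they do nothing.

This is the implementation of np.random.standard_t: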
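As a rough illustration of that recipe, here is a NumPy-only sketch (not TensorFlow's actual code) that draws the chi-square variable as a gamma with shape 0.5 * df and rate 0.5, builds the t sample by hand, and compares a few quantiles against NumPy's own sampler. The seed, sample size and variable names are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = 3.0
n = 100_000

# Chi2(df) drawn as Gamma(shape=df/2, rate=0.5); NumPy expects scale = 1/rate.
chi2_via_gamma = rng.gamma(shape=0.5 * df, scale=1.0 / 0.5, size=n)

# Y = X / sqrt(Z / df) with X ~ Normal(0, 1), as in the comment of _sample_n.
manual_t = rng.standard_normal(n) / np.sqrt(chi2_via_gamma / df)

# Reference draw from NumPy's built-in Student's t sampler.
reference_t = rng.standard_t(df, size=n)

# Compare a few central quantiles (moments are noisy for such heavy tails).
probs = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
manual_q = np.quantile(manual_t, probs)
reference_q = np.quantile(reference_t, probs)
max_quantile_gap = float(np.max(np.abs(manual_q - reference_q)))

print("manual   :", manual_q)
print("reference:", reference_q)
print("largest quantile gap:", max_quantile_gap)
```

Quantiles are compared rather than variances because, with df=3, the sample variance converges very slowly.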

double legacy_standard_t(aug_bitgen_t *aug_state, double df) {
  double num, denom;

  num = legacy_gauss(aug_state);
  denom = legacy_standard_gamma(aug_state, df / 2);
  return sqrt(df / 2) * num / sqrt(denom);
}

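So it is essentially the same thing, slightly rephrased. Here we also have a gamma with shape df / 2, but it is a standard gamma (rate one). The missing 0.5 now shows up in the numerator, as the / 2 inside the sqrt, so the numbers are just moved around. There is no scale or loc here, though.

In truth, the difference is that TensorFlow's distribution is a location-scale generalization of the t-distribution: a standard t sample multiplied by scale and shifted by loc. A simple empirical check that the two are the same for loc=0.0 and scale=1.0 is to plot histograms of both samples and see how close they look.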
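Spelled out, the rearrangement goes as follows. The notation is mine: X is the standard normal draw, G the standard gamma draw from legacy_standard_gamma, and Z the chi-square variable from the TensorFlow comment.

```latex
\begin{aligned}
X &\sim \mathcal{N}(0, 1), \qquad
G \sim \mathrm{Gamma}\!\left(\text{shape} = \tfrac{df}{2},\ \text{rate} = 1\right), \\
Z &= 2G \sim \mathrm{Gamma}\!\left(\text{shape} = \tfrac{df}{2},\ \text{rate} = \tfrac{1}{2}\right) = \chi^2(df), \\
\sqrt{\tfrac{df}{2}}\,\frac{X}{\sqrt{G}}
  &= \frac{X}{\sqrt{2G / df}}
   = \frac{X}{\sqrt{Z / df}} \sim \mathrm{StudentT}(df).
\end{aligned}
```

The left-hand side of the last line is the NumPy return value, and the right-hand side is the expression TensorFlow computes before applying scale and loc.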

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
np.random.seed(0)
t_np = np.random.standard_t(df=3, size=10000)
with tf.Graph().as_default(), tf.Session() as sess:
    tf.random.set_random_seed(0)
    t_dist = tf.distributions.StudentT(df=3.0, loc=0.0, scale=1.0)
    t_tf = sess.run(t_dist.sample(10000))
plt.hist((t_np, t_tf), np.linspace(-10, 10, 20), label=['NumPy', 'TensorFlow'])
plt.legend()
plt.tight_layout()
plt.show()

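Output: (histogram comparing the NumPy and TensorFlow samples)

That looks pretty close. Obviously, from the point of view of statistical samples, this is not a proof of any kind. If you are still not convinced, there are statistical tools for testing whether a sample comes from a given distribution or whether two samples come from the same distribution.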
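One such tool is the Kolmogorov-Smirnov test. The sketch below assumes SciPy is installed; compare_t_samples is a hypothetical helper name of mine, and the second sample is a NumPy stand-in built with the normal / sqrt(chi-square / df) recipe, so that the snippet runs without TensorFlow. In practice you would pass t_np and t_tf from the plotting code instead.

```python
import numpy as np
from scipy import stats


def compare_t_samples(sample_a, sample_b, df):
    """Run KS tests: a vs. b, and each sample vs. the Student's t CDF."""
    two_sample = stats.ks_2samp(sample_a, sample_b)
    a_vs_t = stats.kstest(sample_a, "t", args=(df,))
    b_vs_t = stats.kstest(sample_b, "t", args=(df,))
    return two_sample, a_vs_t, b_vs_t


rng = np.random.default_rng(0)  # arbitrary seed
df = 3.0
n = 10_000

sample_a = rng.standard_t(df, size=n)
# Stand-in for the TensorFlow draw (assumption: same construction as _sample_n).
sample_b = rng.standard_normal(n) / np.sqrt(rng.chisquare(df, n) / df)

two_sample, a_vs_t, b_vs_t = compare_t_samples(sample_a, sample_b, df)
print("a vs b:", two_sample.statistic, two_sample.pvalue)
print("a vs t:", a_vs_t.statistic, a_vs_t.pvalue)
print("b vs t:", b_vs_t.statistic, b_vs_t.pvalue)

# Sanity check: a sample with scale=2.0 should be clearly rejected.
scaled = stats.ks_2samp(sample_a, 2.0 * sample_b)
print("a vs 2*b:", scaled.statistic, scaled.pvalue)
```

A large p-value only means the test found no evidence of a difference; it does not prove that the distributions are identical. The last comparison shows what a real mismatch looks like.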

【Comments】:

Thank you for the detailed explanation. I was looking for an empirical check, and you have provided one.