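A standard approach is to include an epsilon variable that guards against division by zero. In theory it is not needed, because such a computation makes no logical sense. In practice, machines are just calculators, and dividing by zero yields NaN or +/-Inf.
In short, define your function like this: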
    import numpy as np
    def z_norm(arr, epsilon=1e-100):
        # epsilon keeps the denominator non-zero when arr.std() is 0
        return (arr - arr.mean()) / (arr.std() + epsilon)
This assumes a 1-D array, but it is easy to change to a row-wise or column-wise calculation for a 2-D array.
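As a minimal sketch of that change, the version below takes an `axis` argument and uses `keepdims=True` so the mean and standard deviation broadcast back against the input. The name `z_norm_2d` and the sample `data` are hypothetical and only here for illustration.

```python
import numpy as np

def z_norm_2d(arr, axis=0, epsilon=1e-100):
    # axis=0 normalizes each column, axis=1 normalizes each row.
    # keepdims=True keeps the reduced axis so broadcasting lines up.
    mean = arr.mean(axis=axis, keepdims=True)
    std = arr.std(axis=axis, keepdims=True)
    return (arr - mean) / (std + epsilon)

# Hypothetical sample data; the middle column is constant (std == 0).
data = np.array([[1.0, 2.0, 3.0],
                 [3.0, 2.0, 7.0]])

by_column = z_norm_2d(data, axis=0)
by_row = z_norm_2d(data, axis=1)
print(by_column)
print(by_row)
```

The constant column comes out as zeros rather than NaN, which is the behavior the epsilon is meant to buy.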
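Epsilon is an error deliberately added to the calculation to prevent NaN or Inf from being created. In the case where you would have gotten Inf, you still end up with very large numbers, but later calculations will not propagate Inf and may still retain some meaning.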
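To illustrate the propagation point, here is a small sketch (the `numerator` values are made up for the example). Without epsilon the division produces +/-Inf, and a later sum turns that into NaN; with epsilon the values are huge but finite, and the same sum still gives a usable number.

```python
import numpy as np

numerator = np.array([1.0, -1.0])  # illustrative values only

# Silence the divide-by-zero and invalid-value warnings for the demo.
with np.errstate(divide="ignore", invalid="ignore"):
    unguarded = numerator / 0.0       # [inf, -inf]
    unguarded_sum = unguarded.sum()   # inf + (-inf) is nan

guarded = numerator / (0.0 + 1e-100)  # very large, but finite
guarded_sum = guarded.sum()           # the two terms cancel exactly

print(unguarded, unguarded_sum)
print(guarded, guarded_sum)
```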
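A value of 1/(1 x 10^100) is tiny and will not change the result much. You can go down to about 1e-300 if you want, but after further calculations you risk hitting the smallest value the type can represent. Be aware of the precision you are using and the smallest value it can handle. I am using float64.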
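One way to check those limits is `np.finfo`, sketched below. It also shows a caveat worth knowing if your data is not float64: 1e-100 is below the smallest positive float32 value, so it becomes 0 when converted to float32. The variable name `eps_as_float32` is just for this example.

```python
import numpy as np

for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    # tiny is the smallest positive normal number for the dtype
    print(dtype.__name__, "smallest normal:", info.tiny, "max:", info.max)

# 1e-100 fits comfortably in float64 but underflows to zero in float32.
eps_as_float32 = np.float32(1e-100)
print(eps_as_float32)
```

If the array is float32, a larger epsilon that the type can represent would be needed for the guard to have any effect; which value to pick depends on the scale of your data.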
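Update 2021-11-03: added test code. The goal of this epsilon is to minimize damage and remove the chance of random NaNs in your data pipeline. Setting epsilon to a positive value fixes the problem.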
    # Uses numpy (imported as np) and z_norm as defined above.
    for arr in [
        np.array([0,0]),
        np.array([1e-300,1e-300]),
        np.array([1,1]),
        np.array([1,2])
    ]:
        for epi in [1e-100,0,1e100]:
            stdev = arr.std()
            mean = arr.mean()
            # epi=0 divides 0 by 0 when stdev is 0: NumPy warns and returns NaN
            result = z_norm(arr, epsilon=epi)
            print(f' z_norm(np.array({str(arr):<21}),{epi:<7}) ### stdev={stdev}; mean={mean:<6}; becomes --> {str(result):<19} (float-64) --> Truncate to 32 bits. =', result.astype(np.float32))
Output:
    z_norm(np.array([0 0] ),1e-100 ) ### stdev=0.0; mean=0.0 ; becomes --> [0. 0.] (float-64) --> Truncate to 32 bits. = [0. 0.]
    z_norm(np.array([0 0] ),0 ) ### stdev=0.0; mean=0.0 ; becomes --> [nan nan] (float-64) --> Truncate to 32 bits. = [nan nan]
    z_norm(np.array([0 0] ),1e+100 ) ### stdev=0.0; mean=0.0 ; becomes --> [0. 0.] (float-64) --> Truncate to 32 bits. = [0. 0.]
    z_norm(np.array([1.e-300 1.e-300] ),1e-100 ) ### stdev=0.0; mean=1e-300; becomes --> [0. 0.] (float-64) --> Truncate to 32 bits. = [0. 0.]
    z_norm(np.array([1.e-300 1.e-300] ),0 ) ### stdev=0.0; mean=1e-300; becomes --> [nan nan] (float-64) --> Truncate to 32 bits. = [nan nan]
    z_norm(np.array([1.e-300 1.e-300] ),1e+100 ) ### stdev=0.0; mean=1e-300; becomes --> [0. 0.] (float-64) --> Truncate to 32 bits. = [0. 0.]
    z_norm(np.array([1 1] ),1e-100 ) ### stdev=0.0; mean=1.0 ; becomes --> [0. 0.] (float-64) --> Truncate to 32 bits. = [0. 0.]
    z_norm(np.array([1 1] ),0 ) ### stdev=0.0; mean=1.0 ; becomes --> [nan nan] (float-64) --> Truncate to 32 bits. = [nan nan]
    z_norm(np.array([1 1] ),1e+100 ) ### stdev=0.0; mean=1.0 ; becomes --> [0. 0.] (float-64) --> Truncate to 32 bits. = [0. 0.]
    z_norm(np.array([1 2] ),1e-100 ) ### stdev=0.5; mean=1.5 ; becomes --> [-1. 1.] (float-64) --> Truncate to 32 bits. = [-1. 1.]
    z_norm(np.array([1 2] ),0 ) ### stdev=0.5; mean=1.5 ; becomes --> [-1. 1.] (float-64) --> Truncate to 32 bits. = [-1. 1.]
    z_norm(np.array([1 2] ),1e+100 ) ### stdev=0.5; mean=1.5 ; becomes --> [-5.e-101 5.e-101] (float-64) --> Truncate to 32 bits. = [-0. 0.]