[Posted]: 2021-02-27 11:30:02
[Problem description]:
Question
Why does the numerical gradient (f(x+k) - f(x-k)) / 2k of the logistic log loss f(x) = -np.log(1.0 - __sigmoid(x)) diverge, while that of -np.log(__sigmoid(x)) does not? What is the underlying cause and mechanism, or have I made a mistake? The code is at the bottom.
Any suggestions, corrections, insights, references, or tips on how to implement numerical gradients would be much appreciated.
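A minimal, self-contained reproduction of the behavior described above (my own condensed version of the code at the bottom, with simplified function names; x = 15 is just an illustrative point where sigmoid saturates toward 1.0):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

k = 1e-9
x = 15.0   # deep in the region where sigmoid(x) saturates toward 1.0

f = lambda v: -np.log(1.0 - sigmoid(v))   # loss for T=0
g = lambda v: -np.log(sigmoid(v))         # loss for T=1

# T=0: the analytical gradient is sigmoid(x) ~ 1.0, but the centered
# difference comes out erratic, jumping by tenths between nearby x.
print((f(x + k) - f(x - k)) / (2 * k))

# T=1: the analytical gradient is sigmoid(x) - 1 ~ -3e-7; the centered
# difference stays near zero, so its plot looks well-behaved.
print((g(x + k) - g(x - k)) / (2 * k))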
Background
I am trying to implement the numerical gradient (f(x+k) - f(x-k)) / 2k of the logistic log loss. In the figure, y is the binary true/false label T and p is the activation sigmoid(x).
When k is relatively large, e.g. 1e-5, the problem does not occur, at least within the plotted range of x.
However, when k becomes small, e.g. 1e-08, the numerical gradient of -np.log(1.0 - __sigmoid(x)) starts to diverge, while that of -np.log(__sigmoid(x)) does not.
I wonder whether the subtraction 1.0 - sigmoid(x) is related to how floating-point numbers are stored and computed in binary inside the computer (see the sketch just below).
The reason for trying such a small k in the first place is that I add a small number u, e.g. 1e-5, to keep log(0) from becoming np.inf, but log(x + 1e-5) makes the numerical gradient deviate from the analytical one. To minimize that effect I kept shrinking the value, and that is when I ran into this problem.
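A minimal sketch of that floating-point suspicion (my own illustration, not part of the original code): near x = 15, sigmoid(x) is so close to 1.0 that it only moves by a few representable doubles between x - k and x + k, and the subtraction 1.0 - sigmoid(x) turns that quantization into a large relative error.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x, k = 15.0, 1e-9

# Resolution of doubles just below 1.0 is ~1.1e-16, while sigmoid only
# changes by about sigmoid'(x) * 2k ~ 6e-16 over [x-k, x+k]: a few steps.
print(np.spacing(sigmoid(x)))             # ~1.1e-16
print(sigmoid(x + k) - sigmoid(x - k))    # quantized to a few such steps

# The subtraction shrinks the value to ~3e-7, so those quantized steps
# become an O(0.1) relative error in the loss's difference quotient.
print(1.0 - sigmoid(x))                   # ~3.06e-07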
Code
Logistic log loss and analytical gradient.
import inspect
from itertools import product

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


def __sigmoid(X):
    return 1 / (1 + np.exp(-1 * X))


def __logistic_log_loss(X: np.ndarray, T: np.ndarray):
    # L = -(T*log(p) + (1-T)*log(1-p)) with activation p = sigmoid(X)
    return -(T * np.log(__sigmoid(X)) + (1 - T) * np.log(1 - __sigmoid(X)))


def __logistic_log_loss_gradient(X, T):
    # Analytical gradient dL/dX = sigmoid(X) - T
    Z = __sigmoid(X)
    return Z - T
N = 1000
left = -20
right = 20
X = np.linspace(left, right, N)
T0 = np.zeros(N)   # labels T = 0
T1 = np.ones(N)    # labels T = 1
# --------------------------------------------------------------------------------
# T = 1
# --------------------------------------------------------------------------------
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(
    X,
    __logistic_log_loss(X, T1),
    color='blue', linestyle='solid',
    label="logistic_log_loss(X, T=1)"
)
ax.plot(
    X,
    __logistic_log_loss_gradient(X, T1),
    color='navy', linestyle='dashed',
    label="Analytical gradient(T=1)"
)
# --------------------------------------------------------------------------------
# T = 0
# --------------------------------------------------------------------------------
ax.plot(
    X,
    __logistic_log_loss(X, T0),
    color='magenta', linestyle='solid',
    label="logistic_log_loss(X, T=0)"
)
ax.plot(
    X,
    __logistic_log_loss_gradient(X, T0),
    color='purple', linestyle='dashed',
    label="Analytical gradient(T=0)"
)
ax.set_xlabel("X")
ax.set_ylabel("dL/dX")
ax.set_title("Logistic log loss and gradient")
ax.legend()
ax.grid(True)
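For completeness, the analytical gradient Z - T used above follows from the chain rule with $\sigma'(x) = \sigma(x)(1-\sigma(x))$ (a standard derivation, added here for reference):

$$L(x) = -\bigl(T \log \sigma(x) + (1-T)\log(1-\sigma(x))\bigr)$$
$$\frac{dL}{dx} = -T\,\frac{\sigma'(x)}{\sigma(x)} + (1-T)\,\frac{\sigma'(x)}{1-\sigma(x)} = -T\,(1-\sigma(x)) + (1-T)\,\sigma(x) = \sigma(x) - T$$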
Numerical gradient
def t_0_loss(X):
    # logistic_log_loss(P=sigmoid(x), T=0)
    return [-np.log(1.0 - __sigmoid(x)) for x in X]


def t_1_loss(X):
    # logistic_log_loss(P=sigmoid(x), T=1)
    return [-np.log(__sigmoid(x)) for x in X]
N = 1000
left = -1
right = 15

# Numerical (centered-difference) gradient: (f(x+k) - f(x-k)) / 2k
k = 1e-9
X = np.linspace(left, right, N)
fig, axes = plt.subplots(1, 2, figsize=(10,8))
# --------------------------------------------------------------------------------
# T = 0
# --------------------------------------------------------------------------------
axes[0].plot(
    X,
    ((np.array(t_0_loss(X + k)) - np.array(t_0_loss(X - k))) / (2 * k)),
    color='red', linestyle='solid',
    label="Diffed numerical gradient(T=0)"
)
axes[0].plot(
    X[0:-1:20],
    ((np.array(t_0_loss(X + k)) - np.array(t_0_loss(X))) / k)[0:-1:20],
    color='black', linestyle='dotted', marker='x', markersize=4,
    label="Left numerical gradient(T=0)"
)
axes[0].plot(
    X[0:-1:20],
    ((np.array(t_0_loss(X)) - np.array(t_0_loss(X - k))) / k)[0:-1:20],
    color='salmon', linestyle='dotted', marker='o', markersize=5,
    label="Right numerical gradient(T=0)"
)
axes[0].set_xlabel("X")
axes[0].set_ylabel("dL/dX")
axes[0].set_title("T=0: -log(1-sigmoid(x))")
axes[0].legend()
axes[0].grid(True)
# --------------------------------------------------------------------------------
# T = 1
# --------------------------------------------------------------------------------
axes[1].plot(
    X,
    ((np.array(t_1_loss(X + k)) - np.array(t_1_loss(X - k))) / (2 * k)),
    color='blue', linestyle='solid',
    label="Diffed numerical gradient(T=1)"
)
axes[1].plot(
    X[0:-1:20],
    ((np.array(t_1_loss(X + k)) - np.array(t_1_loss(X))) / k)[0:-1:20],
    color='cyan', linestyle='dashed', marker='x', markersize=5,
    label="Left numerical gradient(T=1)"
)
axes[1].plot(
    X[0:-1:20],
    ((np.array(t_1_loss(X)) - np.array(t_1_loss(X - k))) / k)[0:-1:20],
    color='yellow', linestyle='dotted', marker='o', markersize=5,
    label="Right numerical gradient(T=1)"
)
axes[1].set_xlabel("X")
axes[1].set_ylabel("dL/dX")
axes[1].set_title("T=1: -log(sigmoid(x))")
axes[1].legend()
axes[1].grid(True)
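One way to sidestep the divergence, sketched below with my own helper names (t_0_loss_stable / t_1_loss_stable are hypothetical, not from the question): compute each loss directly from x with np.logaddexp instead of first forming sigmoid(x) and subtracting it from 1.0. This relies on the identities -log(sigmoid(x)) = log(1 + e^(-x)) = np.logaddexp(0, -x) and -log(1 - sigmoid(x)) = log(1 + e^(x)) = np.logaddexp(0, x), so the value is never squeezed through the doubles next to 1.0.

import numpy as np

def t_0_loss_stable(X):
    # -log(1 - sigmoid(x)) = log(1 + exp(x)) = logaddexp(0, x)
    return np.logaddexp(0.0, np.asarray(X))

def t_1_loss_stable(X):
    # -log(sigmoid(x)) = log(1 + exp(-x)) = logaddexp(0, -x)
    return np.logaddexp(0.0, -np.asarray(X))

k = 1e-9
X = np.linspace(-1, 15, 1000)
sig = 1 / (1 + np.exp(-X))

grad_t0 = (t_0_loss_stable(X + k) - t_0_loss_stable(X - k)) / (2 * k)
grad_t1 = (t_1_loss_stable(X + k) - t_1_loss_stable(X - k)) / (2 * k)

# Deviation from the analytical gradients sigmoid(X) and sigmoid(X) - 1:
# both should stay orders of magnitude below the O(0.1) jumps of the
# naive T=0 form, even at k = 1e-9.
print(np.max(np.abs(grad_t0 - sig)))
print(np.max(np.abs(grad_t1 - (sig - 1.0))))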
[Comments]:
- I suggest using f(x) = -np.log1p(-__sigmoid(x)); that is exactly what log1p is for...
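For reference, a quick way to try that suggestion (my own snippet, not from the comment; np.log1p(y) computes log(1 + y) accurately when |y| is small):

import numpy as np

def __sigmoid(X):
    return 1 / (1 + np.exp(-1 * X))

x = 15.0
print(-np.log(1.0 - __sigmoid(x)))   # naive T=0 loss
print(-np.log1p(-__sigmoid(x)))      # suggested log1p form of the same loss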
Tags: python numpy floating-point