Suppose you did something like this:
import numpy as np
from sklearn.mixture import GaussianMixture
# create data
rng = np.random.RandomState(seed=42)
X = np.concatenate([rng.normal(0, 1, 100),
                    rng.normal(10, 3, 100),
                    rng.normal(30, 2, 100)]).reshape(-1, 1)
# estimate probability density function (pdf)
# three components, matching the three normal distributions that generated X
model = GaussianMixture(n_components=3)
model.fit(X)
x = np.linspace(-10, 40, 1000)
logprob = model.score_samples(x.reshape(-1, 1))
pdf = np.exp(logprob)
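You can then obtain the cumulative distribution function by simply taking the cumulative sum of the estimated density values and scaling it so that its maximum is 1. This is a numerical approximation: it is only accurate when the grid x is fine enough and covers essentially all of the probability mass.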
import matplotlib.pyplot as plt
# derive cumulative distribution function (cdf)
cdf = np.cumsum(pdf)
# scale as a probability distribution
cdf = cdf / np.max(cdf)
# plot data and pdf
plt.hist(X, 25, density=True, histtype='stepfilled', alpha=0.3)
plt.plot(x, pdf, '-k')
# plot cdf, scaled to the y limits of the above plot
xmin, xmax, ymin, ymax = plt.axis()
plt.plot(x, cdf * ymax, '-b');
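
If the grid approximation is not accurate enough, one alternative is to use the closed-form CDF of a one-dimensional Gaussian mixture, which is the weighted sum of the component normal CDFs. The following is only a minimal sketch: the helper `mixture_cdf` is a hypothetical name (not part of scikit-learn), and the weights, means and standard deviations below are illustrative values chosen to mirror the data-generating distributions above, not fitted results.

```python
import math
import numpy as np

def mixture_cdf(x, weights, means, stds):
    """Closed-form CDF of a 1-D Gaussian mixture evaluated at the points x.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]  # (n_points, 1)
    weights = np.asarray(weights, dtype=float)               # (n_components,)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # standard normal CDF: Phi(t) = 0.5 * (1 + erf(t / sqrt(2)))
    z = (x - means) / (stds * math.sqrt(2.0))                # (n_points, n_components)
    erf = np.vectorize(math.erf)
    component_cdfs = 0.5 * (1.0 + erf(z))
    # weighted sum over the components
    return component_cdfs @ weights

# illustrative parameters (assumed, not taken from a fitted model)
weights = [1 / 3, 1 / 3, 1 / 3]
means = [0.0, 10.0, 30.0]
stds = [1.0, 3.0, 2.0]

x = np.linspace(-10, 40, 1000)
cdf_exact = mixture_cdf(x, weights, means, stds)
```

With the fitted model from above, and assuming the default `covariance_type='full'` with a single feature, the same helper could be fed `model.weights_`, `model.means_.ravel()` and `np.sqrt(model.covariances_.ravel())`. Unlike the cumulative-sum version, this does not depend on the grid spacing or on where the grid starts and ends.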