Simulate CDF curve for penetration/adoption extrapolation

【Question title】: Simulate CDF curve for penetration/adoption extrapolation
【Posted】: 2021-05-23 21:18:20
【Question description】:

I would like to be able to plot a line like the cumulative distribution function (CDF) of the normal distribution, because it is useful for modelling an adoption curve.

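Specifically, I would like to use initial data (percentage adoption of a product) to extrapolate what the rest of that curve would look like, giving a rough estimate of the timeline for each phase. For example, if we reach 10% penetration at 30 days and 20% penetration at 40 days, and we fit this curve to those points, I would like to know when we will reach 80% penetration (compared with another population that might take 50 days to reach 10% penetration).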

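So my question is: how could I go about doing this? Ideally I would provide the initial data (time and penetration) and use Python (e.g. matplotlib) to plot the rest of the chart for me. But I have no idea where to start! Can anyone point me in the right direction?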

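(Incidentally, I also posted this question on CrossValidated, but I was not sure whether it belongs there, since it is a statistics question, or here, since it is a Python question. Apologies for the duplication!)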

【Question discussion】:

Tags: python matplotlib statistics distribution extrapolation

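【Solution 1】:

The CDF can be calculated with scipy.stats.norm.cdf(). Its inverse, the percent point function norm.ppf(), converts each desired adoption fraction into the z-value at which the standard normal CDF reaches that fraction. scipy.interpolate.pchip can then create a smooth, monotonic function that maps the day values onto those z-values, and applying norm.cdf() to the result gives the adoption curve.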
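To make the relationship between the two functions concrete, here is a minimal sketch. The sample fractions are illustrative only and are not taken from the question; it simply shows that norm.ppf turns an adoption fraction into a z-value and that norm.cdf turns it back.

```python
import numpy as np
from scipy.stats import norm

# Illustrative adoption fractions (assumed values, chosen only for the demo)
fractions = np.array([0.10, 0.20, 0.50, 0.80])

z = norm.ppf(fractions)    # inverse CDF: fraction -> z-value
roundtrip = norm.cdf(z)    # CDF: z-value -> fraction

for f, zi in zip(fractions, z):
    print(f"{f:.0%} adoption corresponds to z = {zi:+.3f}")
```

Because the standard normal distribution is symmetric, 20% and 80% map to z-values of equal size and opposite sign, and 50% maps to zero.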

import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np
from scipy.interpolate import pchip  # monotonic cubic interpolation
from scipy.stats import norm

desired_xy = np.array([(30, 10), (40, 20)])  # (number of days, percentage adoption)
# desired_xy = np.array([(0, 1), (30, 10), (40, 20), (90, 99)])
labels = ['Innovators', 'Early\nAdopters', 'Early\nMajority', 'Late\nMajority', 'Laggards']
xmin, xmax = 0, 90  # minimum and maximum day on the x-axis

px = desired_xy[:, 0]
py = desired_xy[:, 1] / 100

# smooth, monotonic function mapping x-values (days) to the z-values whose normal CDF gives the desired y-values
interpfunc = pchip(px, norm.ppf(py))

fig, ax = plt.subplots(figsize=(12, 4))
# ax.scatter(px, py, color='crimson', s=50, zorder=3)  # show desired correspondences
x = np.linspace(xmin, xmax, 1000)
ax.plot(x, norm.cdf(interpfunc(x)), lw=4, color='navy', clip_on=False)

label_divs = np.linspace(xmin, xmax, len(labels) + 1)
label_pos = (label_divs[:-1] + label_divs[1:]) / 2
ax.set_xticks(label_pos)
ax.set_xticklabels(labels, size=18, color='navy')
min_alpha, max_alpha = 0.1, 0.4
for p0, p1, alpha in zip(label_divs[:-1], label_divs[1:], np.linspace(min_alpha, max_alpha, len(labels))):
    ax.axvspan(p0, p1, color='navy', alpha=alpha, zorder=-1)
    ax.axvline(p0, color='white', lw=1, zorder=0)
ax.axhline(0, color='navy', lw=2, clip_on=False)
ax.axvline(0, color='navy', lw=2, clip_on=False)
ax.yaxis.set_major_formatter(PercentFormatter(1))
ax.set_xlim(xmin, xmax)
ax.set_ylim(0, 1)
ax.set_ylabel('Total Adoption', size=18, color='navy')
ax.set_title('Adoption Curve', size=24, color='navy')
for s in ax.spines:
    ax.spines[s].set_visible(False)
ax.tick_params(axis='x', length=0)
ax.tick_params(axis='y', labelcolor='navy')
plt.tight_layout()
plt.show()

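With only two points in desired_xy, the curve is just linearly stretched: it is an ordinary normal CDF whose mean and standard deviation are fixed by those two points. If more points are given, a smooth transformation is applied; the commented-out desired_xy line in the code, [(0, 1), (30, 10), (40, 20), (90, 99)], is one such example. Note that 0% and 100% cause problems, because they lie at minus and plus infinity respectively.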
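For the two-point case, the day at which a given adoption level is reached can be solved for directly, since the mapping from days to z-values is a straight line. The following is a minimal sketch under that assumption; the names day_for, mu and sigma are introduced here for illustration and are not part of the code above. Two points cannot confirm that adoption really follows a normal CDF, so the result is only a rough estimate.

```python
import numpy as np
from scipy.stats import norm

days = np.array([30.0, 40.0])      # observation days from the question
adoption = np.array([0.10, 0.20])  # adoption fractions on those days

# Straight line through the two points in z-space: z = slope * day + intercept
z = norm.ppf(adoption)
slope, intercept = np.polyfit(days, z, 1)

def day_for(fraction):
    """Day at which the fitted normal CDF reaches the given adoption fraction."""
    return (norm.ppf(fraction) - intercept) / slope

mu = -intercept / slope   # day of 50% adoption (mean of the implied normal)
sigma = 1 / slope         # standard deviation of the implied normal, in days
day_80 = day_for(0.80)

print(f"implied mean = {mu:.1f} days, std = {sigma:.1f} days")
print(f"80% adoption at about day {day_80:.1f}")

# 0% and 100% map to -inf and +inf, so they cannot be used as data points
print(norm.ppf([0.0, 1.0]))
```

With the numbers from the question, this puts 80% penetration at roughly day 78. Running it again with a population that reaches 10% at day 50 only requires changing the days array.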

【Discussion】:

This is exactly what I was looking for, and very clear and easy to follow. Thank you so much. I really appreciate it!