Locally weighted smoothing for a binary-valued random variable: answers

【Question title】: Locally weighted smoothing for binary valued random variable
【Posted】: 2017-07-14 14:10:31
【Question】:

I have a random variable defined as follows:

f(x) = 1 with probability g(x)

f(x) = 0 with probability 1 - g(x)

where 0 < g(x) < 1.

Assume g(x) = x. Suppose I observe this variable without knowing the function g and obtain 200 samples, as follows:

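import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic

# Column 0 holds x, column 1 holds the observed 0/1 outcome.
# Note: the name `list` shadows the Python builtin.
list = np.empty(shape=(200, 2))

# Draw x uniformly on [0, 1); since g(x) = x, g[i] is also the success probability.
g = np.random.rand(200)
for i in range(len(g)):
    list[i] = (g[i], np.random.choice([0, 1], p=[1 - g[i], g[i]]))

print(list)
plt.plot(list[:, 0], list[:, 1], 'o')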

Plot of 0s and 1s

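Now I would like to recover the function g from these points. The best I could come up with is to bin the data and take the mean within each bin: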

bin_means, bin_edges, bin_number = binned_statistic(list[:,0], list[:,1], statistic='mean', bins=10)
plt.hlines(bin_means, bin_edges[:-1], bin_edges[1:], lw=2)

Histogram mean statistics
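
For reference, the bin means are a sensible estimate because the expected value of a 0/1 variable equals its success probability. Written out with the definitions above (nothing beyond them is assumed):

```latex
\mathbb{E}[f(x)] = 1 \cdot g(x) + 0 \cdot \bigl(1 - g(x)\bigr) = g(x),
\qquad
\operatorname{Var}[f(x)] = g(x)\,\bigl(1 - g(x)\bigr)
```

Averaging the 0/1 observations inside a narrow bin therefore approximates the average of g over that bin, at the cost of a piecewise-constant estimate.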

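Instead, I would like a continuous estimate of the generating function.

I guess this is related to kernel density estimation, but I could not find an appropriate pointer.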
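
One continuous analogue of the binned mean is a kernel-weighted mean (Nadaraya-Watson regression), which is a close relative of kernel density estimation. The following is only a minimal sketch: the helper name `kernel_mean`, the Gaussian kernel, the bandwidth of 0.1, and the fixed seed are all assumptions made for illustration, not something taken from the thread.

```python
import numpy as np

def kernel_mean(x_obs, y_obs, x_grid, bandwidth=0.1):
    """Nadaraya-Watson estimate: Gaussian-weighted mean of y_obs around each grid point."""
    # Scaled pairwise distances, shape (len(x_grid), len(x_obs))
    u = (x_grid[:, None] - x_obs[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)  # Gaussian kernel weights
    return (w @ y_obs) / w.sum(axis=1)

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
x = rng.random(200)
y = (rng.random(200) < x).astype(float)  # 1 with probability g(x) = x

x_grid = np.linspace(0.0, 1.0, 101)
g_hat = kernel_mean(x, y, x_grid)  # smooth estimate of g on the grid
```

Plotting `g_hat` against `x_grid` gives a smooth curve instead of horizontal bin segments. A smaller bandwidth follows the data more closely but is noisier, and the estimate tends to be biased near the edges x = 0 and x = 1.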

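【Comments】:

You can find KDEs in statsmodels and sklearn, and scipy has one too. If you only want a plot, look at seaborn's distplot or kdeplot. But why would you want a KDE for binary data?
@MarvinTaschenberger My remark about KDE was probably misleading. It looks like I have a logistic regression problem: en.wikipedia.org/wiki/… But I am not trying to fit a model. I want to plot it in a smooth fashion.
This also looks relevant: thestatsgeek.com/2014/09/13/…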

Tags: numpy scipy histogram logistic-regression probability-density

【Solution 1】:

Straightforward, without explicitly fitting an estimator:

import seaborn as sns
# 'x' and 'y' are placeholder column names in a pandas DataFrame df
g = sns.lmplot(data=df, x='x', y='y', y_jitter=.02, logistic=True)

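Set x= to your exogenous variable and, likewise, y= to your dependent variable. y_jitter jitters the points for better visibility when you have many data points. logistic=True is the key part here: it gives you the logistic regression line of the data.

Seaborn is essentially built on top of matplotlib and works well with pandas, in case you want to move your data into a DataFrame.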
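
To make concrete what a logistic regression line is, here is a rough NumPy-only sketch of such a fit using Newton's method. Seaborn's documentation says logistic=True relies on statsmodels for the estimate, so this is not seaborn's code; the function name `fit_logistic`, the iteration count, and the seed are assumptions for illustration.

```python
import numpy as np

def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by Newton's method."""
    X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # current predicted probabilities
        grad = X.T @ (y - p)                          # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)     # Newton update
    return beta

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
x = rng.random(200)
y = (rng.random(200) < x).astype(float)  # 1 with probability g(x) = x

beta = fit_logistic(x, y)
x_grid = np.linspace(0.0, 1.0, 101)
p_hat = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x_grid)))  # fitted curve
```

Note that this is a parametric fit: the curve is always S-shaped, so for g(x) = x it can only approximate the straight line rather than reproduce it.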

【Discussion】:

Now I know that what I am looking for is locally weighted scatterplot smoothing (LOWESS). Thanks for the pointer to sns.
df = pd.DataFrame()
df['x'] = list[:,0]
df['y'] = list[:,1]
sns.lmplot(x='x', y='y', data=df, lowess=True)
plt.show()
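
For readers who want the smoothed values themselves rather than only the plot, the idea behind LOWESS can be sketched in plain NumPy: at each grid point, fit a weighted straight line to the nearest fraction of the data and keep its value at that point. This is a simplified illustration, not seaborn's or statsmodels' implementation; it omits the robustness iterations of full LOWESS, and the name `local_linear_smooth`, the `frac=0.5` default, and the seed are assumptions.

```python
import numpy as np

def local_linear_smooth(x_obs, y_obs, x_grid, frac=0.5):
    """LOWESS-style smoother: tricube-weighted linear fit on the nearest frac of points."""
    k = max(2, int(np.ceil(frac * len(x_obs))))  # neighbours used per local fit
    fitted = np.empty(len(x_grid))
    for i, x0 in enumerate(x_grid):
        d = np.abs(x_obs - x0)
        idx = np.argsort(d)[:k]              # indices of the k nearest observations
        h = d[idx].max()                     # local bandwidth
        w = (1.0 - (d[idx] / h) ** 3) ** 3   # tricube weights
        # Weighted least squares for a line centred at x0
        X = np.column_stack([np.ones(k), x_obs[idx] - x0])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y_obs[idx] * sw, rcond=None)
        fitted[i] = coef[0]                  # intercept is the fitted value at x0
    return fitted

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
x = rng.random(200)
y = (rng.random(200) < x).astype(float)  # 1 with probability g(x) = x

x_grid = np.linspace(0.0, 1.0, 101)
g_hat = local_linear_smooth(x, y, x_grid)
```

Because each local fit is a straight line, the fitted values are not constrained to [0, 1] and may slightly overshoot near the edges; clipping them is one possible workaround.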