【Question title】: Calculating percentiles based on binned input
【Posted】: 2021-10-25 13:49:20
【Question】:

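Hello, I am trying to write a function that calculates percentiles from binned input, that is, from a data set I only have in the form of a histogram.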

given:  
hist = [10, 15, 4]   
edges = [0.5, 6, 12, 25]  
perc = 5

I want to return the percentiles determined by perc, so the return value would look like this:

perc = 5
return percentile(data,0),
percentile(data,25),
percentile(data,50),
percentile(data,75),
percentile(data,100)

Output: [0.5, 4.4875, 7.8, 10.7, 25]

I tried using pandas.qcut(data, perc), but it does not seem to cut the data correctly.

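【Comments】:

  • Please provide a minimal, complete example of the problem and explain why your solution does not give the expected result. As it stands, your question is unclear.

Tags: python numpy percentile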


【Solution 1】:

If I understand correctly, this should work:

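import numpy as np

def percentile_binning(hist, edges, percentages):
    # Cumulative counts per bin and their normalized (0..1) form
    hist_cumsum = np.cumsum(hist)
    hist_sum = hist_cumsum[-1]
    hist_cumsum_norm = hist_cumsum / hist_sum
    # Index of the bin that each requested fraction falls into
    indxs = np.digitize(percentages, hist_cumsum_norm)
    # Number of samples lying in the bins before that bin
    bins_reduction = np.append(np.array([0]), hist_cumsum)[indxs]
    vals_between_edged = percentages * hist_sum - bins_reduction
    # Bin widths; the appended 0 handles a fraction of exactly 1 (index past the last bin)
    edged_diff = edges[1:] - edges[:-1]
    edged_diff = np.append(edged_diff, 0)
    percentage_diff = edged_diff[indxs]
    percentage_edge_value = edges[indxs]
    percentage_bins_sizes = np.append(hist, hist_sum)[indxs]
    # Linear interpolation inside the bin, starting from its left edge
    result = percentage_diff * vals_between_edged / percentage_bins_sizes + percentage_edge_value
    return result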

Input:

hist = np.array([10, 15, 4])
edges = np.array([0.5, 6, 12, 25])
percentages = np.array([0, 0.25, 0.5, 0.75, 1])
print(percentile_binning(hist, edges, percentages))

Output:

[ 0.5     4.4875  7.8    10.7    25.    ]
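
As a cross-check, the same numbers can be obtained by inverting the piecewise-linear cumulative distribution with np.interp. This is only a minimal sketch, not a replacement for the function above: the name percentile_from_hist is hypothetical, it assumes that values are spread uniformly within each bin, and it assumes that the question's perc = 5 means five evenly spaced fractions between 0 and 1.

```python
import numpy as np

def percentile_from_hist(hist, edges, fractions):
    """Approximate quantiles of histogram data (hypothetical helper).

    Assumes the samples are uniformly distributed inside each bin.
    """
    hist = np.asarray(hist, dtype=float)
    edges = np.asarray(edges, dtype=float)
    # Cumulative fraction of samples at each bin edge:
    # 0 at the first edge, 1 at the last edge.
    cdf = np.concatenate(([0.0], np.cumsum(hist))) / hist.sum()
    # Invert the CDF: look each fraction up on the cdf axis
    # and read the interpolated position off the edge axis.
    return np.interp(fractions, cdf, edges)

hist = [10, 15, 4]
edges = [0.5, 6, 12, 25]
perc = 5
# Assumption: perc is the number of evenly spaced fractions in [0, 1].
fractions = np.linspace(0, 1, perc)
result = percentile_from_hist(hist, edges, fractions)
print(result)
```

For the example histogram this agrees with the output shown above. Note that empty bins produce repeated values in cdf, so the quantile at that exact fraction is not unique and the interpolated result there should be treated with care.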
