Percentile must be in the range [0, 100]

【Question title】: Percentile must be in the range [0, 100]
【Posted】: 2020-06-24 07:20:12
【Question description】:

The code shown below is just a snippet from a larger project I am working on.

O = stats.scoreatpercentile(dfx[dfx['outlier'] == 1]['column_name'], np.abs(threshold))
l = stats.scoreatpercentile(dfx[dfx['outlier'] == 0]['column_name'], np.abs(threshold))
Data = stats.scoreatpercentile(dfx['column_name'], np.abs(threshold))
O, l, Data

Unfortunately, I get the following error:

ValueError: percentile must be in the range [0, 100]

I have done some research, but found little that helps with this error.

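【Question comments】:

What is the value of threshold in your program?
@Guimute, I just checked and it is far larger than 100, so now I am stuck. I don't know how to work around this, given that the threshold has to stay as it is.
Which percentile do you want, then? Perhaps 100 * threshold / max value of your data?
Since your threshold can exceed 100 (you said in a comment on an answer that it can be as high as 6400), I wonder whether you are using the right function. Do you really understand what scoreatpercentile does?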

Tags: python pandas numpy csv scipy

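【Solution 1】:

The second argument of scoreatpercentile must be between 0 and 100, so I suspect that for some values threshold is less than -100 or greater than 100, which leaves np.abs(threshold) above 100.

This follows from what a percentile means mathematically: what would the 200th percentile be? In that case, one possible "solution" is to map every value above 100 down to 100, which you can do like this: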

def get_perc(threshold):
    # Take the absolute value, then cap it at 100 so it is a valid percentile.
    perc = np.abs(threshold)
    return 100 if perc > 100 else perc

O = stats.scoreatpercentile(dfx[dfx['outlier'] == 1]['column_name'], get_perc(threshold))
l = stats.scoreatpercentile(dfx[dfx['outlier'] == 0]['column_name'], get_perc(threshold))
Data = stats.scoreatpercentile(dfx['column_name'], get_perc(threshold))
O, l, Data

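Before doing this, I suggest you first get clear on the concept of a percentile; only then can you decide whether this approach suits you. I found this article, which explains it very simply, or you can check Wikipedia.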
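
To make the trade-off concrete, here is a minimal sketch on made-up data (the `values` array is hypothetical and stands in for `dfx['column_name']`). It shows what scoreatpercentile returns, what clamping does to a threshold such as 6400, and, assuming the threshold is actually a value on the same scale as the data (the question does not confirm this), how `stats.percentileofscore` converts a value into a percentile rank instead.

```python
import numpy as np
from scipy import stats

# Hypothetical data standing in for dfx['column_name'].
values = np.arange(1, 101)  # 1, 2, ..., 100

# scoreatpercentile maps a percentile in [0, 100] to a value in the data.
median = stats.scoreatpercentile(values, 50)

# Clamping as in get_perc: any threshold above 100 is treated as the
# 100th percentile, which is simply the maximum of the data.
clamped = stats.scoreatpercentile(values, min(abs(6400), 100))

# ASSUMPTION: the threshold is a data value, not a percentile.
# percentileofscore goes the other way: value -> percentile rank in [0, 100].
rank = stats.percentileofscore(values, 25)

print(median, clamped, rank)
```

Note that clamping makes every threshold above 100 return the same number (the maximum), so it silences the error without necessarily answering the original question.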

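【Comments】:

If the threshold is greater than 100, is there a way around this?
This is more about the mathematical meaning of a percentile: what would the 200th percentile be? I will add a possible solution to my answer.
@Gamopi, in this case my threshold is 6400, so well above 100, and since I have a lot of data, I may have to work with different thresholds in the future that could be even higher.
Maybe I am asking too much, but could you implement your approach in my code?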

【Solution 2】:

The second argument, np.abs(threshold), must be between 0 and 100. For further reference, this may help: https://docs.scipy.org/doc/numpy-1.12.0/reference/generated/numpy.percentile.html
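
The same constraint applies to numpy.percentile, as the short sketch below suggests. The `values` array and the `safe_percentile` helper are hypothetical names introduced for illustration only; the helper clips the argument into [0, 100] before calling np.percentile, which may or may not be what the asker actually wants.

```python
import numpy as np

# Hypothetical data; not taken from the question.
values = np.array([10, 20, 30, 40, 50])

def safe_percentile(data, threshold):
    """Clip |threshold| into [0, 100] and return that percentile of data."""
    q = np.clip(np.abs(threshold), 0, 100)
    return np.percentile(data, q)

# An out-of-range percentile is rejected by np.percentile with a ValueError.
try:
    np.percentile(values, 6400)
    raised = False
except ValueError:
    raised = True

print(raised)
print(safe_percentile(values, 6400))  # clipped to the 100th percentile
print(safe_percentile(values, -50))   # abs() gives the 50th percentile
```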

【Comments】: