【Question title】: Finding the mean and standard deviation of a distribution given its name and parameters
【Posted】: 2019-11-23 18:42:34
【Question】:

I used scipy.stats to fit distributions to the iris dataset with the following code

import scipy.stats as st

def get_best_distribution(data):
    # Candidate distributions; "invgauss" must be in this list for it to be reported as the best fit
    dist_names = ["norm", "exponweib", "weibull_max", "weibull_min", "pareto", "genextreme", "invgauss"]
    dist_results = []
    params = {}
    for dist_name in dist_names:
        dist = getattr(st, dist_name)
        # Maximum-likelihood fit; returns (shape params..., loc, scale)
        param = dist.fit(data)
        params[dist_name] = param

        # Apply the Kolmogorov-Smirnov test with the fitted parameters
        D, p = st.kstest(data, dist_name, args=param)
        print("p value for " + dist_name + " = " + str(p))
        dist_results.append((dist_name, p))

    # Select the best-fitting distribution (the one with the highest p value)
    best_dist, best_p = max(dist_results, key=lambda item: item[1])

    # Report the name of the best fit, its p value and its parameters
    print("Best fitting distribution: " + str(best_dist))
    print("Best p value: " + str(best_p))
    print("Parameters for the best fit: " + str(params[best_dist]))

    return best_dist, best_p, params[best_dist]

The code is based on How to find probability distribution and parameters for real data? (Python 3). It printed:

Best fitting distribution: invgauss
Best p value: 0.8268700800511397
Parameters for the best fit: (0.016421213754032188, 1.5064355144322001, 309.4166651914064)

best_result = {"virginica": {"distribution": "invgauss", "parameters": [0.016421213754032188, 1.5064355144322001, 309.4166651914064]}}
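The values in "parameters" follow SciPy's fit convention: the distribution's shape parameter(s) first, then loc, then scale. The shape names are listed in the distribution's `shapes` attribute (None for distributions without shapes, such as norm). A small sketch that labels the stored tuple; the helper `label_fit_params` is hypothetical, not part of SciPy:

```python
import scipy.stats as st

def label_fit_params(dist, params):
    """Pair each fitted value with its name: shape parameter(s) first, then loc and scale."""
    # dist.shapes is a comma-separated string such as "a, c", or None when there are no shapes
    shape_names = [s.strip() for s in dist.shapes.split(",")] if dist.shapes else []
    return dict(zip(shape_names + ["loc", "scale"], params))

# The "parameters" list stored in best_result for the invgauss fit
params = [0.016421213754032188, 1.5064355144322001, 309.4166651914064]
print(label_fit_params(st.invgauss, params))
# → {'mu': 0.016421213754032188, 'loc': 1.5064355144322001, 'scale': 309.4166651914064}

# Distributions with two shapes (exponweib) or none (norm); these values are arbitrary placeholders
print(list(label_fit_params(st.exponweib, (1.0, 2.0, 0.0, 1.0))))  # → ['a', 'c', 'loc', 'scale']
print(list(label_fit_params(st.norm, (0.0, 1.0))))                 # → ['loc', 'scale']
```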

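Now I want to get the mean and standard deviation (or, equivalently, the variance) from best_result. I found something similar in Distribution mean and standard deviation using scipy.stats, but I couldn't figure out how to do this with SciPy.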

Any insight would be appreciated!
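For this particular fit the moments can also be checked by hand. SciPy's standard invgauss with shape mu has mean mu and variance mu**3; loc shifts the mean, and scale multiplies the mean and multiplies the variance by scale**2. A sketch comparing that closed form with SciPy's own moment methods, using the parameters posted above (the formula is specific to invgauss; other distributions have different moment formulas):

```python
import math
import scipy.stats as st

# invgauss parameters from best_result["virginica"]["parameters"]: (mu, loc, scale)
mu, loc, scale = 0.016421213754032188, 1.5064355144322001, 309.4166651914064

# Closed form: standard invgauss(mu) has mean mu and variance mu**3,
# then loc shifts the mean and scale stretches it (variance scales by scale**2)
mean_closed = loc + scale * mu
std_closed = scale * math.sqrt(mu ** 3)

# SciPy's moment methods take the same (shape..., loc, scale) arguments as fit() returns
mean_scipy = st.invgauss.mean(mu, loc=loc, scale=scale)
std_scipy = st.invgauss.std(mu, loc=loc, scale=scale)

print(mean_closed, mean_scipy)  # both about 6.587
print(std_closed, std_scipy)    # both about 0.651
```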

    Tags: python python-3.x scipy distribution


    【Answer 1】:

    Save the distribution object instead of the distribution's name. To do that, change

            dist_results.append((dist_name, p))

    to

            dist_results.append((dist, p))
    

    Then change the three print statements and the return statement in the function to

        print("Best fitting distribution:", best_dist.name)
        print("Best p value: "+ str(best_p))
        print("Parameters for the best fit:", params[best_dist.name])
    
        return best_dist, best_p, params[best_dist.name]
    

    Then you can do:

    dist, p, par = get_best_distribution(data)
    
    # par is (shape params..., loc, scale); unpack it into the moment methods
    print("mean:", dist.mean(*par))
    print("std: ", dist.std(*par))
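If only the name string is stored, as in best_result, the function does not have to change: getattr(st, name) returns the same distribution object. A minimal sketch, assuming best_result has the structure shown in the question; the helper `mean_and_std` is hypothetical. It freezes the distribution with dist(*params), which is equivalent to passing *par to dist.mean and dist.std as in the answer:

```python
import scipy.stats as st

best_result = {"virginica": {"distribution": "invgauss",
                             "parameters": [0.016421213754032188, 1.5064355144322001, 309.4166651914064]}}

def mean_and_std(entry):
    """Look the distribution up by name in scipy.stats and return (mean, std)."""
    dist = getattr(st, entry["distribution"])  # e.g. st.invgauss
    frozen = dist(*entry["parameters"])        # freeze shape(s), loc and scale
    return frozen.mean(), frozen.std()

for species, entry in best_result.items():
    mean, std = mean_and_std(entry)
    print(f"{species}: mean={mean:.3f}, std={std:.3f}, var={std**2:.3f}")
    # → virginica: mean=6.587, std=0.651, var=0.424
```

A frozen distribution keeps the parameters with the object, so later calls such as frozen.var() or frozen.interval(0.95) need no arguments.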
    

    【Comments】:

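    • Thanks for the valuable input/suggestions! Calling the methods on the distribution object directly had completely slipped my mind :p
    • Following your suggestion, I was also able to get the mean and standard deviation.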