【Question title】: Python: Calculate mean and sd of normal distribution in interval
【Posted】: 2017-11-03 08:00:51
【Question description】:

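My problem is this:

I have an interval, or several intervals, for example:

[0; 0.3] [0.3; 0.8] [0.8; 1]

Within each interval I have a normal distribution, which I sample using truncnorm() and .rvs().

So I have several "normal distributions" along the x-axis.

However, the truncnorm method needs the mean and standard deviation of the distribution within the interval. How can I calculate the mean and sd for a specific interval in Python?

numpy.mean(), for example, does not seem to work. I also get strange results, so I think my mean/sd calculation is already wrong before truncnorm is called.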
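
For reference, scipy.stats.truncnorm takes its bounds a and b in standard-deviation units relative to loc, not as raw x-values. The following is a minimal sketch of that conversion. The helper name truncated_normal and the mean and sd values are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.stats import truncnorm


def truncated_normal(mean, sd, low, high):
    """Frozen normal(mean, sd) distribution truncated to [low, high].

    truncnorm expects a and b in standard-deviation units relative
    to loc, so the raw bounds are standardized here.
    """
    a = (low - mean) / sd
    b = (high - mean) / sd
    return truncnorm(a, b, loc=mean, scale=sd)


# Illustrative values only: the interval [0.3, 0.8] with an assumed
# parent mean of 0.5 and an assumed parent sd of 0.1.
dist = truncated_normal(mean=0.5, sd=0.1, low=0.3, high=0.8)
samples = dist.rvs(size=1000, random_state=0)
print(samples.min(), samples.max())  # both lie inside [0.3, 0.8]
```

The mean and sd passed in describe the untruncated parent normal, so they are a modelling choice rather than something computed from the interval alone.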

Thanks, everyone.

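*Edit: for other columns, where the intervals are not that small, it works fine. Is there a limit on the size of the interval? The error occurs, for example, for the interval

[0.12; 0.17] --> value 0.0937818650369 (out of range)*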
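
One way to check whether interval width matters is to sample from progressively narrower intervals and count the values that fall outside. This is only a sketch: it assumes the midpoint as mean and one sixth of the width as sd, and the third, very narrow interval is made up for the test.

```python
import numpy as np
from scipy.stats import truncnorm

# Intervals of decreasing width; [0.12, 0.17] is the one reported above,
# the last one is a made-up, much narrower case.
intervals = [(0.0, 0.3), (0.12, 0.17), (0.12, 0.1201)]

outside_counts = []
for low, high in intervals:
    mean = (low + high) / 2   # assumed: midpoint of the interval
    sd = (high - low) / 6     # assumed: one sixth of the width
    a, b = (low - mean) / sd, (high - mean) / sd
    samples = truncnorm.rvs(a, b, loc=mean, scale=sd, size=10_000, random_state=0)
    # Small tolerance for floating-point rounding at the edges.
    n_out = int(np.sum((samples < low - 1e-12) | (samples > high + 1e-12)))
    outside_counts.append(n_out)
    print(f"[{low}, {high}]: {n_out} of {samples.size} samples outside")
```

With standardized bounds, no sample should land outside even the narrowest interval. If that holds, width alone would not explain a value such as 0.0937.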

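Yes, of course. What I want to do is this: I have an interval, and I want to sample a value that lies between the bounds of that interval and follows a truncated normal distribution. I also have an additional column, into which the value sampled for the other column should be written. For example: interval [0.2; 0.6] --> sampled value 0.343433. I think I found a solution:

truncnorm().stats()

But I don't know why: with the parameters I pass to the

truncnorm()

function, almost 50% of the values I get lie outside the bounds. What am I doing wrong?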
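
As a point of comparison, truncnorm.stats() reports the moments of a truncated distribution whose parameters are already given; it does not derive parameters from an interval. The sketch below contrasts standardized bounds with raw bounds under the default loc=0 and scale=1. The midpoint mean and width/6 sd are assumptions.

```python
from scipy.stats import truncnorm

low, high = 0.2, 0.6       # example interval from the question
mean = (low + high) / 2    # assumed parent mean (midpoint)
sd = (high - low) / 6      # assumed parent sd (one sixth of the width)

a, b = (low - mean) / sd, (high - mean) / sd

# Mean and variance of the truncated distribution itself.
t_mean, t_var = truncnorm.stats(a, b, loc=mean, scale=sd, moments="mv")
print(float(t_mean), float(t_var) ** 0.5)

# With raw bounds and the default loc=0, scale=1, the bounds are read
# as z-values of a standard normal, which is a different distribution.
z_mean, z_var = truncnorm.stats(low, high, moments="mv")
print(float(z_mean), float(z_var) ** 0.5)
```

The two calls describe different distributions, so their moments should not be mixed when constructing the sampler.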

Here is the code (a small part of it):

    convert_cat=(name_convert_column,name_convert_column,_tabelle,name_convert_column,_tabelle,_tabelle,name_convert_column)
    drop_view=(name_convert_column)
    calculate=(name_convert_column,name_convert_column,name_convert_column,name_convert_column,name_convert_column,_tabelle,name_convert_column,name_convert_column)
    cur.execute("CREATE VIEW convert_cat_%s (quotient, %s, rnum) AS SELECT (COUNT(*)/(SELECT COUNT(*) FROM %s ) ) as quotient, %s, row_number() over ( order by (COUNT(*)/(SELECT COUNT(*) FROM %s ) ) desc ) as rnum FROM     %s  GROUP BY %s ORDER BY quotient desc" %convert_cat)
    cur.execute("Select b.ID,a.unten,a.oben, a.mean, a.sd FROM( SELECT t3.RNUM, t3.%s, lag(t3.com_Pr,1,0) OVER (order by rnum asc) as unten , t3.com_PR as oben, ((t3.com_PR +(lag(t3.com_Pr,1,0) OVER (order by rnum asc)))/2) as MEAN, ((t3.com_PR-(lag(t3.com_Pr,1,0) OVER (order by rnum asc)))/6) AS SD FROM( SELECT t1.rnum, t1.%s , SUM(t2.quotient) as com_Pr FROM CONVERT_CAT_%s t1 INNER JOIN CONVERT_CAT_%s t2 ON t1.rnum >= t2.rnum group by t1.rnum, t1.%s, t1.quotient ORDER BY RNUM asc ) t3) a INNER JOIN %s b ON b.%s = a.%s order by ID asc" %calculate)
    _content_category = cur.fetchall()
    add_category_number_column = (_tabelle, name_convert_column)
    cur.execute("ALTER TABLE %s ADD %s_category NUMBER(15,14)" % add_category_number_column)
    x=0
    for ID in _content_category:
        id = _content_category[0]
        id_category = [j[0] for j in _content_category]
        unten_category = [j[1] for j in _content_category]
        oben_category = [j[2] for j in _content_category]
        #mean_category = [j[3] for j in _content_category]
        sd_category = [j[4] for j in _content_category]
        mean, var = truncnorm.stats(unten_category[x], oben_category[x], moments='mv')
        # sd = np.sqrt(var)
        X = get_truncated_normal(mean= mean, sd=sd_category[x], low=unten_category[x], upp=oben_category[x])
        update_cells_value = float(X.rvs(1))
        category = (_tabelle, name_convert_column,update_cells_value,id_category[x])
        cur.execute("UPDATE %s SET %s_category = %s WHERE ID=%s" % category)

        x += 1

I tried calculating the mean and standard deviation in the SQL query

1) ((t3.com_PR +(lag(t3.com_Pr,1,0) OVER (order by rnum asc)))/2) as MEAN
 2) ((t3.com_PR-(lag(t3.com_Pr,1,0) OVER (order by rnum asc)))/6) AS SD

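and with the truncnorm().stats() function. With the stats function the results seem to get worse, and the values fall outside the range even more often than before...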
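
The two SQL expressions can be checked with plain arithmetic. With the mean at the midpoint and the sd at one sixth of the width, the standardized bounds come out as -3 and +3 for every interval. The function name interval_params below is hypothetical.

```python
import math


def interval_params(low, high):
    """Midpoint mean and width/6 sd, mirroring the two SQL expressions."""
    mean = (high + low) / 2
    sd = (high - low) / 6
    # Standardized bounds in the form scipy.stats.truncnorm expects.
    a = (low - mean) / sd
    b = (high - mean) / sd
    return mean, sd, a, b


params = []
for low, high in [(0.0, 0.3), (0.3, 0.8), (0.8, 1.0)]:
    mean, sd, a, b = interval_params(low, high)
    params.append((a, b))
    print(f"[{low}, {high}]: mean={mean:.4f} sd={sd:.4f} a={a:.2f} b={b:.2f}")

# Share of an untruncated normal that lies within +/-3 sd of its mean.
mass_within_3sd = math.erf(3 / math.sqrt(2))
print(round(mass_within_3sd, 4))  # 0.9973
```

Since about 99.73% of the parent normal already lies inside the interval under this choice, truncation changes the distribution only slightly.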

【Question discussion】:

  • Can you share some minimal code that shows your problem?
  • I did :) It is now in the original post...

Tags: python normal-distribution


【Solution 1】:

Although I can't run your example, here is one possible problem:

 for ID in _content_category:
    id = _content_category[0]
    ...

It would be better as:

 for ID in _content_category:
    id = _content_category[ID]
    ...
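
Another way to write the loop, sketched under assumptions: iterating over the result of fetchall() yields one row tuple per step, so each row can be unpacked directly, without a separate counter or list indexing. The rows below are hypothetical stand-ins for the query result, and the database update is left out.

```python
from scipy.stats import truncnorm

# Hypothetical rows shaped like the query result: (ID, low, high, mean, sd).
rows = [
    (1, 0.0, 0.3, 0.15, 0.05),
    (2, 0.3, 0.8, 0.55, 0.5 / 6),
    (3, 0.8, 1.0, 0.90, 0.2 / 6),
]

updates = []
for row_id, low, high, mean, sd in rows:
    # Standardize the bounds before handing them to truncnorm.
    a, b = (low - mean) / sd, (high - mean) / sd
    value = float(truncnorm.rvs(a, b, loc=mean, scale=sd, random_state=row_id))
    updates.append((value, row_id))

for value, row_id in updates:
    print(row_id, value)
```

Each (value, row_id) pair could then be written back with the existing UPDATE statement.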

【Discussion】: