【Question title】: Pymc Linear Regression starting issues (scaling input params?)
【Posted】: 2014-03-05 08:32:54
【Question】:

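I am working through this example to do a very simple Bayesian linear regression with PyMC3 (which I am hoping to learn). I ran the original example, then tried it with my own data and got: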

ValueError: Optimization error: max, logp or dlogp at max have non-finite values. 
Some values may be outside of distribution support. max: {'alpha': array(50000.0), 
'beta': array(50000.0), 'sigma': array(25000.0)} logp: array(nan) dlogp: array([ nan,
nan,  nan])Check that 1) you don't have hierarchical parameters, these will lead to 
points with infinite density. 2) your distribution logp's are properly specified. 
Specific issues:

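I suspect this is due to the range of my data, but it may well be that I misunderstand the other parameters. The data and code are below; I hope this should just run in an IPython notebook. When all is said and done, lastqu should predict Units.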

import pandas as pd
import io
content2 = '''\
Units   lastqu
2000-12-31   19391   NaN
2001-12-31   35068   5925
2002-12-31   39279   8063
2003-12-31   47517   9473
2004-12-31   51439   11226
2005-12-31   59674   11667
2006-12-31   58664   14016
2007-12-31   55698   13186
2008-12-31   42235   11343
2009-12-31   40478   7867
2010-12-31   38722   8114
2011-12-31   36965   8361
2012-12-31   39132   8608
2013-12-31   43160   9016
2014-12-31   NaN     9785
'''
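# Columns are whitespace-separated; the header has one fewer field than the rows, so the dates become the index
df2 = pd.read_csv(io.StringIO(content2), sep=r'\s+')
# Cast the non-NaN entries to int; assigning back re-aligns on the index, so the NaN rows remain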
df2['Units']=df2['Units'][:-1].astype('int')
df2['lastqu']=df2['lastqu'][1:].astype('int')
df2

And the model code I tried is:

import pymc as pm
#import numpy as np
x=df2['lastqu']               # my best guess as to how to specify my data
y=df2['Units']
trace = None
with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=20)
    beta = pm.Normal('beta', mu=0, sd=20)
    sigma = pm.Uniform('sigma', lower=0, upper=50000)

    y_est = alpha + beta * x

    likelihood = pm.Normal('y', mu=y_est, sd=sigma, observed=y)

    start = pm.find_MAP()
    step = pm.NUTS(state=start)
    trace = pm.sample(2000, step, start=start, progressbar=False)

    pm.traceplot(trace);

【Comments】:

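  • The initial error seems to have been caused by the NaN values I included; I trimmed the frame to exclude NaN in both columns. It then ran for a while but hit some kind of Theano warning, which I am now puzzling over: /usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_perform_ext.py:85: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility from scan_perform.scan_perform import * I wonder whether I need to use the Git version of Theano as well?
  • The code works once %matplotlib inline is added. Doh!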
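To illustrate why NaN values in the data lead to the `logp: array(nan)` reported in the error, here is a minimal sketch in plain Python. It is a stand-in for the Normal likelihood of the model, not PyMC's internal code; the helper names `normal_logp` and `total_logp` and the parameter values are assumptions chosen for illustration, and only a few rows of the question's data are used.

```python
import math

def normal_logp(y, mu, sigma):
    """Log-density of a Normal(mu, sigma) distribution evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

def total_logp(xs, ys, alpha, beta, sigma):
    """Sum of log-densities for the linear model y ~ Normal(alpha + beta * x, sigma)."""
    return sum(normal_logp(y, alpha + beta * x, sigma) for x, y in zip(xs, ys))

nan = float("nan")
# A few rows from the question, including the two rows that contain NaN
xs = [nan, 5925.0, 8063.0, 9785.0]
ys = [19391.0, 35068.0, 39279.0, nan]

# A single NaN term makes the whole sum NaN
with_nan = total_logp(xs, ys, alpha=0.0, beta=4.0, sigma=5000.0)

# Keep only rows where both values are present
pairs = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
clean_xs = [x for x, _ in pairs]
clean_ys = [y for _, y in pairs]
without_nan = total_logp(clean_xs, clean_ys, alpha=0.0, beta=4.0, sigma=5000.0)

print(math.isnan(with_nan))        # True
print(math.isfinite(without_nan))  # True
```

Any arithmetic involving NaN yields NaN, so one incomplete row is enough to make the summed log-probability non-finite regardless of the parameter values.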

Tags: python pandas bayesian pymc


【Solution 1】:

This works:

df2=df2[1:-1]          # drops the first and last rows, which hold the NaN values in the example data
df2
%matplotlib inline
import pymc as pm
#import numpy as np
x=df2['lastqu']               # my best guess as to how to specify my data
y=df2['Units']
trace = None
with pm.Model() as model:
    alpha = pm.Normal('alpha', mu=0, sd=20)
    beta = pm.Normal('beta', mu=0, sd=20)
    sigma = pm.Uniform('sigma', lower=0, upper=50000)

    y_est = alpha + beta * x

    likelihood = pm.Normal('y', mu=y_est, sd=sigma, observed=y)

    start = pm.find_MAP()
    step = pm.NUTS(state=start)
    trace = pm.sample(2000, step, start=start, progressbar=False)

    pm.traceplot(trace);
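The positional slice `df2[1:-1]` relies on knowing that the NaN values sit in the first and last rows. As a hedged alternative, the sketch below uses pandas `dropna` to remove any row with a missing value in either column, wherever it occurs. It uses a shortened copy of the question's data; the variable names `raw`, `df`, and `clean` are chosen for illustration.

```python
import io

import pandas as pd

# Shortened copy of the data from the question (first, middle, and last rows)
raw = """\
Units lastqu
2000-12-31 19391 NaN
2001-12-31 35068 5925
2002-12-31 39279 8063
2014-12-31 NaN 9785
"""

# The header has one fewer field than the data rows, so the dates become the index
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Drop every row that is missing either the predictor or the response
clean = df.dropna(subset=["Units", "lastqu"])

# Plain float arrays, ready to pass to the model as x and y
x = clean["lastqu"].to_numpy(dtype=float)
y = clean["Units"].to_numpy(dtype=float)

print(clean)
```

This keeps working if more incomplete rows are added to the data later, whereas a fixed slice would need to be updated by hand.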

Thanks again, @fonnesbeck!
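The title also asks about scaling the inputs. The fix above did not need it, but for readers who want to try it, the following is a minimal sketch of standardizing both columns and converting coefficients fitted on the standardized scale back to the original units. The helpers `standardize` and `unscale` are hypothetical names introduced here, not part of PyMC, and the values are the complete (non-NaN) rows from the question.

```python
import statistics

# Complete rows from the question (2001 to 2013)
lastqu = [5925, 8063, 9473, 11226, 11667, 14016, 13186,
          11343, 7867, 8114, 8361, 8608, 9016]
units = [35068, 39279, 47517, 51439, 59674, 58664, 55698,
         42235, 40478, 38722, 36965, 39132, 43160]

def standardize(values):
    """Return z-scores together with the mean and sample standard deviation used."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values], mean, std

x_z, x_mean, x_std = standardize(lastqu)
y_z, y_mean, y_std = standardize(units)

def unscale(alpha_z, beta_z):
    """Map an intercept and slope fitted on z-scores back to the original units.

    From y_z = alpha_z + beta_z * x_z it follows that
    y = y_mean + y_std * alpha_z + (beta_z * y_std / x_std) * (x - x_mean).
    """
    beta = beta_z * y_std / x_std
    alpha = y_mean + y_std * alpha_z - beta * x_mean
    return alpha, beta

# Example with arbitrary coefficients on the standardized scale
alpha, beta = unscale(alpha_z=0.0, beta_z=0.8)
print(alpha, beta)
```

On the standardized scale both variables have mean 0 and standard deviation 1, so priors such as `Normal(mu=0, sd=20)` on `alpha` and `beta` comfortably cover plausible values, which is not obviously the case for an intercept on data in the tens of thousands.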
