【问题标题】:Errors while trying to fit gamma distribution with R fitdistr{MASS}尝试使用 R fitdistr{MASS} 拟合伽马分布时出错
【发布时间】:2013-04-12 15:03:53
【问题描述】:

我在 R 中的 fitdistr{MASS} 函数有问题。我有这个向量:

a <- c(26,73,84,115,123,132,159,207,240,241,254,268,272,282,300,302,329,346,359,367,375,378, 384,452,475,495,503,531,543,563,594,609,671,687,691,716,757,821,829,885,893,968,1053,1081,1083,1150,1205,1262,1270,1351,1385,1498,1546,1565,1635,1671,1706,1820,1829,1855,1873,1914,2030,2066,2240,2413,2421,2521,2586,2727,2797,2850,2989,3110,3166,3383,3443,3512,3515,3531,4068,4527,5006,5065,5481,6046,7003,7245,7477,8738,9197,16370,17605,25318,58524)

我想用一个命令为数据拟合一个伽马分布:

fitted.gamma <- fitdistr(a, "gamma")

但我有这样的错误:

Error in optim(x = c(26, 73, 84, 115, 123, 132, 159, 207, 240, 241, 254,  : 
non-finite finite-difference value [1]
In addition: Warning messages:
1: In densfun(x, parm[1], parm[2], ...) : NaNs produced
2: In densfun(x, parm[1], parm[2], ...) : NaNs produced
3: In densfun(x, parm[1], parm[2], ...) : NaNs produced
4: In densfun(x, parm[1], parm[2], ...) : NaNs produced

所以我尝试初始化参数:

(fitted.gamma <- fitdistr(a, "gamma", start=list(1,1)))

对象 fit.gamma 已创建,但在打印时会产生错误:

Error in dn[[2L]] : subscript out of bounds

您知道正在发生什么,或者可能知道其他一些 R 函数来拟合 MLE 的单变量分布吗?

提前感谢您的任何帮助或回复。

库巴

【问题讨论】:

    标签: r gamma-distribution


    【解决方案1】:

    总是先绘制你的东西,你的缩放离你很远。

    library(MASS)
    a <- c(26,73,84,115,123,132,159,207,240,241,254,268,272,282,300,302,329,346,359,367,375,378, 384,452,475,495,503,531,543,563,594,609,671,687,691,716,757,821,829,885,893,968,1053,1081,1083,1150,1205,1262,1270,1351,1385,1498,1546,1565,1635,1671,1706,1820,1829,1855,1873,1914,2030,2066,2240,2413,2421,2521,2586,2727,2797,2850,2989,3110,3166,3383,3443,3512,3515,3531,4068,4527,5006,5065,5481,6046,7003,7245,7477,8738,9197,16370,17605,25318,58524)
    ## Ooops, rater wide
    plot(hist(a))
    fitdistr(a/10000,"gamma") # gives warnings
    # No warnings
    fitted.gamma <- fitdistr(a/10000, dgamma,  start=list(shape = 1, rate = 0.1),lower=0.001)
    

    现在您可以决定如何处理缩放

    【讨论】:

    • 感谢您的回复。我看到添加带有缩放的“较低”参数是诀窍。这是否意味着在优化 gamma 参数时会在某些时候取负值?当谈到缩放时,为什么需要缩放值(速率参数太低)?库巴
    • 是的,在梯度优化过程中,我们很容易遇到一些样本的坏梯度区域。 Gamma 可能不是正确的分布,请尝试绘制一些示例。但是,log(a) 看起来几乎正常...
    • 非常感谢您的帮助:)
    【解决方案2】:

    对于明显符合 gamma 分布但比例错误的数据(即,好像它被乘以/除以一个大数),这里有另一种拟合 gamma 分布的方法:

    fitgamma <- function(x) {
      # Equivalent to `MASS::fitdistr(x, densfun = "gamma")`, where x are first rescaled to 
      # the appropriate scale for a gamma distribution. Useful for fitting the gamma distribution to 
      # data which, when multiplied by a constant, follows this distribution
      if (!requireNamespace("MASS")) stop("Requires MASS package.")
    
      fit <- glm(formula = x ~ 1, family = Gamma)
      out <- MASS::fitdistr(x * coef(fit), "gamma")
      out$scaling_multiplier <- unname(coef(fit))
      out
    }
    

    用法:

    set.seed(40)
    test <- rgamma(n = 100, shape = 2, rate = 2)*50000
    fitdistr(test, "gamma") # fails
    dens_fit <- fitgamma(test) # successs
    curve(dgamma(x, 2, 2), to = 5) # true distribution
    curve(dgamma(x, dens_fit$estimate['shape'], dens_fit$estimate['rate']), add=TRUE, col=2) # best guess
    lines(density(test * dens_fit$scaling_multiplier), col = 3)
    

    【讨论】: