【Posted】: 2026-01-19 19:40:02
【Question】:
I'm fitting a Poisson GLM to some dummy data to predict ClaimCount from two variables (Frequency and JudicialOrientation).
Dummy data frame:
data5 <- data.frame(
  Year = c("2006","2006","2006","2007","2007","2007","2008","2009","2010","2010","2009","2009"),
  JudicialOrientation = c("Defense","Plaintiff","Plaintiff","Neutral","Defense","Plaintiff","Defense","Plaintiff","Neutral","Neutral","Plaintiff","Defense"),
  Frequency = c(0.0, 0.06, 0.07, 0.04, 0.03, 0.02, 0, 0.1, 0.09, 0.08, 0.11, 0),
  ClaimCount = c(0, 5, 10, 3, 4, 0, 7, 8, 15, 16, 17, 12),
  Loss = c(100000, 100, 2500, 100000, 25000, 0, 7500, 5200, 900, 100, 0, 50),
  Exposure = c(10, 20, 30, 1, 2, 4, 3, 2, 1, 54, 12, 13)
)
The GLM:
ClaimModel <- glm(ClaimCount ~ JudicialOrientation + Frequency,
                  family = poisson(link = "log"), offset = log(Exposure),
                  data = data5, na.action = na.pass)
summary(ClaimModel)
Call:
glm(formula = ClaimCount ~ JudicialOrientation + Frequency, family = poisson(link = "log"),
    data = data5, na.action = na.pass, offset = log(Exposure))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.7555  -0.7277  -0.1196   2.6895   7.4768

Coefficients:
                             Estimate Std. Error z value Pr(>|z|)
(Intercept)                   -0.3493     0.2125  -1.644      0.1
JudicialOrientationNeutral    -3.3343     0.5664  -5.887 3.94e-09 ***
JudicialOrientationPlaintiff  -3.4512     0.6337  -5.446 5.15e-08 ***
Frequency                     39.8765     6.7255   5.929 3.04e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 149.72  on 11  degrees of freedom
Residual deviance: 111.59  on  8  degrees of freedom
AIC: 159.43

Number of Fisher Scoring iterations: 6
I'm also using log(Exposure) as an offset.
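Spelled out, the model above has the form below. The symbols are notation introduced here for illustration: μ_i is the expected ClaimCount for observation i, N_i and P_i are the 0/1 indicators for Neutral and Plaintiff (Defense is the baseline level, which is why it has no coefficient), and F_i is Frequency. The offset enters with its coefficient fixed at 1, so exponentiating both sides turns it into a multiplicative Exposure factor:

```latex
\log \mu_i = \beta_0 + \beta_N N_i + \beta_P P_i + \beta_F F_i + \log(\mathrm{Exposure}_i)
\quad\Longleftrightarrow\quad
\mu_i = \mathrm{Exposure}_i \cdot \exp(\beta_0 + \beta_N N_i + \beta_P P_i + \beta_F F_i)
```

Both logarithms here are natural logarithms, which is what R's log() computes.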
I then want to use this GLM to predict the claim counts for the same observations:
data5$ExpClaimCount <- predict(ClaimModel, newdata=data5, type="response")
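One way to check a hand calculation inside R is to rebuild the linear predictor from the design matrix and the fitted coefficients, add the offset with the natural log, and exponentiate. A minimal sketch, assuming the same model as above; it re-creates only the data columns the model uses, and `manual` and `from_predict` are illustrative names:

```r
# Columns of the question's dummy data that the model uses
data5 <- data.frame(
  JudicialOrientation = c("Defense", "Plaintiff", "Plaintiff", "Neutral",
                          "Defense", "Plaintiff", "Defense", "Plaintiff",
                          "Neutral", "Neutral", "Plaintiff", "Defense"),
  Frequency  = c(0, 0.06, 0.07, 0.04, 0.03, 0.02, 0, 0.1, 0.09, 0.08, 0.11, 0),
  ClaimCount = c(0, 5, 10, 3, 4, 0, 7, 8, 15, 16, 17, 12),
  Exposure   = c(10, 20, 30, 1, 2, 4, 3, 2, 1, 54, 12, 13)
)

ClaimModel <- glm(ClaimCount ~ JudicialOrientation + Frequency,
                  family = poisson(link = "log"),
                  offset = log(Exposure), data = data5)

# Design matrix: intercept, 0/1 dummies for Neutral and Plaintiff, Frequency
X <- model.matrix(ClaimModel)

# Linear predictor by hand: X %*% beta plus the offset (R's log() is the natural log)
eta <- drop(X %*% coef(ClaimModel)) + log(data5$Exposure)
manual <- exp(eta)

# predict() re-evaluates offset = log(Exposure) in newdata
from_predict <- predict(ClaimModel, newdata = data5, type = "response")
all.equal(unname(manual), unname(from_predict))  # TRUE
```

Using the full-precision coef() values, rather than the rounded ones printed by summary(), is what makes the two agree to numerical tolerance.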
If I understand correctly, the Poisson GLM equation should be:
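E[ClaimCount] = exp(-0.3493 - 3.3343*JudicialOrientationNeutral - 3.4512*JudicialOrientationPlaintiff + 39.8765*Frequency + log(Exposure))
where JudicialOrientationNeutral and JudicialOrientationPlaintiff are 0/1 indicator variables (Defense is the baseline level).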
But when I computed this by hand for a few observations (in Excel, for example =EXP(-0.3493+0+0+LOG(10)) for observation 1), the results did not match the values from predict().
Is my understanding of the GLM equation wrong?
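As a worked check on observation 1 (Defense, so both indicators are 0; Frequency = 0; Exposure = 10), using the rounded intercept printed by summary(), a natural-log offset and a base-10 offset give very different numbers:

```latex
\begin{aligned}
\exp(-0.3493 + \ln 10) &= 10\,e^{-0.3493} \approx 7.05 \\
\exp(-0.3493 + \log_{10} 10) &= \exp(0.6507) \approx 1.92
\end{aligned}
```

The second line is what the Excel formula above evaluates, because Excel's LOG defaults to base 10; the first is the form that matches the model, up to rounding of the intercept.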
【Discussion】:
- You may be seeing different results because LOG in Excel is the base-10 logarithm. Try LN instead.
- @tkmckenzie Exactly. In R, log() defaults to log(x, base = exp(1)).
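A small R sketch of the base difference the comments describe. The Excel equivalences are noted in comments only (Excel is not involved in running this):

```r
# R's log() is the natural logarithm (base e), like Excel's LN()
log(10)                  # 2.302585
log(10, base = exp(1))   # same value: base = exp(1) is the default

# Excel's LOG() defaults to base 10, which corresponds to R's log10()
log10(10)                # 1
log(10, base = 10)       # 1

# exp() only undoes the natural log, so the offset must be ln(Exposure)
exp(log(10))             # 10
exp(log10(10))           # 2.718282 (that is, e), not 10
```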
Tags: r offset glm predict poisson