【发布时间】:2023-10-30 20:41:01
【问题描述】:
我正在 r 中分析树冠覆盖(OverheadCover,以 0,1 为界的比例)和放置在同一位置的屠体数量(CarcassNumber,具有 2 个水平的因子)对鸟类吃掉的腐肉的比例(ProportionBirdsScavenging,比例以 0,1 为界)。我通过为CarcassNumber 的单独值对OverheadCover 对ProportionBirdsScavenging 的影响进行建模来绘制这种交互,然后将其绘制在同一张图中。完成此操作后,我看到plot_model(glmm_interaction, type = "int") 计算的 SE 与我计算的 SE 存在差异。这里开始了调查。这段旅程带领我浏览了plot_model、plot_type_int、ggpredict 的源代码,并以ggpredict_helper 结束。不幸的是,我没有找到 SE 的计算方法,但我找到了 SE 差异的证据。我一直非常有信心我正确计算了我的 SE,但现在我不再那么确定了。请在下面查看我的代码。
#rm(list = ls())
library(glmmTMB)
library(dplyr)
data_both <- data.frame(ProportionBirdsScavenging = c(0.406192519926425, 0.871428571428571, 0.452995391705069, 0.484821428571429, 0.795866569978245, 0.985714285714286, 0.208571428571429, 0.573982970671712, 0.694285714285714, 0.930204081632653, 0.0483709273182957, 0.0142857142857143, 0.661904761904762, 0.985714285714286, 0.0142857142857143, 0.0142857142857143),
pointWeight = c(233, 17, 341, 128, 394, 46, 5, 302, 10, 35, 57, 39, 12, 229, 28, 116),
OverheadCover = c(0.671, 0.04, 0.46, 0.65, 0.02, 0, 0.8975, 0.585, 0.6795, 0.0418, 0.5995, 0.6545, 0.02, 0, 0.92, 0.585),
CarcassNumber = as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)),
Area = c("Hamert", "KempenBroek", "KempenBroek", "KempenBroek", "Markiezaat", "Markiezaat", "Meinweg", "Valkenhorst", "Hamert", "KempenBroek", "KempenBroek", "KempenBroek", "Markiezaat", "Markiezaat", "Meinweg", "Valkenhorst"))
data_both$pointWeight_scaled <- scales::rescale(data_both$pointWeight, to = c(0.0001,1)) # rescale weights
glmm_interaction <- glmmTMB(ProportionBirdsScavenging ~ OverheadCover * CarcassNumber + (1|Area), data = data_both, beta_family(link = "logit"), weights = pointWeight_scaled)
ggeffects::ggpredict(glmm_interaction, terms = c("OverheadCover", "CarcassNumber [1:2]")) # SE's calculated by r
# Calculate the SE's for CarcassNumber 1
df_first_carcasses <- filter(data_both, CarcassNumber == 1) # create df with only first carcasses
myglmm <- glmmTMB(ProportionBirdsScavenging ~ OverheadCover, data = df_first_carcasses, beta_family(link = "logit"), weights = pointWeight_scaled)
new.xglmm <- expand.grid(OverheadCover = seq(min(data_both$OverheadCover), max(data_both$OverheadCover), length.out = 1000)) %>%
mutate(Area = "Hamert", pointWeight_scaled = 1) # pad new.xglmm with an arbitrary value for Area and pointWeight_scaled, then exclude them in predict, otherwise error -> https://*.com/questions/54411851/mgcv-how-to-use-exclude-argument-in-predict-gam
new.yglmm <- data.frame(predict(myglmm, new.xglmm, type = "link", exclude = c("Area","pointWeight_scaled"), se.fit = TRUE)) %>% # exclude Area and pointWeight_scaled from the prediction
mutate(ProportionBirdsScavenging = plogis(fit)) %>% # calculate the ProportionBirdsScavenging on response scale
rename(SE.untransformed = se.fit, untransformed.predictions = fit)
addTheseglmm1 <- mutate(data.frame(new.xglmm, new.yglmm), # calculate the lwr and upr bounds using the untransformed predictions and SE, then transformed to probability
lwr = plogis(untransformed.predictions - SE.untransformed),
upr = plogis(untransformed.predictions + SE.untransformed))
addTheseglmm1[c(1,45,501,653,707,1000),c(1,6,5)] # compare my SE's to the ggpredict SE's
# calculated by ggpredict > manually calculated
# x | Predicted | SE | > SE.untransformed
# ------------------------- > -------------------
# 0.00 | 0.82 | 0.44 | > 0.43
# 0.04 | 0.80 | 0.41 | > 0.40
# 0.46 | 0.56 | 0.22 | > 0.21
# 0.60 | 0.47 | 0.26 | > 0.25
# 0.65 | 0.44 | 0.29 | > 0.28
# 0.92 | 0.28 | 0.47 | > 0.46
# ( CarcassNumber = 1 )
# Same for CarcassNumber 2
df_second_carcasses <- filter(data_both, CarcassNumber == 2) # create df with only second carcasses
myglmm <- glmmTMB(ProportionBirdsScavenging ~ OverheadCover, data = df_second_carcasses, beta_family(link = "logit"), weights = pointWeight_scaled)
new.xglmm <- expand.grid(OverheadCover = seq(min(data_both$OverheadCover), max(data_both$OverheadCover), length.out = 1000)) %>%
mutate(Area = "Hamert", pointWeight_scaled = 1)
new.yglmm <- data.frame(predict(myglmm, new.xglmm, type = "link", exclude = c("Area","pointWeight_scaled"), se.fit = TRUE)) %>%
mutate(ProportionBirdsScavenging = plogis(fit)) %>%
rename(SE.untransformed = se.fit, untransformed.predictions = fit)
addTheseglmm2 <- mutate(data.frame(new.xglmm, new.yglmm),
lwr = plogis(untransformed.predictions - SE.untransformed),
upr = plogis(untransformed.predictions + SE.untransformed))
addTheseglmm2[c(1,45,501,653,707,1000),c(1,6,5)]
# calculated by ggpredict > manually calculated
# x | Predicted | SE | > SE.untransformed
# ------------------------- > -------------------
# 0.00 | 0.96 | 1.01 | > 1.21
# 0.04 | 0.95 | 0.94 | > 1.10
# 0.46 | 0.16 | 0.86 | > 0.96
# 0.60 | 0.04 | 1.11 | > 1.32
# 0.65 | 0.02 | 1.22 | > 1.46
# 0.92 | 0.00 | 1.83 | > 2.31
# ( CarcassNumber = 2 )
CarcassNumber 的差异最为明显 2. 我的计算不正确吗?我怀疑这种差异可能与 pointWeight_scaled 在 predict 函数中的包含和排除有关。如果我不这样做,它会返回一个错误消息Error in eval(extras, data, env) : object 'pointWeight_scaled' not found。这是个常见的问题吗?我在这里读到包含然后排除它可以解决问题mgcv: How to use 'exclude' argument in predict.gam?。这不是正确的方法吗?
我希望有人能对这个问题有所了解。
【问题讨论】:
-
乍一看,
ggeffects::ggpredict似乎在使用您的完整模型,但您是在简化模型上计算 SE。 -
详细信息在
ggeffects:::get_predictions_glmmTMB中,但可能需要一些努力才能解析出正在发生的事情。 -
@ Axeman 是的,我正在简化模型上计算 SE。我不知道如何为
CarcassNumber的每个值单独生成这些模型和 SE。基本上,我试图重现sjPlot::plot_model(glmm_interaction, type = "int", ci.lvl = 0.682),但随后使用ggplot。我去看看ggeffects:::get_predictions_glmmTMB
标签: r prediction confidence-interval