How is `level` used to generate the confidence interval in geom_smooth? Answers

【Question title】: How is `level` used to generate the confidence interval in geom_smooth?
【Posted】: 2018-01-26 08:00:42
【Question description】:

I can't reproduce how stat_smooth computes its confidence interval.

Let's generate some data and fit a simple model:

library(tidyverse)    
# sample data
df = tibble(
  x = runif(10),
  y = x + rnorm(10)*0.2
)

# simple linear model
model = lm(y ~ x, df)

Now use predict() to generate the fitted values and the confidence interval:

# predict 
df$predicted = predict(
  object = model,
  newdata = df
)

# predict 95% confidence interval
df$CI = predict(
  object = model,
  newdata = df,
  se.fit = TRUE
)$se.fit * qnorm(1 - (1-0.95)/2)

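Note that qnorm is used to scale the standard error up to a 95% CI.

Plot the data (black points), geom_smooth (black line + gray ribbon), and the predicted band (red and blue lines).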

ggplot(df) +
  aes(x = x, y = y) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", level = 0.95, fullrange = TRUE, color = "black") +
  geom_line(aes(y = predicted + CI), color = "blue") + # upper
  geom_line(aes(y = predicted - CI), color = "red") + # lower
  theme_classic()

The red and blue lines should coincide with the edges of the ribbon. What am I doing wrong?

【Question discussion】:

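It probably uses a confidence interval based on the t distribution rather than the normal distribution...
Thanks! I replaced qnorm(...) with qt(1 - (1-0.95)/2, nrow(df)) and it lined up perfectly. If you post this as an answer, I'll mark it as correct.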

Tags: r ggplot2

【Solution 1】:

As @Dason posted in the comments, the answer is that geom_smooth uses the t distribution, not the normal distribution.

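In my original question, replacing qnorm(1 - (1-0.95)/2) with qt(1 - (1-0.95)/2, nrow(df)) makes the lines match the ribbon. Strictly speaking, the degrees of freedom should be the model's residual degrees of freedom, df.residual(model) (nrow(df) - 2 for this model), because that is what predict() uses when it builds a confidence interval for an lm. With nrow(df) the lines come close to the ribbon but are slightly too narrow.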
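A minimal sketch of how to check this without plotting, using base R only. It assumes that geom_smooth(method = "lm") takes its band from predict() with interval = "confidence", which is my reading of the ggplot2 source rather than anything stated in this thread. The seed, the data frame, and the names `t_crit`, `manual`, and `reference` are only for illustration.

```r
# Sketch: rebuild the lm confidence band by hand and compare it with predict().
set.seed(42)  # arbitrary seed, only so the example is reproducible

# Same kind of sample data as in the question
df <- data.frame(x = runif(10))
df$y <- df$x + rnorm(10) * 0.2

model <- lm(y ~ x, data = df)
level <- 0.95

# Fitted values, their standard errors, and the residual degrees of freedom
pred <- predict(model, newdata = df, se.fit = TRUE)

# t quantile with the residual degrees of freedom (n - 2 for y ~ x)
t_crit <- qt(1 - (1 - level) / 2, df = pred$df)

manual <- cbind(
  lwr = pred$fit - t_crit * pred$se.fit,
  upr = pred$fit + t_crit * pred$se.fit
)

# Reference band: let predict() compute the interval itself
reference <- predict(model, newdata = df,
                     interval = "confidence", level = level)

# Both comparisons should hold if the band is t-based with df.residual(model)
stopifnot(
  isTRUE(all.equal(unname(manual[, "lwr"]), unname(reference[, "lwr"]))),
  isTRUE(all.equal(unname(manual[, "upr"]), unname(reference[, "upr"])))
)

# The three candidate multipliers, from narrowest to widest band
c(normal = qnorm(1 - (1 - level) / 2),
  t_n    = qt(1 - (1 - level) / 2, df = nrow(df)),
  t_res  = t_crit)
```

With only 10 observations the multipliers differ noticeably, which is why the qnorm-based lines in the question sit inside the ribbon. As the sample grows, the t quantile approaches the normal one and the difference becomes hard to see.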

【Discussion】: