geom_smooth, stat_smooth confidence interval not working? Answers

【Question title】: geom_smooth, stat_smooth confidence interval not working?
【Posted】: 2014-08-13 15:47:06
【Question】:

I have this data:

m1 <- structure(list(Run = c("A013", "A015", "A023", "A024", "A031", 
"A032", "A035", "A040", "A045", "A046", "A049", "A013", "A015", 
"A023", "A024", "A031", "A032", "A035", "A040", "A045", "A046", 
"A013", "A015", "A023", "A024", "A031", "A032", "A035", "A040", 
"A013", "A015", "A023", "A024", "A031", "A032", "A035", "A040", 
"A013", "A015", "A023", "A024", "A031", "A032", "A013", "A015", 
"A023", "A024", "A013", "A015", "A023", "A024"), Step = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 
7L, 7L), .Label = c("1", "e", "k", "2", "q", "b", "m"), class = "factor"), 
    Weight = c(87.4064, 79.5822, 117.0674, 102.6384, 134.0752, 
    111.2398, 107.8464, 111.2576, 104.2428, 110.2848, 28.7292, 
    41.65656, 73.9356, 84.18504, 89.4845, 71.55106, 86.04072, 
    76.27296, 92.8749, 85.203, 91.92112, 39.5009258, 58.6035081, 
    75.13589946, 83.43157667, 88.8993795, 68.85183559, 64.77081269, 
    77.56733054, 32.5025, 51.45329, 66.29101, 73.79125, 79.95483, 
    60.9573, 58.34856, 68.83193, 29.65289, 40.74267, 56.97243, 
    61.48708, 70.24226, 54.79253, 22.8231064, 38.9966088, 55.2736576, 
    62.6077916, 20.7458048, 38.306526, 54.7937568, 61.1417148
    )), .Names = c("Run", "Step", "Weight"), row.names = c(NA, 
-51L), class = "data.frame")

I am trying to get a nice geom_smooth() with a 0.99 confidence level:

require(ggplot2)
require(directlabels)
g1 <- ggplot(m1,
             aes(x=Step,y=Weight,label=Run,group=Run,color=Run)) + 
  geom_point() + geom_line()
g2 <- g1 +  geom_dl(method="first.bumpup")
g2 + geom_smooth(aes(group=1),level=0.99)

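Here are my problems:

The error band does not look like 99% confidence; many of the points in the plot fall outside it.
When I expand the data set, the error ribbon shrinks until it is very narrow, and most of the points fall outside it.

Am I doing something wrong here? Thanks.

Edit: this is what I see when I run it. When I look at a larger data set, the ribbon becomes narrower and sits almost on top of the smooth line.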

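【Comments on the question】:

What you are talking about is a confidence interval of the mean. I think what you are after is a prediction interval. Shameless self-promotion: rpubs.com/RomanL/7024
You are right. I was confused about what a confidence interval means in this case. I will explore your options.
It is a pity you did not add the example here; I would mark it as correct. It would also be nice to derive an equation that can be applied to a specific point, e.g. A046 at step e, to predict the weight at the last step, step m.

Tags: r ggplot2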

【Solution 1】:

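Confidence intervals and prediction intervals are two different beasts. The former describes the uncertainty in the mean of your data (the fitted values), while the latter describes where future observations are expected to fall.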
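To make the distinction concrete, here are the textbook formulas for a straight-line least-squares fit, such as the lm(y ~ x) model used in this answer's code. The symbols are introduced here for illustration: x_0 is the point of interest, s is the residual standard error, n is the number of observations, and t is the Student t quantile with n - 2 degrees of freedom. The smoother in the question is presumably a loess fit (ggplot2's default for small data sets), so these formulas apply to it only by analogy.

```latex
% Confidence interval for the mean response at x_0
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\,n-2}\; s\,
  \sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}

% Prediction interval for a single new observation at x_0
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\,n-2}\; s\,
  \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}
```

Both terms under the first square root shrink as n grows, which is why the ribbon collapses onto the line for a larger data set. The extra 1 in the second formula does not shrink, so a prediction band keeps a width of roughly 2ts however much data is added.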

Here is my code from the RPubs repository.

set.seed(357)
library(ggplot2) # for ggplot()
library(gridExtra) 

x <- rnorm(20)
y <- x * rnorm(20, mean = 3, sd = 1)
xy <- data.frame(x, y)

mdl <- lm(y ~ x, data = xy)

# Predict these data for...
predx <- data.frame(x = seq(from = -2, to = 3, by = 0.1))

# ... confidence interval
conf.int <- cbind(predx, predict(mdl, newdata = predx, interval = "confidence", level = 0.95))

# ... prediction interval
pred.int <- cbind(predx, predict(mdl, newdata = predx, interval = "prediction", level = 0.95))
man <- predict(mdl, newdata = predx, se.fit = TRUE)

# Manual calculation of the 95% confidence interval (t quantile instead of the normal 1.96).
lvl <- qt(1-(1 - 0.95)/2, mdl$df.residual) # Thank you, @Roland (http://chat.stackoverflow.com/transcript/message/10581408#10581408)
conf.int.man <- cbind(predx, fit = man$fit, lwr = man$fit - lvl * man$se.fit, upr = man$fit + lvl * man$se.fit)

g.conf <- ggplot(conf.int, aes(x = x, y = fit)) +
  theme_bw() +
  ggtitle("Confidence interval of estimated parameters from predict()") +
  geom_point(data = xy, aes(x = x, y = y)) +
  geom_smooth(data = conf.int, aes(ymin = lwr, ymax = upr), stat = "identity") 

g.pred <- ggplot(pred.int, aes(x = x, y = fit)) +
  theme_bw() +
  ggtitle("Prediction interval for future observations from predict()") +
  geom_point(data = xy, aes(x = x, y = y)) +
  geom_smooth(data = pred.int, aes(ymin = lwr, ymax = upr), stat = "identity")

grid.arrange(g.conf, g.pred, ncol = 2)
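
The shrinking ribbon described in the question can be reproduced with a small simulation. This is only a sketch under stated assumptions: the data are simulated from a made-up straight line with noise, the fit is lm() rather than the loess smoother that geom_smooth() presumably used in the question, and interval_widths is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: average width of the confidence and prediction
# intervals from a straight-line fit to n simulated points.
interval_widths <- function(n, level = 0.99) {
  x <- runif(n, min = 0, max = 10)
  y <- 2 + 3 * x + rnorm(n, sd = 5)      # made-up line, noise sd = 5
  fit <- lm(y ~ x)
  new_x <- data.frame(x = seq(1, 9, by = 1))
  ci <- predict(fit, newdata = new_x, interval = "confidence", level = level)
  pr <- predict(fit, newdata = new_x, interval = "prediction", level = level)
  c(n = n,
    conf_width = mean(ci[, "upr"] - ci[, "lwr"]),
    pred_width = mean(pr[, "upr"] - pr[, "lwr"]))
}

set.seed(1)
# One row per sample size: n, mean CI width, mean PI width
widths <- t(sapply(c(20, 200, 2000), interval_widths))
print(round(widths, 2))
```

The conf_width column should fall steadily as n grows, while pred_width stays at roughly the same size. A band meant to contain about 99% of the points therefore has to be built from interval = "prediction", not from the default ribbon.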

【Comments】: