【Question title】: Why doesn't ggplot show the error bar of a boxplot?
【Posted】: 2020-12-30 17:56:50
【Question】:

I am making some boxplots with ggplot in R, and I would like to know why the error bar is shown for only one of the boxplots. This is the code:

library(ggplot2)

# Treat Time as a categorical variable so that each time point gets its own box
ID1.4.5.6.7[, "Time"] <- as.factor(ID1.4.5.6.7[, "Time"])

ggplot(data = ID1.4.5.6.7, aes(x = Time, y = mRNA, fill = Time)) +
  geom_boxplot(notch = TRUE) +
  stat_boxplot(geom = "errorbar") +
  labs(title = "mRNA vs Time", subtitle = "Irradiated", x = "Time [min]", y = "mRNA") +
  theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))

I don't know whether the problem is in the code or in the data.

structure(list(Gene = c("ID-1", "ID-1", "ID-1", "ID-1", "ID-1", 
"ID-1", "ID-1", "ID-1", "ID-1", "ID-1", "ID-1", "ID-1", "ID-1", 
"ID-1", "ID-1", "ID-4", "ID-4", "ID-4", "ID-4", "ID-4", "ID-4", 
"ID-4", "ID-4", "ID-4", "ID-4", "ID-4", "ID-4", "ID-4", "ID-4", 
"ID-4", "ID-4", "ID-5", "ID-5", "ID-5", "ID-5", "ID-5", "ID-5", 
"ID-5", "ID-5", "ID-5", "ID-5", "ID-5", "ID-5", "ID-5", "ID-5", 
"ID-5", "ID-5", "ID-5", "ID-5", "ID-6", "ID-6", "ID-6", "ID-6", 
"ID-6", "ID-6", "ID-6", "ID-6", "ID-6", "ID-6", "ID-6", "ID-6", 
"ID-6", "ID-6", "ID-6", "ID-6", "ID-6", "ID-6", "ID-7", "ID-7", 
"ID-7", "ID-7", "ID-7", "ID-7", "ID-7", "ID-7", "ID-7", "ID-7", 
"ID-7", "ID-7", "ID-7", "ID-7", "ID-7", "ID-7", "ID-7", "ID-7"
), mRNA = c(-0.181385669, -0.059647494, 0.104476117, -0.052190978, 
-0.040484945, 0.194226742, -0.501601326, 0.102342605, -0.127143845, 
-0.008523742, -0.102946211, -0.042894028, 0.002922923, -0.134394347, 
-0.214204393, -0.138122686, 0.203242361, 0.097935502, 0.147068146, 
-0.089430917, 0.331565412, -0.034572422, -0.129896329, 0.324191, 
0.470108479, -0.027268223, 0.232304713, 0.090348708, 0.070848402, 
0.181540708, -0.502255367, -0.267631441, -0.368647839, -0.040910404, 
-0.003983171, -0.003983171, -0.003983171, -0.14980589, -0.119449612, 
-0.309154214, -0.487589361, 0.272803506, -0.421733575, -0.467108567, 
0.024868338, -0.156025729, -0.044680175, -0.206716896, -0.272014193, 
-0.230499883, -0.238597397, -0.118130949, 0.349957464, 0.349957464, 
0.349957464, 0.172048587, -0.186226994, 0.16113822, -0.293029136, 
-0.111636253, -0.044189887, 0.081555274, -0.048106079, -0.05853566, 
0.010407814, -0.066981809, -0.09828484, -0.315190986, -0.005102456, 
0.221556197, 0.206584568, 0.206584568, 0.206584568, 0.102649006, 
-0.011777384, -0.36963487, -0.054853074, -0.230240699, -0.210508323, 
-0.208889919, -0.050763372, 0.023073782, -0.095118984, -0.091076071, 
-0.330257395), Time = structure(c(2L, 2L, 2L, 3L, 3L, 2L, 3L, 
3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 2L, 2L, 2L, 3L, 3L, 2L, 3L, 3L, 
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 
2L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 2L, 2L, 2L, 1L, 1L, 1L, 
3L, 3L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 2L, 2L, 2L, 1L, 
1L, 1L, 3L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L), .Label = c("0", 
"20", "40", "60", "120"), class = "factor"), predicted_mRNA = c(-0.00551000342030954, 
-0.00551000342030954, -0.00551000342030954, -0.0302695238715682, 
-0.0302695238715682, -0.00551000342030954, -0.0302695238715682, 
-0.0302695238715682, -0.0550290443228268, -0.0550290443228268, 
-0.0550290443228268, -0.0550290443228268, -0.129307605676603, 
-0.129307605676603, -0.129307605676603, -0.00551000342030954, 
-0.00551000342030954, -0.00551000342030954, -0.0302695238715682, 
-0.0302695238715682, -0.00551000342030954, -0.0302695238715682, 
-0.0302695238715682, -0.0550290443228268, -0.0550290443228268, 
-0.0550290443228268, -0.0550290443228268, -0.129307605676603, 
-0.129307605676603, -0.129307605676603, -0.129307605676603, -0.00551000342030954, 
-0.00551000342030954, -0.00551000342030954, 0.0192495170309491, 
0.0192495170309491, 0.0192495170309491, -0.0302695238715682, 
-0.0302695238715682, -0.00551000342030954, -0.0302695238715682, 
-0.0302695238715682, -0.0550290443228268, -0.0550290443228268, 
-0.0550290443228268, -0.129307605676603, -0.129307605676603, 
-0.129307605676603, -0.129307605676603, -0.00551000342030954, 
-0.00551000342030954, -0.00551000342030954, 0.0192495170309491, 
0.0192495170309491, 0.0192495170309491, -0.0302695238715682, 
-0.0302695238715682, -0.00551000342030954, -0.0302695238715682, 
-0.0302695238715682, -0.0550290443228268, -0.0550290443228268, 
-0.0550290443228268, -0.0550290443228268, -0.129307605676603, 
-0.129307605676603, -0.129307605676603, -0.00551000342030954, 
-0.00551000342030954, -0.00551000342030954, 0.0192495170309491, 
0.0192495170309491, 0.0192495170309491, -0.0302695238715682, 
-0.00551000342030954, -0.0302695238715682, -0.0302695238715682, 
-0.0550290443228268, -0.0550290443228268, -0.0550290443228268, 
-0.0550290443228268, -0.129307605676603, -0.129307605676603, 
-0.129307605676603, -0.129307605676603)), row.names = c(NA, -85L
), class = "data.frame")

This is the output of dput(ID1.4.5.6.7), i.e. the data frame.

【Comments】:

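  • Could you please run dput(ID1.4.5.6.7) and add the output to the question so that we can help you?
  • The error bars are there, as you can see from the vertical lines running through the boxes. So my guess is that the horizontal lines at the ends of the error bars overlap with the outline of the box.
  • @stefan The horizontal lines are at Q1 +- 1.5 x IQR? So this value should be equal to the outliers? What do you mean by "outline"?
  • ... the upper and lower borders of the box. Besides, I checked your code with the data from your earlier post and everything works fine.
  • @Duck I added what you asked for.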
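The guess in the comments (end caps sitting on the box outline) can be checked numerically instead of by eye. The following is a minimal sketch, not the commenters' own code: it uses `layer_data()` from ggplot2 to read back the statistics computed for the boxplot layer, and it assumes the error-bar layer gets the same values because `stat_boxplot()` is also the default stat of `geom_boxplot()`. The first group holds the nine Time 0 values copied from the `dput()` output; the second group, `hypothetical`, is made up purely for contrast.

```r
library(ggplot2)

demo <- data.frame(
  Time = factor(rep(c("0", "hypothetical"), times = c(9, 7)),
                levels = c("0", "hypothetical")),
  mRNA = c(
    # Time 0 values as they appear in the dput() output of the question
    rep(c(-0.003983171, 0.349957464, 0.206584568), each = 3),
    # made-up second group, for contrast only
    -0.30, -0.10, 0.00, 0.10, 0.20, 0.25, 0.40
  )
)

p <- ggplot(demo, aes(x = Time, y = mRNA)) +
  geom_boxplot() +
  stat_boxplot(geom = "errorbar")

# Statistics computed for the boxplot layer (layer 1), one row per group
box_stats <- layer_data(p, 1)[, c("ymin", "lower", "middle", "upper", "ymax")]

# TRUE where a whisker has zero length, i.e. its end cap lies on the box edge
tol <- 1e-12
box_stats$lower_cap_on_box <- abs(box_stats$ymin - box_stats$lower) < tol
box_stats$upper_cap_on_box <- abs(box_stats$ymax - box_stats$upper) < tol

print(box_stats)
```

If both flags are TRUE for a group, its error bar is still drawn, but the caps coincide with the top and bottom of the box and cannot be told apart from it.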

Tags: r ggplot2 statistics boxplot errorbar


【Answer 1】:

I would suggest this approach, in which you enable varwidth so that the error bars become visible. Here is the code:

#Plot
ggplot(data=ID1.4.5.6.7,aes(x=Time, y=mRNA, fill=Time)) +
  geom_boxplot(varwidth = TRUE,notch=TRUE) +
  stat_boxplot(geom="errorbar")+
  labs(title="mRNA vs Time", subtitle="Irradiated",x = "Time [min]",y = "mRNA")+
  theme(plot.title = element_text(hjust = 0.5),plot.subtitle = element_text(hjust = 0.5))

Output: (plot image not included)

【Comments】:

【Answer 2】:

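Because boxplots don't have error bars. A boxplot is just a graphical representation of five numbers: the minimum, Q1 (the first quartile), the median, Q3 (the third quartile), and the maximum. The whiskers (the "bars" extending up and down) are simply lines that end at the minimum (lower whisker) and the maximum (upper whisker) of the data. The bottom edge of the "box" is Q1 and the top edge is Q3.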
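As a rough illustration of those five numbers, here is a minimal sketch in base R. The data frame `toy` and the helper `five_numbers()` are hypothetical names with made-up values, not the OP's measurements, and the quartiles use the default interpolation of `quantile()`, which is assumed here rather than taken from the question. Note that this is the plain min/max form of the whiskers; ggplot2's default whiskers can stop short of the extremes when outliers are present.

```r
# Hypothetical toy data for illustration only (not the OP's measurements)
toy <- data.frame(
  Time = factor(rep(c("0", "20"), each = 7)),
  mRNA = c(-0.50, -0.20, -0.10, 0.00, 0.05, 0.10, 0.30,
           -0.30, -0.10, 0.00, 0.10, 0.20, 0.25, 0.40)
)

# Min, Q1, median, Q3, max, using quantile()'s default (type 7) interpolation
five_numbers <- function(y) {
  out <- stats::quantile(y, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  names(out) <- c("min", "Q1", "median", "Q3", "max")
  out
}

# One row per Time group, one column per summary number
summary_by_time <- t(sapply(split(toy$mRNA, toy$Time), five_numbers))
print(summary_by_time)
```

Each row of `summary_by_time` corresponds to one box: the `Q1` and `Q3` columns give the box edges, and `min` and `max` give the whisker ends in the simple form described above.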

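It is possible to arrange a set of data so that the minimum equals Q1 and the maximum equals Q3. More or less, that seems to be what is happening in the boxplots that have no whiskers. ggplot adds some extra detail to the boxplot (the pinched-in "waist", and an algorithmic adjustment that can produce the inversion you see at the top of the Time 0 group), but more or less that seems to be what is going on.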
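The Time 0 group in the question appears to be exactly such a case: in the `dput()` output it consists of three values, each repeated three times. The sketch below works through the arithmetic under two assumptions that are not stated in the question: quartiles follow the default interpolation of `quantile()`, and the notch spans the median plus or minus 1.58 * IQR / sqrt(n), as described in the `geom_boxplot()` documentation.

```r
# Time 0 values as they appear in the dput() output of the question
time0 <- rep(c(-0.003983171, 0.349957464, 0.206584568), each = 3)

# Quartiles with quantile()'s default interpolation (assumed to match the plot)
q <- stats::quantile(time0, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
names(q) <- c("min", "Q1", "median", "Q3", "max")

# Whisker lengths: distance from each box edge to the corresponding extreme
lower_whisker <- q[["Q1"]] - q[["min"]]
upper_whisker <- q[["max"]] - q[["Q3"]]

# Notch limits, assuming the 1.58 * IQR / sqrt(n) rule
iqr <- q[["Q3"]] - q[["Q1"]]
notch <- q[["median"]] + c(-1, 1) * 1.58 * iqr / sqrt(length(time0))

# Does the upper notch limit lie above the top edge of the box?
notch_upper_outside <- notch[2] > q[["Q3"]]

print(q)
print(c(lower_whisker = lower_whisker, upper_whisker = upper_whisker))
print(notch_upper_outside)
```

Both whisker lengths come out as zero, so there is nothing to draw beyond the box, and the upper notch limit lands above Q3, which is consistent with the inverted shape at the top of the Time 0 box.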

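Edit: This looks like a question about code, but it is really about statistics. It might be a better fit for Cross Validated (though I think it has probably been answered sufficiently by now).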

【Comments】:

  • I think the implementation of boxplots in ggplot2 is not exactly what you describe: "The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge"
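  • There are two common ways of drawing the whiskers in a box-and-whisker plot (support.sas.com/documentation/onlinedoc/stat/131/boxplot.pdf). SKELETAL whiskers extend to the maximum/minimum values. SCHEMATIC whiskers typically extend at most 1.5 times the IQR beyond the box (but you can adjust this with the coef argument of geom_boxplot()). Tukey chose 1.5 because, if the data are roughly normally distributed, the whiskers should then cover about 99% of the data.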
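  • @starja and itsMeinMiami: Yes, I alluded to the 1.5 IQR rule in my answer, but the OP believes that boxplots have error bars, so I did not want to add too much detail, only to emphasize that their boxplot is not broken; it is probably displaying the data accurately. The expectation of error bars is the problem. Once the OP understands that whiskers are not error bars, I expect most of the question to be resolved. The question stems from a misunderstanding of a statistical/graphical concept rather than from a problem with the code.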
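To make the `coef` discussion concrete, here is a minimal sketch with made-up numbers. The data frame `toy` and the helpers `make_plot()` and `lower_whisker_end()` are hypothetical names introduced for this example. It passes `coef` to both `geom_boxplot()` and `stat_boxplot()` and reads the lower whisker end back with `layer_data()`; the error-bar layer is added first only so that it is drawn behind the box.

```r
library(ggplot2)

# Hypothetical toy data: -0.50 lies more than 1.5 * IQR below Q1
toy <- data.frame(
  Time = factor("0"),
  mRNA = c(-0.50, -0.20, -0.10, 0.00, 0.05, 0.10, 0.30)
)

# Build the plot with the same whisker multiplier in both layers
make_plot <- function(coef) {
  ggplot(toy, aes(x = Time, y = mRNA)) +
    stat_boxplot(geom = "errorbar", coef = coef) +  # layer 1: whisker caps
    geom_boxplot(coef = coef)                       # layer 2: the box itself
}

# Lower whisker end as computed for the boxplot layer
lower_whisker_end <- function(coef) layer_data(make_plot(coef), 2)$ymin

default_end <- lower_whisker_end(1.5)  # whisker stops at the last point inside the fence
wide_end <- lower_whisker_end(3)       # wider fence, so the whisker reaches the minimum

print(c(default_end = default_end, wide_end = wide_end))
```

With the default multiplier the lowest value is treated as an outlier and the whisker ends at the next value up; with `coef = 3` the same point falls inside the fence and the whisker extends to it. Neither setting turns the whiskers into error bars in the sense of a standard error or confidence interval.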