[Question title]: Plotting mean monthly temperature in ggplot with confidence intervals
[Posted]: 2023-07-20 08:01:02
[Question description]:

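I need to plot mean monthly temperature with abbreviated months on the x-axis, and I need to add 95% confidence intervals, but I am not sure how. Any visual representation of the CI is fine.

Then I need to draw the plot.

I split Date...Time into separate columns, but I can't get the x-axis in ggplot to show abbreviated months using month.abb.

I have the following dataset (shortened for Stack Overflow):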

# Data
CleanTempSal = data.frame(
    stringsAsFactors = FALSE,
    Date...Time = c(
        "1/31/2017 20:00",
        "1/31/2017 21:00",
        "1/31/2017 22:00",
        "1/31/2017 23:00",
        "2/1/2017 0:00",
        "2/1/2017 1:00",
        "2/1/2017 2:00",
        "2/1/2017 3:00",
        "3/21/2017 10:00",
        "3/21/2017 11:00",
        "3/21/2017 12:00",
        "3/21/2017 13:00"),

    Temp..C. = c(14.87, 14.77, 15.08, 15.08, 
                  14.96, 14.87, 15.05, 15.05, 
                  18.87, 19.32, 19.97, 20.44),

    Salinity.psu. = c(14.58, 14.52, 14.44, 14.46, 
                      14.56, 14.67, 14.78, 14.88, 
                      18.78, 18.81, 19.41, 19.16),

    Conduc.mS.cm. = c(19.33, 19.21, 19.26, 19.28,
                      19.34, 19.44, 19.66, 19.78, 
                      26.67, 26.96, 28.14, 28.09)
    )
Date...Time      Temp..C.  Salinity.psu.  Conduc.mS.cm.
1/31/2017 20:00     14.87          14.58          19.33
1/31/2017 21:00     14.77          14.52          19.21
1/31/2017 22:00     15.08          14.44          19.26
1/31/2017 23:00     15.08          14.46          19.28
2/1/2017 0:00       14.96          14.56          19.34
2/1/2017 1:00       14.87          14.67          19.44
2/1/2017 2:00       15.05          14.78          19.66
2/1/2017 3:00       15.05          14.88          19.78
3/21/2017 10:00     18.87          18.78          26.67
3/21/2017 11:00     19.32          18.81          26.96
3/21/2017 12:00     19.97          19.41          28.14
3/21/2017 13:00     20.44          19.16          28.09

And the code:

library(tidyverse)
library(ggplot2)
library(lubridate)

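# convert date column to date-time class
# (%y matches the 2-digit years of the original import, e.g. "1/13/17";
#  the shortened sample above has 4-digit years and needs %Y instead)
CleanTempSal$Date...Time <- as.POSIXct(CleanTempSal$Date...Time, format = "%m/%d/%y %H:%M")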

#Add Month Column to data set
CleanTempSal <- CleanTempSal %>% mutate(month = month(Date...Time))
CleanTempSal <- CleanTempSal %>% mutate(month2 = month.abb[month])
CleanTempSal <- CleanTempSal %>% mutate(year = year(Date...Time))
CleanTempSal <- CleanTempSal %>% mutate(hour = hour(Date...Time))


#group by month and take the mean of that month
a <- CleanTempSal %>%
  group_by(month) %>%
  summarise(month_mean = mean(Temp..C.))

#plot mean monthly temp
ggplot(a, aes(month, month_mean)) +
  geom_point(aes(color = month_mean)) + 
  geom_line(aes(color = month_mean)) +
  scale_color_gradient("Temp", low = "blue", high = "red4") +
  labs(x = "Month of 2017",
       y = "Water Temperature (C)",
       title = "Monthly Mean Water Temperature",
       subtitle = "NCBS Dock - Cedar Key, FL")

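This gives me this plot.

The data provided will not produce the same plot as mine, because I shortened it for simplicity. It only covers the first 3 months and the means will differ, but the goal is the same.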

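[Question comments]:

  • Thanks @Rui Barradas
  • I noticed a small problem: you need %m/%d/%Y %H:%M to convert the date/time, with a capital "Y", because the year has 4 digits rather than 2.
  • I apologize, as I said I am very new to this. It imported into R like this... so the lowercase y is correct; the other data came from Excel. I should have been clearer. Date...Time Temp..C. Salinity.psu. Conduc.mS.cm. 1 1/13/17 0:00 14.65 24.19 30.52 2 1/13/17 1:00 14.93 24.23 30.76 3 1/13/17 2:00 14.99 24.28 30.86 4 1/13/17 3:00 14.65 24.35 14.35 /13/17 4:00 14.68 24.35 30.72 6 1/13/17 5:00 14.65 24.35 30.70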

Tags: r date ggplot2 confidence-interval


[Solution 1]:

Here is one way to approach this:

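To get abbreviated month names, I would consider keeping the month as POSIXct. With floor_date you can get the month of each time point and store it in the desired format. When plotting, you can use scale_x_datetime and specify the labels to use on the x-axis. In this case, %b gives the abbreviated month name.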
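As a small illustration of that step, the sketch below floors a single timestamp to the start of its month and formats it with %b. The variable names are made up for the example, the 4-digit-year format matches the shortened sample data, and tz = "UTC" is an assumption made only so the result does not depend on the local time zone. The text returned by %b depends on the session locale.

```r
library(lubridate)

# One timestamp from the sample data, parsed with a 4-digit year (%Y)
x <- as.POSIXct("3/21/2017 13:00", format = "%m/%d/%Y %H:%M", tz = "UTC")

# floor_date() keeps the POSIXct class but snaps to the first instant of the month
m <- floor_date(x, unit = "month")
m

# %b is the abbreviated month name (locale-dependent)
format(m, "%b")
```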

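There are different ways to compute a 95% confidence interval. One is to calculate the 95% CI manually. Note that this makes an assumption (the interval is based on Student's t distribution). Here I used geom_ribbon with some transparency (alpha = 0.2) to show the interval between the points. As an alternative, you could use stat_summary, which computes the mean and 95% CI and displays them in the ggplot.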
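The stat_summary alternative is only mentioned in passing, so here is a minimal sketch of what it could look like. It is not the answerer's code: mean_ci_t is a hypothetical helper written for this example (it is not part of ggplot2), the data frame temps with columns time and temp is a renamed copy of the shortened sample, and parsing in UTC is an assumption made for reproducibility. The helper applies the same t-based formula as the manual calculation, and stat_summary calls it once per x value.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Shortened sample from the question (hypothetical names: temps, time, temp)
temps <- data.frame(
  time = as.POSIXct(
    c("1/31/2017 20:00", "1/31/2017 21:00", "1/31/2017 22:00", "1/31/2017 23:00",
      "2/1/2017 0:00", "2/1/2017 1:00", "2/1/2017 2:00", "2/1/2017 3:00",
      "3/21/2017 10:00", "3/21/2017 11:00", "3/21/2017 12:00", "3/21/2017 13:00"),
    format = "%m/%d/%Y %H:%M", tz = "UTC"),
  temp = c(14.87, 14.77, 15.08, 15.08, 14.96, 14.87,
           15.05, 15.05, 18.87, 19.32, 19.97, 20.44)
)

# Hypothetical helper: mean and t-based confidence limits for one group,
# returned in the y / ymin / ymax layout that stat_summary(fun.data = ) expects
mean_ci_t <- function(x, conf = 0.95) {
  n <- length(x)
  m <- mean(x)
  half <- qt(1 - (1 - conf) / 2, df = n - 1) * sd(x) / sqrt(n)
  data.frame(y = m, ymin = m - half, ymax = m + half)
}

p <- temps %>%
  mutate(month = as_date(floor_date(time, unit = "month"))) %>%
  ggplot(aes(x = month, y = temp)) +
  stat_summary(fun.data = mean_ci_t, geom = "errorbar", width = 5) +  # width in days
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")
p
```

With this approach the raw observations are passed to ggplot and no separate summary table is needed. The trade-off is that the interval values are not available for inspection unless the helper is also called directly.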

#group by month and take the mean of that month
a <- CleanTempSal %>%
  group_by(month = floor_date(Date...Time, unit = "month")) %>%
  summarise(month_mean = mean(Temp..C.),
            sd = sd(Temp..C.),
            n = n()) %>%
  mutate(se = sd / sqrt(n),
         lower.ci = month_mean - qt(1 - (.05/2), n - 1) * se,
         upper.ci = month_mean + qt(1 - (.05/2), n - 1) * se)

#plot mean monthly temp
ggplot(a, aes(x = month, y = month_mean)) +
  geom_point(aes(color = month_mean)) + 
  geom_line(aes(color = month_mean)) +
  geom_ribbon(aes(ymin = lower.ci, ymax = upper.ci), alpha = 0.2) +
  scale_color_gradient("Temp", low = "blue", high = "red4") +
  scale_x_datetime(date_breaks = "1 month", date_labels = "%b") +
  labs(x = "Month of 2017",
       y = "Water Temperature (C)",
       title = "Monthly Mean Water Temperature",
       subtitle = "NCBS Dock - Cedar Key, FL")

Plot

Edit (4/16/20):

If you have multiple years of data, you should group by both month and year when computing the SD and SE:

group_by(month = floor_date(Date...Time, unit = "month"), year)

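In addition, I modified the ggplot to show error bars instead of a ribbon. To get a usable width for the error bars, a few small changes were made, including using as.Date(month) and scale_x_date. The width of an error bar is measured in x-axis units, which are days on a date scale and seconds on a datetime scale.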

#group by month and take the mean of that month
a <- CleanTempSal %>%
  group_by(month = floor_date(Date...Time, unit = "month"), year) %>%
  summarise(month_mean = mean(Temp..C.),
            sd = sd(Temp..C.),
            n = n()) %>%
  mutate(se = sd / sqrt(n),
         lower.ci = month_mean - qt(1 - (.05/2), n - 1) * se,
         upper.ci = month_mean + qt(1 - (.05/2), n - 1) * se)

#plot mean monthly temp
ggplot(a, aes(x = as.Date(month), y = month_mean)) +
  geom_point(aes(color = month_mean)) + 
  geom_line(aes(color = month_mean)) +
  #geom_ribbon(aes(ymin = lower.ci, ymax = upper.ci), alpha = 0.2) +
  geom_errorbar(aes(ymin = month_mean - se, ymax = month_mean + se), width = 1) +
  scale_color_gradient("Temp", low = "blue", high = "red4") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %y") +
  labs(x = "Month",
       y = "Water Temperature (C)",
       title = "Monthly Mean Water Temperature",
       subtitle = "NCBS Dock - Cedar Key, FL")

Plot
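Note that the error bars in the code above span plus or minus one standard error, not the 95% CI asked for in the question. If CI bars are wanted, the lower.ci and upper.ci columns can be mapped instead. The following is a minimal sketch under a few assumptions: the data frame temps with columns time and temp is a renamed copy of the shortened sample, timestamps are parsed in UTC so that the conversion to Date cannot shift a month boundary, and width = 5 (five days on a date scale) is an arbitrary choice to make the caps visible.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Shortened sample from the question (hypothetical names: temps, time, temp)
temps <- data.frame(
  time = as.POSIXct(
    c("1/31/2017 20:00", "1/31/2017 21:00", "1/31/2017 22:00", "1/31/2017 23:00",
      "2/1/2017 0:00", "2/1/2017 1:00", "2/1/2017 2:00", "2/1/2017 3:00",
      "3/21/2017 10:00", "3/21/2017 11:00", "3/21/2017 12:00", "3/21/2017 13:00"),
    format = "%m/%d/%Y %H:%M", tz = "UTC"),
  temp = c(14.87, 14.77, 15.08, 15.08, 14.96, 14.87,
           15.05, 15.05, 18.87, 19.32, 19.97, 20.44)
)

# Per-month mean with a t-based 95% CI; the floored date already encodes the year
monthly <- temps %>%
  group_by(month = as_date(floor_date(time, unit = "month"))) %>%
  summarise(month_mean = mean(temp), sd = sd(temp), n = n(), .groups = "drop") %>%
  mutate(se = sd / sqrt(n),
         lower.ci = month_mean - qt(1 - (.05 / 2), n - 1) * se,
         upper.ci = month_mean + qt(1 - (.05 / 2), n - 1) * se)

p <- ggplot(monthly, aes(x = month, y = month_mean)) +
  geom_line() +
  geom_point() +
  # bars span the 95% CI; width is in days because x is a Date
  geom_errorbar(aes(ymin = lower.ci, ymax = upper.ci), width = 5) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %y")
p
```

With only four observations per month, as in the shortened sample, the t quantile is large and the CI bars are much wider than SE bars would be.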

Data

CleanTempSal <- structure(list(Date...Time = structure(c(1485914400, 1485918000, 
1485921600, 1485925200, 1485928800, 1485932400, 1485936000, 1485939600, 
1490108400, 1490112000, 1490115600, 1490119200), class = c("POSIXct", 
"POSIXt"), tzone = ""), Temp..C. = c(14.87, 14.77, 15.08, 15.08, 
14.96, 14.87, 15.05, 15.05, 18.87, 19.32, 19.97, 20.44), Salinity.psu. = c(14.58, 
14.52, 14.44, 14.46, 14.56, 14.67, 14.78, 14.88, 18.78, 18.81, 
19.41, 19.16), Conduc.mS.cm. = c(19.33, 19.21, 19.26, 19.28, 
19.34, 19.44, 19.66, 19.78, 26.67, 26.96, 28.14, 28.09), month = c(1, 
1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3), month2 = c("Jan", "Jan", "Jan", 
"Jan", "Feb", "Feb", "Feb", "Feb", "Mar", "Mar", "Mar", "Mar"
), year = c(2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 
2017, 2017, 2017), hour = c(20L, 21L, 22L, 23L, 0L, 1L, 2L, 3L, 
10L, 11L, 12L, 13L)), class = "data.frame", row.names = c(NA, 
-12L))

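[Comments]:

  • Thank you very much @Ben! The people in this community are really great. How hard would it be to show them as confidence interval bars at each point instead of a ribbon?
  • I would use the standard error for the bars... you can replace geom_ribbon with: geom_errorbar(aes(ymin = month_mean - se, ymax = month_mean + se))
  • Ben, I used that line and was told that I had computed the confidence interval for the whole dataset rather than by month. Any suggestions? Also, when I try to use bars instead of the ribbon, each one shows up as a plain line and I can't get the top and bottom caps of the error bars to appear.
  • @Johnny5ish The SD and SE are computed by month, but if you have multiple years of data, the years for a given month are combined. Could that be it? If so, you can group_by month and year.
  • @Johnny5ish Please see the edited answer above. It should now group_by month and year. There is also an example with error bars. Hope this helps - let me know if there are still problems.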