【Question title】: Pulling lagged data from a particular season but only for specific data sets as indicated by variable in R
【Posted】: 2022-01-17 07:20:54
【Question】:

My original inquiry came from this question: Pulling lagged data but only for a particular season in R

That answered my question for a single data frame. Now, however, I have a large aggregated data frame and need to add a line of code that accounts for each individual data set (Lake_name).

Here is my data:

   SeasonYear       change   Lake_name
1  winter2020  0.007877245   AlanHenry
2  spring2020  0.058515310   AlanHenry
3  summer2020  0.013850687   AlanHenry
4    fall2020 -0.071774781   AlanHenry
5  winter2021 -0.040268206   AlanHenry
6  spring2021 -0.020803715   AlanHenry
7  summer2021  0.181610974   AlanHenry
8  winter2020 -0.029708916     Amistad
9  spring2020 -0.063310371     Amistad
10 summer2020 -0.054231575     Amistad
11   fall2020  0.016057252     Amistad
12 winter2021  0.011785717     Amistad
13 spring2021 -0.030677687     Amistad
14 summer2021 -0.015691720     Amistad
15 winter2020 -0.011974634 AmonGCarter
16 spring2020  0.168774234 AmonGCarter
17 summer2020 -0.041486735 AmonGCarter
18   fall2020 -0.095134974 AmonGCarter
19 winter2021 -0.030310177 AmonGCarter
20 spring2021  0.033528325 AmonGCarter

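I'm trying to build a function that pulls the lagged value from the previous spring (see the previous post) but also accounts for each lake. I can do this if I split the data up lake by lake, but I have a fairly large data set and that would take a long time. Here is the code I'm trying to use (modified from the post I referenced):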

library(dplyr)
lag_spring <- function(x, y, n = 1) {
  data.frame(x = x, season_year = y) %>% 
    group_by(Lake_name) %>%
    tidyr::extract(season_year, into = c("season", "year"), regex = "^(.+?)(\\d{4})$") %>%
    group_by(year) %>%
    mutate(springmean = x[season == "spring"]) %>%
    ungroup() %>%
    group_by(season) %>%
    mutate(lag = ifelse(!season %in% c("summer", "fall"), lag(springmean, n = n), lag(springmean, n = n - 1))) %>%
    ungroup() %>%
    pull(lag)
}

I tried adding group_by(Lake_name) so that this is done within each lake, but when I run the code:

data %>%  mutate(springlag = lag_spring(change, SeasonYear,n=1),
         springlag2 = lag_spring(change, SeasonYear,n=2),
         springlag3 = lag_spring(change, SeasonYear,n=3))

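I get this error:

Error: Problem with mutate() input springlag. x Must group by variables found in .data. Column Lake_name is not found. i Input springlag is lag_spring(change, SeasonYear, n = 1).

Can someone help me modify the code I was given earlier so that it still produces "springlag", but with a line in dplyr that makes it do so only within each individual lake?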

【Question comments】:

    Tags: r dplyr lag


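    【Solution 1】:

    You don't need to add any grouping inside the function. Instead, call group_by before the mutate that computes the lag to get the result you want: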

    library(tidyr)
    library(dplyr)
    
    lag_spring <- function(x, y, n = 1) {
      data.frame(x = x, season_year = y) %>%
        tidyr::extract(season_year, into = c("season", "year"), regex = "^(.+?)(\\d{4})$") %>%
        group_by(year) %>%
        mutate(springmean = if (any(season == "spring")) x[season == "spring"] else NA) %>%
        ungroup() %>%
        group_by(season) %>%
        mutate(lag = ifelse(!season %in% c("summer", "fall"), lag(springmean, n = n), lag(springmean, n = n - 1))) %>%
        ungroup() %>%
        pull(lag)
    }
    
    dd %>%
      group_by(Lake_name) %>%
      mutate(lag = lag_spring(change, SeasonYear))
    #> # A tibble: 20 × 4
    #> # Groups:   Lake_name [3]
    #>    SeasonYear   change Lake_name       lag
    #>    <chr>         <dbl> <chr>         <dbl>
    #>  1 winter2020  0.00788 AlanHenry   NA     
    #>  2 spring2020  0.0585  AlanHenry   NA     
    #>  3 summer2020  0.0139  AlanHenry    0.0585
    #>  4 fall2020   -0.0718  AlanHenry    0.0585
    #>  5 winter2021 -0.0403  AlanHenry    0.0585
    #>  6 spring2021 -0.0208  AlanHenry    0.0585
    #>  7 summer2021  0.182   AlanHenry   -0.0208
    #>  8 winter2020 -0.0297  Amistad     NA     
    #>  9 spring2020 -0.0633  Amistad     NA     
    #> 10 summer2020 -0.0542  Amistad     -0.0633
    #> 11 fall2020    0.0161  Amistad     -0.0633
    #> 12 winter2021  0.0118  Amistad     -0.0633
    #> 13 spring2021 -0.0307  Amistad     -0.0633
    #> 14 summer2021 -0.0157  Amistad     -0.0307
    #> 15 winter2020 -0.0120  AmonGCarter NA     
    #> 16 spring2020  0.169   AmonGCarter NA     
    #> 17 summer2020 -0.0415  AmonGCarter  0.169 
    #> 18 fall2020   -0.0951  AmonGCarter  0.169 
    #> 19 winter2021 -0.0303  AmonGCarter  0.169 
    #> 20 spring2021  0.0335  AmonGCarter  0.169
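
The question asked for three lag columns (springlag, springlag2, springlag3), and the same outer group_by appears to cover that case too. Grouping inside the helper could not work because data.frame(x = x, season_year = y) only contains the two vectors passed in, so Lake_name is not available there; grouping outside means the helper only ever sees one lake's rows at a time. The following is a minimal sketch under those assumptions, not a tested solution for the full data set. The names dd_small and res are made up for illustration, and only two lakes from the question's data are used to keep it short.

```r
library(dplyr)
library(tidyr)

# Same helper as in the answer, repeated so this block runs on its own
lag_spring <- function(x, y, n = 1) {
  data.frame(x = x, season_year = y) %>%
    tidyr::extract(season_year, into = c("season", "year"), regex = "^(.+?)(\\d{4})$") %>%
    group_by(year) %>%
    mutate(springmean = if (any(season == "spring")) x[season == "spring"] else NA) %>%
    ungroup() %>%
    group_by(season) %>%
    mutate(lag = ifelse(!season %in% c("summer", "fall"), lag(springmean, n = n), lag(springmean, n = n - 1))) %>%
    ungroup() %>%
    pull(lag)
}

# Two lakes copied from the question's data (dd_small is a hypothetical name)
seasons <- c("winter2020", "spring2020", "summer2020", "fall2020",
             "winter2021", "spring2021", "summer2021")
dd_small <- data.frame(
  SeasonYear = rep(seasons, times = 2),
  change = c(0.007877245, 0.058515310, 0.013850687, -0.071774781,
             -0.040268206, -0.020803715, 0.181610974,
             -0.029708916, -0.063310371, -0.054231575, 0.016057252,
             0.011785717, -0.030677687, -0.015691720),
  Lake_name = rep(c("AlanHenry", "Amistad"), each = 7)
)

# Group once by lake, then compute each lag column within that lake
res <- dd_small %>%
  group_by(Lake_name) %>%
  mutate(springlag  = lag_spring(change, SeasonYear, n = 1),
         springlag2 = lag_spring(change, SeasonYear, n = 2),
         springlag3 = lag_spring(change, SeasonYear, n = 3)) %>%
  ungroup()

print(res)
```

With only two years per lake, the higher-order lags are mostly NA here; they become informative once a lake has more years of data.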
    

    Data

    dd <- structure(list(SeasonYear = c(
      "winter2020", "spring2020", "summer2020",
      "fall2020", "winter2021", "spring2021", "summer2021", "winter2020",
      "spring2020", "summer2020", "fall2020", "winter2021", "spring2021",
      "summer2021", "winter2020", "spring2020", "summer2020", "fall2020",
      "winter2021", "spring2021"
    ), change = c(
      0.007877245, 0.05851531,
      0.013850687, -0.071774781, -0.040268206, -0.020803715, 0.181610974,
      -0.029708916, -0.063310371, -0.054231575, 0.016057252, 0.011785717,
      -0.030677687, -0.01569172, -0.011974634, 0.168774234, -0.041486735,
      -0.095134974, -0.030310177, 0.033528325
    ), Lake_name = c(
      "AlanHenry",
      "AlanHenry", "AlanHenry", "AlanHenry", "AlanHenry", "AlanHenry",
      "AlanHenry", "Amistad", "Amistad", "Amistad", "Amistad", "Amistad",
      "Amistad", "Amistad", "AmonGCarter", "AmonGCarter", "AmonGCarter",
      "AmonGCarter", "AmonGCarter", "AmonGCarter"
    )), class = "data.frame", row.names = c(
      "1",
      "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
      "14", "15", "16", "17", "18", "19", "20"
    ))
    

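    【Comments】:

    • This works for the dd data set, but when I try to apply it to my data I get an error. Error: Problem with mutate() input lag. x Problem with mutate() input springmean. x Input springmean can't be recycled to size 2. i Input springmean is x[season == "spring"]. i Input springmean must be size 2 or 1, not 0. i The error occurred in group 1: year = "2009". i Input lag is lag_spring(change, SeasonYear). i The error occurred in group 1: year = "2009".
    • I tried building my data frame the way you built yours, dd <- structure(list(SeasonYear = c(raw.WL.season$SeasonYear), change = c(raw.WL.season$change), Lake_name = c(raw.WL.season$Lake_name)), class = "data.frame", row.names = c(1:nrow(raw.WL.season))), but I still get that error. I was wondering if you could help me figure out how to avoid it.
    • Hi @DavidSmith. I just made an edit and changed the function slightly. One problem with my function was that it only worked when there was a "spring" row to lag. If there isn't one, x[season == "spring"] does not work and causes the error you got. I'm not sure this is really the problem, but you could give it a try.
    • Works perfectly now! Thanks!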
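
The failure described in the comments can be reproduced in isolation with base R. This is a small sketch using made-up toy values (season, x, unguarded, and guarded are illustrative names, not the commenter's data): when a year has no spring row, the subset has length 0, which mutate() cannot recycle to the group size, whereas the guarded form falls back to a single NA.

```r
# Toy year with no spring row (hypothetical values)
season <- c("winter", "summer")
x <- c(0.1, 0.2)

# Unguarded lookup: nothing matches "spring", so the result has length 0
unguarded <- x[season == "spring"]
print(length(unguarded))

# Guarded lookup, as in the edited function: use NA when spring is missing
guarded <- if (any(season == "spring")) x[season == "spring"] else NA
print(guarded)
```

A length-1 NA can be recycled to any group size, which is why the edited function no longer stops on years without a spring observation.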