【Question Title】: R: How can I calculate averages for each nth interval in a data frame?
【Posted】: 2021-03-16 05:00:58
【Question】:

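I'm trying to use tidyverse functions (i.e. dplyr and/or tidyr) to find the mean of a column for every 5-year interval, by group.

For example, using the gapminder data available in R, how would I calculate the average life expectancy for each continent over every 5-year interval?

I can get as far as the code below, but it doesn't do what I need, because I'm not sure how to include the 5-year interval in the code: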

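library(gapminder)
library(dplyr)  # provides %>%, group_by() and summarize()

# mean life expectancy per continent, with no 5-year grouping yet
gapminder.avglife <- gapminder %>% group_by(continent) %>% 
  summarize(lifeavg = mean(lifeExp))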

【Comments】:

    Tags: r dplyr tidyverse tidyr


    【Solution 1】:

    Create the 5-year interval as an extra column inside group_by, then compute the mean of lifeExp:

    library(gapminder)
    library(dplyr)
    
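    gapminder %>% 
      # round each year up to the next multiple of 5, e.g. 1952 -> 1955
      group_by(continent, year = ceiling(year/5) * 5) %>% 
      # relabel each bin as "lower-upper" and average lifeExp within it
      summarize(year = paste(first(year) - 5, first(year), sep = '-'),
                lifeavg = mean(lifeExp)) %>%
      ungroup()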
    
    #  continent year      lifeavg
    #   <fct>     <chr>       <dbl>
    # 1 Africa    1950-1955    39.1
    # 2 Africa    1955-1960    41.3
    # 3 Africa    1960-1965    43.3
    # 4 Africa    1965-1970    45.3
    # 5 Africa    1970-1975    47.5
    # 6 Africa    1975-1980    49.6
    # 7 Africa    1980-1985    51.6
    # 8 Africa    1985-1990    53.3
    # 9 Africa    1990-1995    53.6
    #10 Africa    1995-2000    53.6
    # … with 50 more rows
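
The same rounding trick extends to other bin widths. Below is a minimal sketch, assuming a hypothetical helper `bin_label()` (not part of dplyr or gapminder) and a small made-up data frame standing in for gapminder. Because it relies on ceiling(), a year that is an exact multiple of the width lands in the bin ending at that year.

```r
library(dplyr)

# Hypothetical helper (an assumption for illustration): label each year with
# the n-year bin it falls in, using the same ceiling-based rule as above.
bin_label <- function(year, n = 5) {
  upper <- ceiling(year / n) * n
  paste(upper - n, upper, sep = "-")
}

# Toy data standing in for gapminder; the values are made up.
toy <- data.frame(
  continent = c("A", "A", "A", "B", "B"),
  year      = c(1952, 1954, 1957, 1952, 1960),
  lifeExp   = c(40, 42, 50, 60, 70)
)

# Group by continent and bin label, then average within each bin.
res <- toy %>%
  group_by(continent, interval = bin_label(year, n = 5)) %>%
  summarise(lifeavg = mean(lifeExp), .groups = "drop")

print(res)
```

Here 1952 and 1954 share the "1950-1955" bin for continent A, while 1960 falls in "1955-1960". Passing a different `n` changes the bin width without touching the rest of the pipeline.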
    

    【Comments】:

      【Solution 2】:

      My answer looks like this:

      gapminder %>% group_by(continent) %>% 
        mutate(FiveYrInterval = ((year - min(year)) %/% 5)+1) %>%
        group_by(continent, FiveYrInterval) %>%
        summarise(mean(lifeExp))
      
      # A tibble: 60 x 3
      # Groups:   continent [5]
         continent FiveYrInterval `mean(lifeExp)`
         <fct>              <dbl>           <dbl>
       1 Africa                 1            39.1
       2 Africa                 2            41.3
       3 Africa                 3            43.3
       4 Africa                 4            45.3
       5 Africa                 5            47.5
       6 Africa                 6            49.6
       7 Africa                 7            51.6
       8 Africa                 8            53.3
       9 Africa                 9            53.6
      10 Africa                10            53.6
      # ... with 50 more rows
      

      Ronak's answer is indeed much better.
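
If a readable label is preferred over a numeric interval index, the same integer division can produce the start year of each bin. This is only a sketch on made-up data: the column names `start` and `interval` are illustrative, and it assumes all groups share the same first year, so min(year) is taken over the whole data frame rather than per continent.

```r
library(dplyr)

# Toy data standing in for gapminder; the values are made up.
toy <- data.frame(
  continent = c("A", "A", "A", "B", "B"),
  year      = c(1952, 1954, 1957, 1952, 1962),
  lifeExp   = c(40, 42, 50, 60, 70)
)

labelled <- toy %>%
  mutate(
    # first year of the 5-year bin, counted from the earliest year in the data
    start    = min(year) + ((year - min(year)) %/% 5) * 5,
    # label such as "1952-1956" (both ends inclusive for whole years)
    interval = paste(start, start + 4, sep = "-")
  ) %>%
  group_by(continent, interval) %>%
  summarise(lifeavg = mean(lifeExp), .groups = "drop")

print(labelled)
```

Unlike the ceiling-based approach, these bins start at the first observed year, so 1952 to 1956 form one bin and 1957 opens the next.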

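      【Comments】:

        【Solution 3】:

        You can try cut_interval from ggplot2 to get the 5-year intervals for each continent. Note that n evaluates to 11 here, so the twelve survey years are split into eleven equal-width intervals: the first one, [1952,1957], pools 1952 and 1957, and the result has 55 rows rather than 60.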

        library(ggplot2)  # for cut_interval()

        gapminder %>% 
          mutate(interval = cut_interval(year, 
                                         n = (max(year)-min(year))/5)) %>% 
          group_by(continent, interval) %>% 
          summarise(avg = mean(lifeExp))
        
        # A tibble: 55 x 3
        # Groups:   continent [5]
           continent interval      avg
           <fct>     <fct>       <dbl>
         1 Africa    [1952,1957]  40.2
         2 Africa    (1957,1962]  43.3
         3 Africa    (1962,1967]  45.3
         4 Africa    (1967,1972]  47.5
         5 Africa    (1972,1977]  49.6
         6 Africa    (1977,1982]  51.6
         7 Africa    (1982,1987]  53.3
         8 Africa    (1987,1992]  53.6
         9 Africa    (1992,1997]  53.6
        10 Africa    (1997,2002]  53.3
        # ... with 45 more rows
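
To keep each survey year in its own bin, one option is base R's cut() with explicit break points and left-closed intervals. The sketch below runs on made-up data; the extra break at 2012 is an assumption added so that 2007 gets a bin of its own, and `.groups = "drop"` assumes dplyr 1.0 or later.

```r
library(dplyr)

years  <- seq(1952, 2007, by = 5)   # the twelve survey years
breaks <- seq(1952, 2012, by = 5)   # thirteen break points give twelve intervals

# Toy data standing in for gapminder; the values are made up.
toy <- data.frame(
  continent = rep(c("A", "B"), each = 12),
  year      = rep(years, times = 2),
  lifeExp   = c(seq(40, 62, by = 2), seq(60, 71, by = 1))
)

binned <- toy %>%
  # right = FALSE makes intervals [lower, upper); dig.lab = 4 keeps full years in labels
  mutate(interval = cut(year, breaks = breaks, right = FALSE, dig.lab = 4)) %>%
  group_by(continent, interval) %>%
  summarise(avg = mean(lifeExp), .groups = "drop")

print(binned)
```

With twelve intervals per group, no two survey years are averaged together.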
        

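        【Comments】:

          【Solution 4】:

          Try cut2 from the Hmisc package. The twelve cut points in seq(1952,2007,5) define eleven intervals, so two survey years share a bin and the result has 55 rows rather than 60.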

          library(Hmisc)
          
          gapminder %>% 
            mutate(interval = cut2(year, seq(1952,2007,5))) %>% 
            group_by(continent, interval) %>% 
            summarise(avg = mean(lifeExp))
          
          # A tibble: 55 x 3
          # Groups:   continent [5]
             continent interval   avg
             <fct>     <fct>    <dbl>
           1 Africa    1952      39.1
           2 Africa    1957      41.3
           3 Africa    1962      43.3
           4 Africa    1967      45.3
           5 Africa    1972      47.5
           6 Africa    1977      49.6
           7 Africa    1982      51.6
           8 Africa    1987      53.3
           9 Africa    1992      53.6
          10 Africa    1997      53.6
          # ... with 45 more rows
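
Whichever cutting function is used, it can be worth checking whether any bin pools more than one survey year. The sketch below is a diagnostic on made-up data. It uses base R's cut() as a stand-in for the cut-based answers, so the pooled bin may sit at a different end than with cut2. The names `check` and `n_years` are illustrative.

```r
library(dplyr)

years <- seq(1952, 2007, by = 5)   # the twelve survey years

# Toy data standing in for one continent of gapminder; the values are made up.
toy <- data.frame(year = years, lifeExp = seq(40, 62, by = 2))

check <- toy %>%
  # twelve break points give eleven right-closed intervals
  mutate(interval = cut(year, breaks = years, include.lowest = TRUE, dig.lab = 4)) %>%
  group_by(interval) %>%
  # count how many distinct survey years ended up in each bin
  summarise(n_years = n_distinct(year), .groups = "drop")

# bins that average over more than one survey year
print(filter(check, n_years > 1))
```

A non-empty result means at least one average mixes several survey years, which may or may not be what is wanted.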
          

          【Comments】:
