【Question title】: Count factor levels by month per year
【Posted】: 2019-01-22 07:41:41
【Question description】:

I have data in two columns: a date and a factor variable. A snippet of my data:

           Date Category
1    2009-06-22    BREAD
2    2009-06-23    BREAD
3    2009-06-23    BREAD
4    2009-06-23      JAM
5    2009-06-23     MILK
6    2009-06-24    BREAD
9    2009-06-24     MILK
10   2009-06-25      JAM

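Question: I need to count how many times each Category value occurs in each month of each year.

I tried approaches like this, using aggregate, but I don't know how to fit the factor variable in there.

Data sample: here is a working data sample (with more months and years): http://rextester.com/DYMXN47464 Of course, my final (real) data runs from 2009 to 2018 and covers every month of every year, but that is too many observations to share in full.

【Question discussion】:

    Tags: r date dataframe aggregate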


    【Solution 1】:

    We can also convert the Date class to yearmon (from zoo) and get the count:

    library(zoo)
    library(dplyr)
    data %>% 
         count(yearmon = as.yearmon(Date), Category)
    # A tibble: 11 x 3
    #   yearmon       Category     n
    #   <S3: yearmon> <fct>    <int>
    # 1 Jun 2009      MILK         2
    # 2 Jun 2009      BREAD        6
    # 3 Jun 2009      JAM          2
    # 4 Apr 2010      MILK         2
    # 5 Apr 2010      BREAD        7
    # 6 Apr 2010      JAM          2
    # 7 Dec 2011      MILK         4
    # 8 Dec 2011      BREAD       13
    # 9 Dec 2011      JAM          1
    #10 Jan 2012      MILK         1
    #11 Jan 2012      BREAD        2
    

    Note: the data is taken from @phiver's post.
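The question mentions trying aggregate. As a rough sketch of how that could look in base R, the date can first be reduced to a year-month key with format(), after which the factor is just another grouping variable in the formula. The data frame `df`, the helper column `n`, and the `YearMonth` column are assumptions made up for this illustration; they are not part of the original post.

```r
# Hypothetical miniature of the question's data (values made up for illustration)
df <- data.frame(
  Date = as.Date(c("2009-06-22", "2009-06-23", "2009-06-23",
                   "2009-06-24", "2010-04-26", "2010-04-27")),
  Category = factor(c("BREAD", "BREAD", "JAM", "MILK", "BREAD", "BREAD"))
)

# Reduce each date to a "YYYY-MM" key so that year and month are kept together
df$YearMonth <- format(df$Date, "%Y-%m")

# Add a column of ones and sum it per (YearMonth, Category) pair
counts <- aggregate(n ~ YearMonth + Category,
                    data = transform(df, n = 1L),
                    FUN = sum)

# Sort for readability
counts <- counts[order(counts$YearMonth, counts$Category), ]
print(counts)
```

Only combinations that actually occur in the data appear in `counts`, which matches the behaviour of the count() call above.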

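    【Discussion】:

      【Solution 2】:

      Based on your dataset: add the year and month to the data, group by year, month and Category, and count the rows in each group.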

      library(dplyr)
      library(lubridate)
      
      # Extract year and month from Date, then count rows per year/month/Category
      data %>% mutate(year = year(Date),
                      month = month(Date)) %>% 
        group_by(year, month, Category) %>%
        summarise(count = n())
      
      # A tibble: 11 x 4
      # Groups:   year, month [?]
      #    year month Category count
      #   <dbl> <dbl> <fct>    <int>
      # 1  2009     6 MILK         2
      # 2  2009     6 BREAD        6
      # 3  2009     6 JAM          2
      # 4  2010     4 MILK         2
      # 5  2010     4 BREAD        7
      # 6  2010     4 JAM          2
      # 7  2011    12 MILK         4
      # 8  2011    12 BREAD       13
      # 9  2011    12 JAM          1
      #10  2012     1 MILK         1
      #11  2012     1 BREAD        2
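The grouped count above lists only the combinations that occur, so the unused factor level "SALTO DE BANCA" and any month in which a category is absent do not show up. If zero counts are wanted, one possible base R sketch is to cross-tabulate a year-month key against the factor with table(), which keeps every factor level. The data frame `df` and the `YearMonth` name are made up for this illustration.

```r
# Hypothetical miniature of the data, keeping the unused level from the post
df <- data.frame(
  Date = as.Date(c("2009-06-22", "2009-06-23", "2011-12-27", "2011-12-27")),
  Category = factor(c("BREAD", "JAM", "MILK", "MILK"),
                    levels = c("MILK", "BREAD", "JAM", "SALTO DE BANCA"))
)

# Cross-tabulate a "YYYY-MM" key against Category; table() keeps all levels,
# so combinations that never occur get a count of 0
tab <- table(YearMonth = format(df$Date, "%Y-%m"),
             Category  = df$Category)
print(tab)

# Long format, one row per (YearMonth, Category) pair, zeros included
long <- as.data.frame(tab, responseName = "count", stringsAsFactors = FALSE)
print(long)
```

The wide `tab` is convenient for eyeballing months against categories, while `long` has the same shape as the grouped result above with the zero rows added.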
      

      Data:

      data <- structure(list(Date = structure(c(14417, 14418, 14418, 14418, 
      14418, 14419, 14419, 14419, 14419, 14420, 14725, 14725, 14726, 
      14726, 14726, 14726, 14727, 14727, 14727, 14727, 14728, 15335, 
      15335, 15335, 15335, 15336, 15336, 15336, 15336, 15337, 15337, 
      15337, 15337, 15338, 15338, 15338, 15338, 15339, 15339, 15342, 
      15342, 15342), class = "Date"), Category = structure(c(2L, 2L, 
      2L, 3L, 1L, 2L, 2L, 2L, 1L, 3L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 3L, 
      1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 3L, 2L, 2L, 2L, 2L, 
      2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("MILK", "BREAD", 
      "JAM", "SALTO DE BANCA"), class = "factor")), row.names = c(1L, 
      2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1000L, 1001L, 1002L, 1003L, 
      1004L, 1005L, 1006L, 1007L, 1008L, 1009L, 1010L, 3000L, 3001L, 
      3002L, 3003L, 3004L, 3005L, 3006L, 3007L, 3008L, 3009L, 3010L, 
      3011L, 3012L, 3013L, 3014L, 3015L, 3016L, 3017L, 3018L, 3019L, 
      3020L), class = "data.frame")
      

      【Discussion】:
