【Question title】: R dplyr | Categorize days of a month into four factors: Mon, Weekdays, Fri, Weekend
【Posted】: 2021-11-29 06:10:19
【Question】:

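I'm working with a data frame of timestamps. Here is an excerpt of the date-related variables from the January sample of the data frame: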

library(dplyr)

sample_dates <- data.frame(date = c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07", "2021-01-08", "2021-01-09", "2021-01-10", "2021-01-11", "2021-01-12", "2021-01-13", "2021-01-14", "2021-01-15", "2021-01-16", "2021-01-17", "2021-01-18", "2021-01-19", "2021-01-20", "2021-01-21", "2021-01-22", "2021-01-23", "2021-01-24", "2021-01-25", "2021-01-26", "2021-01-27", "2021-01-28", "2021-01-29", "2021-01-30", "2021-01-31"))

sample_dates <- sample_dates %>% 
    mutate(date = as.POSIXct(date)) %>% 
    # "%a" gives abbreviated weekday names in the current locale (e.g. "Fri")
    mutate(day = factor(format(date, "%a")))
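The 31 date strings can also be generated rather than typed out, which makes it easy to try months that start on other weekdays. A minimal base-R sketch; `month_days()` is a hypothetical helper, and it returns `Date` values rather than `POSIXct` (an assumption that only the calendar day matters here):

```r
# Hypothetical helper: every day of a given month as a Date vector
month_days <- function(year, month) {
  first <- as.Date(sprintf("%04d-%02d-01", as.integer(year), as.integer(month)))
  # seq(..., by = "month") steps to the 1st of the next month; minus one day is the last day
  last <- seq(first, by = "month", length.out = 2)[2] - 1
  seq(first, last, by = "day")
}

sample_dates <- data.frame(date = month_days(2021, 1))
```

Leap years and December are handled by `seq.Date()` itself, so no day-count table is needed.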

I want to add a new factor variable, day_cat. In pseudocode it might look like this:

sample_dates <- sample_dates %>% 
    # the month could start on any day and this function should identify it
    # for the sample, I know January 2021 started on Friday
    
    mutate(day_cat = while(month is not over)
        
        if(day == "Fri") {"Fri1"},
        else if(day == "Sat" | day == "Sun") {"Weekend1"},
        else if(day == "Mon") {"Mon1"},
        else if(day == "Tue" | day == "Wed" | day == "Thu") {"Weekdays1"},
        
        # now we're onto the next Friday of the month
        else if(day == "Fri") {"Fri2"},
        else if(day == "Sat" | day == "Sun") {"Weekend2"},
        else if(day == "Mon") {"Mon2"},
        else if(day == "Tue" | day == "Wed" | day == "Thu") {"Weekdays2"},
        ...
        ...
        
        # reached the end of month
        ) %>%

    mutate(day_cat = factor(day_cat, levels = c("Mon", "Weekdays", "Fri", "Weekend")))

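So the factors are Mon = {Mon}; Weekdays = {Tue, Wed, Thu}; Fri = {Fri}; Weekend = {Sat, Sun}. In the day_cat variable I also want these factors numbered as Mon1, Weekdays1, Fri1, Weekend1, Mon2, Weekdays2, Fri2, Weekend2, Mon3, and so on (assuming the month starts on a Monday).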
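The four groups can be assigned without relying on the locale-dependent `"%a"` labels by using the numeric weekday of `POSIXlt` (0 = Sunday, ..., 6 = Saturday). A minimal base-R sketch; `day_group()` is a hypothetical helper, not part of dplyr or lubridate:

```r
# Hypothetical helper: map dates to Mon / Weekdays / Fri / Weekend
day_group <- function(date) {
  # Indexed by POSIXlt$wday + 1, so position 1 is Sunday and position 7 is Saturday
  lookup <- c("Weekend",   # Sunday
              "Mon",       # Monday
              "Weekdays",  # Tuesday
              "Weekdays",  # Wednesday
              "Weekdays",  # Thursday
              "Fri",       # Friday
              "Weekend")   # Saturday
  wd <- as.POSIXlt(date)$wday
  factor(lookup[wd + 1], levels = c("Mon", "Weekdays", "Fri", "Weekend"))
}

dates <- seq(as.Date("2021-01-01"), as.Date("2021-01-07"), by = "day")
data.frame(date = dates, group = day_group(dates))
```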

The levels of day_cat should follow this same order (for plotting purposes).
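By default `factor()` sorts levels alphabetically, which breaks the calendar order. Taking the levels from `unique()` keeps them in order of first appearance, the same idea as `forcats::fct_inorder()`. A minimal sketch with illustrative labels, assuming the data are already sorted by date:

```r
# day_cat labels as they appear in date-sorted data (illustrative values)
labels <- c("Mon1", "Weekdays1", "Weekdays1", "Weekdays1",
            "Fri1", "Weekend1", "Weekend1", "Mon2")

# Default: alphabetical levels
alphabetical <- factor(labels)

# Levels in order of first appearance, i.e. chronological for sorted data
chronological <- factor(labels, levels = unique(labels))

levels(alphabetical)   # "Fri1" "Mon1" "Mon2" "Weekdays1" "Weekend1"
levels(chronological)  # "Mon1" "Weekdays1" "Fri1" "Weekend1" "Mon2"
```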

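If a month starts on a Wednesday, day_cat should label only that Wednesday and the following Thursday as "Weekdays1". If the month ends on a Saturday, day_cat should label only that Saturday as "Weekend4" or "Weekend5", whichever applies.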
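The Wednesday example suggests weeks that run Monday to Sunday, with week 1 being the possibly partial week that contains the 1st. Under that reading (an assumption, since the pseudocode above starts counting at Friday), the week number can be computed from the weekday of the 1st. A minimal base-R sketch; `monday_weeknum()` is a hypothetical helper:

```r
# Hypothetical helper: week of the month, weeks running Monday-Sunday,
# week 1 = the (possibly partial) week containing the 1st
monday_weeknum <- function(date) {
  date <- as.Date(date)
  day_of_month <- as.POSIXlt(date)$mday
  first_of_month <- date - (day_of_month - 1)
  # Weekday of the 1st with Monday = 0, ..., Sunday = 6
  offset <- (as.POSIXlt(first_of_month)$wday + 6) %% 7
  (day_of_month - 1 + offset) %/% 7 + 1
}

# January 2021 starts on a Friday: Jan 1-3 are week 1, Monday Jan 4 starts week 2
weeks <- monday_weeknum(seq(as.Date("2021-01-01"), as.Date("2021-01-31"), by = "day"))

# September 2021 starts on a Wednesday: only Wed 1 and Thu 2 precede Fri 3 in week 1
monday_weeknum(as.Date(c("2021-09-01", "2021-09-02", "2021-09-06")))
```

Pasting this onto a weekday group, e.g. `paste0(group, monday_weeknum(date))`, would give Fri1, Weekend1, Mon2, ... for January 2021. That differs from the answer below, where week 1 is simply days 1-7 of the month.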

【Question discussion】:

    Tags: r function date dplyr timestamp


    【Solution 1】:

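    Here day_cat is a factor ordered chronologically, although within each week the three specified weekdays share one factor level and the two weekend days share another. Is that what you want?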

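    library(dplyr); library(lubridate); library(forcats)  # fct_inorder() comes from forcats
    sample_dates %>%
      # wday(label = TRUE) labels such as "Mon" assume an English locale
      mutate(day = wday(date, label = TRUE),
             group = case_when(day == "Mon" ~ "Mon",
                               day == "Fri" ~ "Fri",
                               day %in% c("Sat", "Sun") ~ "Weekend",
                               TRUE ~ "Weekday"),
             # days 1-7 -> week 1, days 8-14 -> week 2, ...
             weeknum = (day(date)-1) %/% 7 + 1,
             # levels in order of first appearance, i.e. chronological
             day_cat = paste0(group, weeknum) %>% fct_inorder()) 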
    

    Result

             date day   group weeknum  day_cat
    1  2021-01-01 Fri     Fri       1     Fri1
    2  2021-01-02 Sat Weekend       1 Weekend1
    3  2021-01-03 Sun Weekend       1 Weekend1
    4  2021-01-04 Mon     Mon       1     Mon1
    5  2021-01-05 Tue Weekday       1 Weekday1
    6  2021-01-06 Wed Weekday       1 Weekday1
    7  2021-01-07 Thu Weekday       1 Weekday1
    8  2021-01-08 Fri     Fri       2     Fri2
    9  2021-01-09 Sat Weekend       2 Weekend2
    10 2021-01-10 Sun Weekend       2 Weekend2
    11 2021-01-11 Mon     Mon       2     Mon2
    12 2021-01-12 Tue Weekday       2 Weekday2
    13 2021-01-13 Wed Weekday       2 Weekday2
    14 2021-01-14 Thu Weekday       2 Weekday2
    15 2021-01-15 Fri     Fri       3     Fri3
    16 2021-01-16 Sat Weekend       3 Weekend3
    17 2021-01-17 Sun Weekend       3 Weekend3
    18 2021-01-18 Mon     Mon       3     Mon3
    19 2021-01-19 Tue Weekday       3 Weekday3
    20 2021-01-20 Wed Weekday       3 Weekday3
    21 2021-01-21 Thu Weekday       3 Weekday3
    22 2021-01-22 Fri     Fri       4     Fri4
    23 2021-01-23 Sat Weekend       4 Weekend4
    24 2021-01-24 Sun Weekend       4 Weekend4
    25 2021-01-25 Mon     Mon       4     Mon4
    26 2021-01-26 Tue Weekday       4 Weekday4
    27 2021-01-27 Wed Weekday       4 Weekday4
    28 2021-01-28 Thu Weekday       4 Weekday4
    29 2021-01-29 Fri     Fri       5     Fri5
    30 2021-01-30 Sat Weekend       5 Weekend5
    31 2021-01-31 Sun Weekend       5 Weekend5
    

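    【Discussion】:

    • What I wanted does have 18 levels. However, an apparently equivalent alternative for weeknum using floor doesn't work: the first Thursday gets a weeknum of 2.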
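The comment does not show the floor version that was tried; a likely candidate (an assumption) is `floor(d / 7) + 1`, which moves day 7 (Thursday, 7 January 2021) into week 2. The floor form equivalent to the answer's `(d - 1) %/% 7 + 1` subtracts one first. A minimal base-R comparison, with `d` as the day of the month:

```r
d <- 1:14                                  # day of the month

answer_weeks  <- (d - 1) %/% 7 + 1         # the answer's formula
floor_shifted <- floor((d - 1) / 7) + 1    # equivalent floor() form
floor_naive   <- floor(d / 7) + 1          # assumed alternative: day 7 jumps to week 2

data.frame(d, answer_weeks, floor_shifted, floor_naive)[6:8, ]
```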