【Question title】: R replace minimum date values per group
【Posted】: 2021-06-25 15:16:39
【Question description】:

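I have a df with a year of observations for different groups. However, the date of the first observation can differ slightly between groups (it usually falls around the first day of the year). I intend to show these groups in a line plot, and I want them all to start at "2021-01-01".

How can I recode my date variable so that the first occurrence in each group (min(Date)?) is set to "2021-01-01"?

Here is a small subset in which X, Y, and Z have different start dates. Thanks!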

structure(list(Date = structure(c(18637, 18644, 18651, 18658, 
18665, 18672, 18679, 18686, 18693, 18700, 18707, 18714, 18721, 
18728, 18735, 18636, 18643, 18651, 18656, 18665, 18672, 18676, 
18686, 18693, 18700, 18707, 18714, 18720, 18727, 18735, 18635, 
18643, 18649, 18658, 18662, 18670, 18677, 18684, 18692, 18700, 
18707, 18713, 18718, 18728, 18735), class = "Date"), Maand = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("jan", 
"feb", "mrt", "apr", "mei", "jun", "jul", "aug", "sep", "okt", 
"nov", "dec"), class = c("ordered", "factor")), UPV2 = c(339L, 
69L, 59L, 48L, 77L, 95L, 54L, 61L, 99L, 95L, 67L, 71L, 54L, 98L, 
98L, 8L, 6L, 11L, 7L, 15L, 7L, 5L, 4L, 22L, 13L, 4L, 5L, 14L, 
14L, 7L, 6L, 7L, 8L, 13L, 2L, 9L, 9L, 13L, 4L, 9L, 8L, 8L, 4L, 
14L, 4L), VAR = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L), .Label = c("X", "Y", "Z"), class = "factor")), row.names = c(NA, 
-45L), groups = structure(list(VAR = structure(1:3, .Label = c("X", 
"Y", "Z"), class = "factor"), .rows = structure(list(1:15, 16:30, 
    31:45), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", 
"list"))), row.names = c(NA, -3L), class = c("tbl_df", "tbl", 
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"))

【Question comments】:

  • Just revised my answer to handle the custom month abbreviations in Maand.

Tags: r


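【Solution 1】:

This solution with dplyr (and lubridate) targets the minimum Date within each group and replaces it with your common start date, DEFAULT_DATE. As of my latest revision, it also updates the custom month abbreviation in Maand.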

library(dplyr)
library(lubridate)

# ...
# Code to generate your data.frame "df".
# ...

DEFAULT_DATE <- as.Date("2021-01-01")

df <- df %>%
  group_by(VAR) %>%
  mutate(# Update the custom month abbreviation for every "min(Date)" in each group.
         Maand = if_else(Date == min(Date),
                         # Pick out the corresponding level of the factor.
                         ordered(levels(Maand)[month(DEFAULT_DATE)], levels = levels(Maand)),
                         Maand),
         # Replace every "min(Date)" in each group.
         Date = if_else(Date == min(Date), DEFAULT_DATE, Date)) %>%
  ungroup()

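Keep in mind that most of the complexity here comes from your custom abbreviations for the month names, which are stored in the Maand column as an ordered factor.

Fortunately, my revised solution handles this. Suppose a new group "A" were added whose earliest Date is 2021-03-07; its Maand would then be your custom abbreviation for "March", which here is "mrt". When my transformation is applied, that date is updated to DEFAULT_DATE, which here is 2021-01-01. In addition, mutate() ensures that Maand is updated to the factor level (here the 1st level, "jan") that corresponds to the month of DEFAULT_DATE (here the 1st month of the year).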
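To make the group "A" scenario concrete, here is a minimal sketch on toy data. The `toy` table, the `maand_levels` vector, and the dates for group "X" are made up for illustration and are not part of the original data; the transformation itself is the same `mutate()` as above.

```r
library(dplyr)
library(lubridate)

DEFAULT_DATE <- as.Date("2021-01-01")

# Assumed to match the levels of Maand in the question's data.
maand_levels <- c("jan", "feb", "mrt", "apr", "mei", "jun",
                  "jul", "aug", "sep", "okt", "nov", "dec")

# Hypothetical toy data: group "A" starts on 2021-03-07, group "X" on 2021-01-10.
toy <- tibble(
  Date = as.Date(c("2021-03-07", "2021-03-14", "2021-01-10", "2021-01-17")),
  VAR  = factor(c("A", "A", "X", "X"))
) %>%
  # Build Maand as an ordered factor from the month number of each Date.
  mutate(Maand = factor(maand_levels[month(Date)],
                        levels = maand_levels, ordered = TRUE))

result <- toy %>%
  group_by(VAR) %>%
  mutate(
    # Maand is updated first, while Date still holds the original values.
    Maand = if_else(Date == min(Date),
                    ordered(levels(Maand)[month(DEFAULT_DATE)],
                            levels = levels(Maand)),
                    Maand),
    # Then the earliest Date in each group is replaced.
    Date = if_else(Date == min(Date), DEFAULT_DATE, Date)
  ) %>%
  ungroup()

print(result)
```

In the printed result, the first row of each group should carry DEFAULT_DATE and "jan", while the later rows keep their original Date and Maand. Note that the order of the two assignments inside mutate() matters: if Date were replaced first, the condition used for Maand would be evaluated against the already-updated dates.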

【Comments】:
