R: replace the minimum date value per group (answers)

【Question title】: R replace minimum date values per group
【Posted】: 2021-06-25 15:16:39
【Question description】:

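I have a df with one year of observations for several groups. However, the date of the first observation can differ slightly between groups (it usually falls around the first day of the year). I plan to show the groups in a line chart, and I would like all of them to start at "2021-01-01".

How can I recode my date variable so that the first occurrence in each group (min(Date)?) is set to "2021-01-01"?

Here is a small subset in which X, Y, and Z have different start dates. Thanks!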

structure(list(Date = structure(c(18637, 18644, 18651, 18658, 
18665, 18672, 18679, 18686, 18693, 18700, 18707, 18714, 18721, 
18728, 18735, 18636, 18643, 18651, 18656, 18665, 18672, 18676, 
18686, 18693, 18700, 18707, 18714, 18720, 18727, 18735, 18635, 
18643, 18649, 18658, 18662, 18670, 18677, 18684, 18692, 18700, 
18707, 18713, 18718, 18728, 18735), class = "Date"), Maand = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("jan", 
"feb", "mrt", "apr", "mei", "jun", "jul", "aug", "sep", "okt", 
"nov", "dec"), class = c("ordered", "factor")), UPV2 = c(339L, 
69L, 59L, 48L, 77L, 95L, 54L, 61L, 99L, 95L, 67L, 71L, 54L, 98L, 
98L, 8L, 6L, 11L, 7L, 15L, 7L, 5L, 4L, 22L, 13L, 4L, 5L, 14L, 
14L, 7L, 6L, 7L, 8L, 13L, 2L, 9L, 9L, 13L, 4L, 9L, 8L, 8L, 4L, 
14L, 4L), VAR = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L), .Label = c("X", "Y", "Z"), class = "factor")), row.names = c(NA, 
-45L), groups = structure(list(VAR = structure(1:3, .Label = c("X", 
"Y", "Z"), class = "factor"), .rows = structure(list(1:15, 16:30, 
    31:45), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", 
"list"))), row.names = c(NA, -3L), class = c("tbl_df", "tbl", 
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"))

【Question comments】:

I just revised my answer to handle the custom month abbreviations in Maand.

Tags: r

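【Solution 1】:

This solution with dplyr (and lubridate) targets the minimum Date within each group and replaces it with your common start date, DEFAULT_DATE. As of my latest revision, it also updates the custom month abbreviation in Maand.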

library(dplyr)
library(lubridate)

# ...
# Code to generate your data.frame "df".
# ...

DEFAULT_DATE <- as.Date("2021-01-01")

df <- df %>%
  group_by(VAR) %>%
  mutate(# Update the custom month abbreviation for every "min(Date)" in each group.
         Maand = if_else(Date == min(Date),
                         # Pick out the corresponding level of the factor.
                         ordered(levels(Maand)[month(DEFAULT_DATE)], levels = levels(Maand)),
                         Maand),
         # Replace every "min(Date)" in each group.
         Date = if_else(Date == min(Date), DEFAULT_DATE, Date)) %>%
  ungroup()

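Keep in mind that most of the complication here comes from your custom abbreviations for the month names, which are stored as an ordered factor in the Maand column.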
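
If Maand is always derived from Date (an assumption; it does hold for the sample data in the question), a possibly simpler route is to replace the dates first and then rebuild Maand from the updated Date. The sketch below is an alternative, not the method used in the answer above. The names maand_levels and df_demo are introduced here for illustration only, and df_demo holds just the first two dates of groups X and Y from the question's data.

```r
library(dplyr)
library(lubridate)

# The custom month abbreviations, copied from the levels of Maand in the question.
maand_levels <- c("jan", "feb", "mrt", "apr", "mei", "jun",
                  "jul", "aug", "sep", "okt", "nov", "dec")

DEFAULT_DATE <- as.Date("2021-01-01")

# Illustrative stand-in for df: the first two dates of groups X and Y.
df_demo <- tibble(
  VAR  = factor(c("X", "X", "Y", "Y")),
  Date = as.Date(c(18637, 18644, 18636, 18643), origin = "1970-01-01")
)

df_demo <- df_demo %>%
  group_by(VAR) %>%
  # Replace the earliest Date in each group with the common start date.
  mutate(Date = if_else(Date == min(Date), DEFAULT_DATE, Date)) %>%
  ungroup() %>%
  # Rebuild Maand for every row from the (updated) Date.
  mutate(Maand = ordered(maand_levels[month(Date)], levels = maand_levels))
```

Note that this overwrites Maand for every row, so it is only appropriate when Maand carries no information beyond the month of Date.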

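Fortunately, my revised solution handles this challenge. Suppose a new group "A" is added to the mix and its earliest Date is 2021-03-07. Its Maand would then be your custom abbreviation for March, in this case "mrt". When my transformation is applied, that date is updated to DEFAULT_DATE, here 2021-01-01. In addition, mutate() makes sure that Maand is updated as well (here to "jan"): it is set to the factor level (here the 1st level) that corresponds to the month of DEFAULT_DATE (here the 1st month of the year).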
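
The group "A" scenario can be tried out on a small hypothetical data set. In the sketch below, only the earliest date (2021-03-07) comes from the description above; the other two dates and the names toy, result, and maand_levels are made up for illustration. The mutate() call is the same one used in the solution.

```r
library(dplyr)
library(lubridate)

# The custom month abbreviations, copied from the levels of Maand in the question.
maand_levels <- c("jan", "feb", "mrt", "apr", "mei", "jun",
                  "jul", "aug", "sep", "okt", "nov", "dec")

# Hypothetical group "A" whose earliest Date is 2021-03-07.
toy <- tibble(
  VAR  = factor(rep("A", 3)),
  Date = as.Date(c("2021-03-07", "2021-03-14", "2021-04-02"))
) %>%
  mutate(Maand = ordered(maand_levels[month(Date)], levels = maand_levels))

DEFAULT_DATE <- as.Date("2021-01-01")

result <- toy %>%
  group_by(VAR) %>%
  mutate(# Maand is updated first, while Date still holds the original values.
         Maand = if_else(Date == min(Date),
                         ordered(levels(Maand)[month(DEFAULT_DATE)], levels = levels(Maand)),
                         Maand),
         Date = if_else(Date == min(Date), DEFAULT_DATE, Date)) %>%
  ungroup()

# The first row now has Date 2021-01-01 and Maand "jan";
# the other two rows keep their original Date and Maand.
print(result)
```

The order of the two assignments inside mutate() matters: if Date were replaced first, the comparison Date == min(Date) used for Maand would already see the updated dates.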

【Comments】: