【Question title】: Finding the min or max of POSIXct date with NA values
【Posted】: 2018-08-16 22:43:50
【Question description】:

The data below contain columns for the individual ID (with repeated observations), Date, and Fate.

         ID       Date  Fate
1  BHS_1149 2017-04-11   MIA
2  BHS_1154       <NA>  <NA>
3  BHS_1155       <NA>  <NA>
4  BHS_1156       <NA>  <NA>
5  BHS_1157       <NA>  Mort
6  BHS_1159 2017-04-11 Alive
7  BHS_1169 2017-04-11 Alive
8  BHS_1259       <NA>  <NA>
9  BHS_1260       <NA>  <NA>
10 BHS_1262 2017-04-11   MIA
11 BHS_1262 2017-07-05 Alive
12 BHS_1262 2017-12-06 Alive
13 BHS_1262 2017-12-06   MIA
14 BHS_1262 2018-01-17  Mort

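For each ID, I would like to create new columns holding the minimum Date and the maximum Date at which Fate is "Alive". I have tried different combinations of including and excluding the na.rm = T argument in the code below, but I still get the warnings shown after the output.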

library(tidyverse)
library(lubridate)

dat %>% 
  group_by(ID) %>%
  mutate(
    #the first or min of Date
    FstSurvey = min(Date),
    LstAlive = max(Date[Fate == "Alive"])) %>%
  as.data.frame()

         ID       Date  Fate  FstSurvey   LstAlive
1  BHS_1149 2017-04-11   MIA 2017-04-11       <NA>
2  BHS_1154       <NA>  <NA>       <NA>       <NA>
3  BHS_1155       <NA>  <NA>       <NA>       <NA>
4  BHS_1156       <NA>  <NA>       <NA>       <NA>
5  BHS_1157       <NA>  Mort       <NA>       <NA>
6  BHS_1159 2017-04-11 Alive 2017-04-11 2017-04-11
7  BHS_1169 2017-04-11 Alive 2017-04-11 2017-04-11
8  BHS_1259       <NA>  <NA>       <NA>       <NA>
9  BHS_1260       <NA>  <NA>       <NA>       <NA>
10 BHS_1262 2017-04-11   MIA 2017-04-11 2017-12-06
11 BHS_1262 2017-07-05 Alive 2017-04-11 2017-12-06
12 BHS_1262 2017-12-06 Alive 2017-04-11 2017-12-06
13 BHS_1262 2017-12-06   MIA 2017-04-11 2017-12-06
14 BHS_1262 2018-01-17  Mort 2017-04-11 2017-12-06

Warning messages:
1: In max.default(numeric(0), na.rm = FALSE) :
  no non-missing arguments to max; returning -Inf
2: In max.default(numeric(0), na.rm = FALSE) :
  no non-missing arguments to max; returning -Inf

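The code appears to work as intended, but I cannot explain or avoid the warnings, and I could not find a solution in the max and min help pages. Reproducible data are included below.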

dat <- structure(list(ID = c("BHS_1149", "BHS_1154", "BHS_1155", "BHS_1156", 
"BHS_1157", "BHS_1159", "BHS_1169", "BHS_1259", "BHS_1260", "BHS_1262", 
"BHS_1262", "BHS_1262", "BHS_1262", "BHS_1262"), Date = structure(c(1491890400, 
NA, NA, NA, NA, 1491890400, 1491890400, NA, NA, 1491890400, 1499234400, 
1512543600, 1512543600, 1516172400), class = c("POSIXct", "POSIXt"
), tzone = ""), Fate = c("MIA", NA, NA, NA, "Mort", "Alive", 
"Alive", NA, NA, "MIA", "Alive", "Alive", "MIA", "Mort")), row.names = c(NA, 
-14L), .Names = c("ID", "Date", "Fate"), class = "data.frame")

【Comments】:

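  • Basically, if an ID has no rows with Fate == "Alive", there is no date available for LstAlive and <NA> is returned instead (the same happens for FstSurvey when an ID has no dates). I don't see why you would worry about these warnings, though.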
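
To make the comment above concrete, here is a minimal sketch of where the warning comes from. It assumes only base R; the name empty_dates is made up for illustration. An ID with no "Alive" rows hands max() a zero-length POSIXct vector, and the call falls through to max.default(numeric(0)), which warns and returns -Inf.

```r
# A zero-length POSIXct vector, standing in for Date[Fate == "Alive"]
# when an ID has no "Alive" rows (empty_dates is an illustrative name).
empty_dates <- Sys.time()[0]
length(empty_dates)

# Capture the warning text instead of letting it print.
msg <- tryCatch(max(empty_dates), warning = function(w) conditionMessage(w))
msg

# With the warning silenced, the underlying numeric value is -Inf,
# which is why the result carries no usable date.
res <- suppressWarnings(max(empty_dates))
as.numeric(res)
```

Rows whose Fate is NA behave differently: Date[NA] yields a single NA rather than an empty vector, so max() returns NA without warning.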

Tags: r dplyr max min lubridate


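【Solution 1】:

I also like to write code that runs without warnings. Here is a suggestion for doing the same calculation without them. By using the ordered first and last instead of min and max, you avoid R evaluating max() on an empty vector, which is what returns -Inf with a warning.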

dat %>% 
  group_by(ID) %>%
  mutate(FstSurvey = first(Date, 
                     order_by = Date),
         LstAlive  = last(Date[Fate == "Alive"], 
                     order_by = Date[Fate == "Alive"]))
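
If you would rather keep min and max, another option is a small wrapper that checks for an empty input first. This is only a sketch: safe_max and safe_min are hypothetical helper names, not functions from base R or dplyr, and the toy vectors stand in for one ID's Date and Fate values.

```r
# Hypothetical helpers (not part of base R or dplyr): drop NAs first, then
# return a typed NA instead of calling max()/min() on an empty vector.
safe_max <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) x[NA_integer_] else max(x)
}

safe_min <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) x[NA_integer_] else min(x)
}

# Toy data standing in for a single ID
dates <- as.POSIXct(c("2017-04-11", "2017-07-05", NA), tz = "UTC")
fate  <- c("MIA", "Alive", NA)

last_alive <- safe_max(dates[fate == "Alive"])  # latest "Alive" date
no_mort    <- safe_max(dates[fate == "Mort"])   # NA, and no warning
first_seen <- safe_min(dates)                   # earliest non-missing date
```

Indexing with x[NA_integer_] keeps the POSIXct class and time zone, so the result still combines cleanly with real dates. Inside the grouped mutate() these would replace the original calls, e.g. LstAlive = safe_max(Date[Fate == "Alive"]).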
