【Question title】: Return value for minimum in group of columns [duplicate]
【Posted】: 2017-02-07 00:17:42
【Question description】:

I have a group of columns, and I need a new column, min123, that holds the row-wise minimum of those columns, 123a_1 through 123a_5.

dff <- structure(list(`MCI ID` = c("070405344", "230349820", "260386435","370390587", "380406805", "391169282", "440377986", "750391394","890373764", "910367024"), 
                      `123a_1` = structure(c(16672, 16372,16730, 16688, 16700, 16783, 16709, 17033, 16786, 16675), class = "Date"),
                      `123a_2` = structure(c(17029, 16422, 17088, 17036, 17057,17140, 17072, 17043, 17141, 17038), class = "Date"), 
                      `123a_3` = structure(c(NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_), class = "Date"), 
                      `123a_4` = structure(c(NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_), class = "Date"), 
                      `123a_5` = structure(c(NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_), class = "Date")), 
         .Names = c("MCI ID","123a_1", "123a_2", "123a_3", "123a_4", "123a_5"), row.names = c(NA,10L), class = "data.frame") 

【Question comments】:

    Tags: r


    【Solution 1】:

    A base R approach using do.call and pmin:

    dff$min123 <- do.call(pmin, c(dff[-1], na.rm = TRUE))
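
The way `na.rm` reaches `pmin` here may not be obvious, so here is a minimal sketch on toy data. The names `cols`, `args` and `res` are made up for illustration and are not from the original answer. `c()` appends `na.rm = TRUE` to the list of columns, and `do.call` then passes every list element to `pmin` as a separate argument.

```r
# Toy columns standing in for dff[-1] (hypothetical data, not from the question)
cols <- list(a = c(3, 1, NA), b = c(2, 5, 4))

# c() on a list appends na.rm as one more named element
args <- c(cols, na.rm = TRUE)

# do.call(pmin, args) is the same call as pmin(a = ..., b = ..., na.rm = TRUE)
res <- do.call(pmin, args)

print(res)
print(identical(res, pmin(cols$a, cols$b, na.rm = TRUE)))
```

Because a data frame is itself a list of columns, `c(dff[-1], na.rm = TRUE)` builds the same kind of argument list without naming each column.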
    

    A similar approach in dplyr:

    library(dplyr)    
    dff %>% 
      mutate(min123 = do.call(pmin, c(select(., -1), na.rm = TRUE)))
    

    data.table:

    library(data.table)
    setDT(dff)[, min123 := do.call(pmin, c(.SD, na.rm = TRUE)), .SDcols = -1]
    

    【Comments】:

    • Haha... I was just trying to work out how to pass na.rm inside do.call :)
    【Solution 2】:
    library(dplyr)    
    dff %>% 
        mutate(min123 = pmin(`123a_1`, `123a_2`, `123a_3`, `123a_4`, `123a_5`, na.rm = TRUE))
    

    【Comments】:

    • Works great. Secondly, is there a way to return the next-to-minimum value?
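
The thread leaves this follow-up unanswered. One possible sketch, which is not from any of the original answers: convert the date columns to numbers, sort each row, and take the second element. The data frame `df` and the column name `second_min` are hypothetical. Note that `sort()` drops `NA` by default, so rows with fewer than two dates yield `NA`, and a tied minimum is returned again as the "second" value.

```r
# Hypothetical toy data with the same shape as the question (id column + Date columns)
df <- data.frame(
  id = c("a", "b", "c"),
  d1 = as.Date(c("2015-08-25", "2014-10-29", NA)),
  d2 = as.Date(c("2016-08-16", "2014-12-18", "2016-10-14")),
  d3 = as.Date(c(NA, "2014-11-15", NA))
)

# Dates become day counts, so apply() works on a numeric matrix
# instead of coercing the Date columns to character
m <- sapply(df[-1], as.numeric)

# sort() removes NA; [2] is NA when a row has fewer than two non-missing values
second_num <- apply(m, 1, function(x) sort(x)[2])

# Convert the day counts back to Date
df$second_min <- as.Date(second_num, origin = "1970-01-01")

print(df)
```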
    【Solution 3】:

    This is what the function pmin is for:

    > str(dff)
    'data.frame':   10 obs. of  6 variables:
     $ MCI ID: chr  "070405344" "230349820" "260386435" "370390587" ...
     $ 123a_1: Date, format: "2015-08-25" "2014-10-29" "2015-10-22" ...
     $ 123a_2: Date, format: "2016-08-16" "2014-12-18" "2016-10-14" ...
     $ 123a_3: Date, format: NA NA NA ...
     $ 123a_4: Date, format: NA NA NA ...
     $ 123a_5: Date, format: NA NA NA ...
    > dff$groupmin <- pmin(dff[[2]],dff[[3]],dff[[4]], dff[[5]], dff[[6]], na.rm=TRUE)
    > head(dff)
         MCI ID     123a_1     123a_2 123a_3 123a_4 123a_5   groupmin
    1 070405344 2015-08-25 2016-08-16   <NA>   <NA>   <NA> 2015-08-25
    2 230349820 2014-10-29 2014-12-18   <NA>   <NA>   <NA> 2014-10-29
    3 260386435 2015-10-22 2016-10-14   <NA>   <NA>   <NA> 2015-10-22
    4 370390587 2015-09-10 2016-08-23   <NA>   <NA>   <NA> 2015-09-10
    5 380406805 2015-09-22 2016-09-13   <NA>   <NA>   <NA> 2015-09-22
    6 391169282 2015-12-14 2016-12-05   <NA>   <NA>   <NA> 2015-12-14
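
To make the point about `pmin` concrete, here is a small sketch contrasting it with `min` on Date vectors. The vectors `d1` and `d2` are invented for illustration. `min` collapses everything into a single value, while `pmin` compares the vectors position by position, which is what a per-row minimum needs.

```r
# Two hypothetical Date vectors of equal length, with some missing values
d1 <- as.Date(c("2015-08-25", "2014-10-29", NA))
d2 <- as.Date(c("2016-08-16", NA, NA))

# min(): one overall minimum across all supplied values
overall <- min(d1, d2, na.rm = TRUE)

# pmin(): one minimum per position; a position stays NA only if every input is NA there
rowwise <- pmin(d1, d2, na.rm = TRUE)

print(overall)
print(rowwise)
```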
    

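    【Comments】:

    • This is exactly the same as the dplyr solution.
    • @Sotos For someone struggling with such a simple problem, I gave a simple answer. The extra complexity introduced by a non-standard library and non-standard pipe syntax seemed sufficient to justify another answer. It is moot anyway, since Jaap has the best answer so far: standard R and concise! (It has my upvote.)
    • OK. If you think it has value, then by all means.
    • @Sotos Yes, I think there is value in teaching people how to do things in standard R. My post has no additional value after Jaap's post or after the "already has an answer" link.
    • OK. No problem. It's up to you. I was just clarifying.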