【Question title】: Creating new variables from multiple variables using mutate() and across() in dplyr 1.0.0
【Posted】: 2021-07-06 02:52:00
【Question】:

I need to mutate several columns that share a prefix into new columns, all in the same way.

Here is some toy data:

library(dplyr)

df <- data.frame(su_1 = round(rnorm(12), 2),
                 su_2 = round(rnorm(12), 2),
                 su_3 = round(rnorm(12), 2))

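Now say I want to bin the continuous values of each variable into discrete categories. I can do this with three separate, near-identical steps, one per column, like this: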

df %>% mutate(su_1_disc = ifelse(su_1 < 0, "less", 
                                 ifelse(su_1 > 0 & su_1 <= 0.5, "mid", "lots"))) -> df

df %>% mutate(su_2_disc = ifelse(su_2 < 0, "less", 
                                 ifelse(su_2 > 0 & su_2 <= 0.5, "mid", "lots"))) -> df

df %>% mutate(su_3_disc = ifelse(su_3 < 0, "less", 
                                 ifelse(su_3 > 0 & su_3 <= 0.5, "mid", "lots"))) -> df

df

# output
#     su_1  su_2  su_3 su_1_disc su_2_disc su_3_disc
# 1   1.99  0.77 -0.17      lots      lots      less
# 2   0.51 -0.76 -1.24      lots      less      less
# 3   1.50 -0.36  0.28      lots      less       mid
# 4   0.86  0.88 -0.52      lots      lots      less
# 5   0.08  0.63 -0.76       mid      lots      less
# 6  -0.51 -0.99  0.01      less      less       mid
# 7   0.35  1.59  0.19       mid      lots       mid
# 8   0.16  0.35  0.38       mid       mid       mid
# 9  -0.75 -0.45  1.75      less      less      lots
# 10  0.97  0.62 -0.05      lots      lots      less
# 11 -0.07  0.47 -0.24      less       mid      less
# 12  0.61 -0.27 -1.55      lots      less      less

But I would like to do it in one step using the new dplyr 1.0.0 functionality.

I tried:

df %>%
  mutate(across(starts_with("su_"),
                ifelse(.x < 0, "less", 
                       ifelse(.x > 0 & .x <= 0.5, "mid", "lots"))))

But it throws an error. I know .names needs to go in somewhere, but I am a bit lost.

【Comments】:

    Tags: r dplyr


    【Answer 1】:

    You can use:

    library(dplyr)
    
    df %>%
      mutate(across(starts_with("su_"),
                    ~ ifelse(.x < 0, "less",
                             ifelse(.x > 0 & .x <= 0.5, "mid", "lots")),
                    .names = "{.col}_disc"))
    
    #    su_1  su_2  su_3 su_1_disc su_2_disc su_3_disc
    #1   0.40  0.57 -0.11       mid      lots      less
    #2   1.82 -0.55  0.44      lots      less       mid
    #3   0.44  1.47 -0.39       mid      lots      less
    #4  -0.82  0.00 -0.12      less      lots      less
    #5   0.17 -0.10 -1.55       mid      less      less
    #6   0.20  0.98 -1.02       mid      lots      less
    #7  -0.01  1.12 -0.30      less      lots      less
    #8  -0.70  0.31  0.35      less       mid       mid
    #9   0.46  1.18 -0.22       mid      lots      less
    #10 -1.09  0.03 -0.85      less       mid      less
    #11 -0.03  1.81  1.28      less      lots      lots
    #12 -0.11  1.64 -0.51      less      lots      less
    

    You can also replace ifelse with case_when or cut.
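
    The cut variant is not spelled out in this answer, so here is a minimal sketch of how it might look. The small data frame and the break points are assumptions chosen for illustration. Note that with cut's default right = TRUE the bins are (-Inf, 0], (0, 0.5] and (0.5, Inf), so a value of exactly 0 lands in "less", whereas the nested ifelse above sends it to "lots".

```r
library(dplyr)

# Small illustrative data (assumed values, not the question's random data)
df <- data.frame(su_1 = c(-0.51, 0.08, 0.50, 1.99),
                 su_2 = c(0.77, -0.76, 0.35, 0.63))

out <- df %>%
  mutate(across(starts_with("su_"),
                # cut() returns a factor; as.character() keeps the result
                # comparable with the ifelse/case_when versions
                ~ as.character(cut(.x,
                                   breaks = c(-Inf, 0, 0.5, Inf),
                                   labels = c("less", "mid", "lots"))),
                .names = "{.col}_disc"))

out
```

    Drop the as.character() call if an ordered-looking factor is more useful downstream than a character column.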

    【Comments】:

    • Damn, I forgot the ~ in front of ifelse. I was so close. Thanks @Ronak Shah
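
    For context on why the ~ matters: across() expects a function (or a formula-style lambda) as its second argument. Without the ~, R tries to evaluate the ifelse(.x ...) expression immediately, and .x does not exist at that point. The sketch below shows three ways of passing the same logic that should give the same result; bin_value is a hypothetical helper name introduced only for this illustration, and the data values are assumed.

```r
library(dplyr)

df <- data.frame(su_1 = c(-0.5, 0.2, 1.0),
                 su_2 = c(0.4, -1.2, 0.7))

# Hypothetical helper wrapping the binning rule from the question
bin_value <- function(x) {
  ifelse(x < 0, "less", ifelse(x > 0 & x <= 0.5, "mid", "lots"))
}

# 1. Formula-style lambda: .x stands for the current column
res_lambda <- df %>%
  mutate(across(starts_with("su_"), ~ bin_value(.x), .names = "{.col}_disc"))

# 2. Passing the function itself
res_named <- df %>%
  mutate(across(starts_with("su_"), bin_value, .names = "{.col}_disc"))

# 3. An ordinary anonymous function
res_anon <- df %>%
  mutate(across(starts_with("su_"), function(x) bin_value(x), .names = "{.col}_disc"))

identical(res_lambda, res_named)
identical(res_lambda, res_anon)
```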
    【Answer 2】:

    Consider using case_when instead of nested ifelse:

    library(dplyr)
    df %>%
        mutate(across(starts_with("su_"), ~ case_when(. < 0 ~ "less",
                  between(., 0, 0.5) ~ "mid", TRUE  ~ "lots"), 
            .names = "{.col}_disc"))
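
    One thing to check before swapping the two versions: they do not agree at exactly 0. The nested ifelse tests .x > 0, so 0 falls through to "lots", while between(., 0, 0.5) includes both endpoints, so 0 becomes "mid". A quick side-by-side on a few assumed test values:

```r
library(dplyr)

# Assumed test values that sit on and around the bin edges
x <- c(-0.1, 0, 0.5, 0.6)

# Rule from the question and Answer 1
nested <- ifelse(x < 0, "less",
                 ifelse(x > 0 & x <= 0.5, "mid", "lots"))

# Rule from this answer
cw <- case_when(x < 0 ~ "less",
                between(x, 0, 0.5) ~ "mid",
                TRUE ~ "lots")

data.frame(x, nested, cw)
```

    With data rounded to two decimals an exact 0 can occur, so it is worth deciding which bin it should belong to.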
    
