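Using R dplyr::mutate() with a for loop and dynamic variables: answers

【Question title】: Using R dplyr::mutate() with a for loop and dynamic variables
【Posted】: 2021-07-23 13:33:30
【Question description】:

Disclaimer: I suspect there is a more efficient solution (perhaps an anonymous function with a list, or an *apply function?), which is why I am asking for your help!

Data

Suppose I have a df holding participants' responses to 3 A questions and 3 B questions, e.g.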

qa1, qa2, qa3, qb1, qb2, qb3   
1, 3, 1, 2, 4, 4  
1, 3, 2, 2, 1, 4  
2, 3, 1, 2, 1, 4  
1, 3, 2, 1, 1, 3

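EDIT: df also contains other columns with other, unrelated data!

I have a vector holding the correct answer for each of qa1-3 and qb1-3, in the same order as the columns.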

correct_answer <- c(1,3,2,2,1,4)

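(i.e. for qa1, qa2, qa3, qb1, qb2, qb3)

Desired action

I want to create a new column for each question (e.g. qa1_correct) that codes whether the participant answered correctly (1) or incorrectly (0), by matching each response in df against the corresponding entry in correct_answer. Ideally, I would end up with: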

qa1, qa2, qa3, qb1, qb2, qb3, qa1_correct, qa2_correct, qa3_correct ...     
1, 3, 1, 2, 4, 4, 1, 1, 0, ...   
1, 3, 2, 2, 1, 4, 1, 1, 1, ...   
2, 3, 1, 2, 1, 4, 0, 1, 0, ...   
1, 3, 2, 1, 1, 3, 1, 1, 1, ...

Failed attempt

Here is my attempt for the A questions only (it would be repeated for the Bs), but it does not work (maybe paste0() is the wrong function?):

index <- c(1:3)

for (i in index) {
  df <- df %>% mutate(paste0("qa",i,"_correct") = 
                        case_when(paste0("qa"i) == correct_answer[i] ~ 1, 
                                  paste0("qa"i) != correct_answer[i] ~ 0))
}

Many thanks for your guidance!

【Question discussion】:

Is a solution without mutate() an option?

Tags: r dplyr

【Solution 1】:

You can combine mutate and across.

Code 1: correct_answer as a vector

library(dplyr)

# Assumes df holds only the question columns, in the same order as correct_answer
df %>%
  mutate(across(everything(),
                ~ as.numeric(.x == correct_answer[names(df) == cur_column()]),
                .names = "{.col}_correct"))

Code 2: correct_answer as a data.frame (df_correct)

correct_answer <- c(1,3,2,2,1,4)

# One-row data.frame whose column names match those of df
df_correct <- data.frame(
  matrix(correct_answer, ncol = length(correct_answer))
)
colnames(df_correct) <- names(df)

df %>%
  mutate(across(everything(),
                .fns = ~ as.numeric(.x == df_correct[, cur_column()]),
                .names = "{.col}_correct"))

Output

  qa1 qa2 qa3 qb1 qb2 qb3 qa1_correct qa2_correct qa3_correct qb1_correct qb2_correct qb3_correct
1   1   3   1   2   4   4           1           1           0           1           0           1
2   1   3   2   2   1   4           1           1           1           1           1           1
3   2   3   1   2   1   4           0           1           0           1           1           1
4   1   3   2   1   1   3           1           1           1           0           1           0

【Discussion】:

Thanks! If I have other columns with entirely differently named variables, can I replace everything() with e.g. select(starts_with("q"))?
You don't need select; just replace everything() with starts_with("q"): df %>% mutate(across(starts_with("q"), ~as.numeric(.x == correct_answer[names(df) == cur_column()]), .names = "{.col}_correct"))
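One caveat with the suggestion above: names(df) == cur_column() indexes correct_answer by the column's position in the whole df, so an unrelated column placed before the question columns would shift the lookup. A minimal sketch of one way around this is to name the elements of correct_answer and look them up by name. The id column and the result object below are hypothetical, added only for illustration.

```r
library(dplyr)

# Data from the question plus a hypothetical unrelated column `id`
df <- data.frame(
  id  = c("p1", "p2", "p3", "p4"),
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3)
)

# Name each correct answer after the column it belongs to
correct_answer <- c(qa1 = 1, qa2 = 3, qa3 = 2, qb1 = 2, qb2 = 1, qb3 = 4)

# Score only the columns that have a correct answer, looking it up by name
result <- df %>%
  mutate(across(all_of(names(correct_answer)),
                ~ as.numeric(.x == correct_answer[[cur_column()]]),
                .names = "{.col}_correct"))
```

Because the lookup is by name, the position of id (or of any other extra column) no longer matters.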

【Solution 2】:

This may also be an alternative (in R version 4.1.0 and later, apply gains a new argument, simplify, which defaults to TRUE)

df <- read.table(header = TRUE, text = 'qa1, qa2, qa3, qb1, qb2, qb3
1, 3, 1, 2, 4, 4  
1, 3, 2, 2, 1, 4  
2, 3, 1, 2, 1, 4  
1, 3, 2, 1, 1, 3', sep = ',')

df
#>   qa1 qa2 qa3 qb1 qb2 qb3
#> 1   1   3   1   2   4   4
#> 2   1   3   2   2   1   4
#> 3   2   3   1   2   1   4
#> 4   1   3   2   1   1   3

correct_answer <- c(1,3,2,2,1,4)

cbind(df, 
      setNames(as.data.frame(t(apply(df, 1, 
                                     \(x) +(x == correct_answer)))), 
               paste0(names(df), '_correct')))
#>   qa1 qa2 qa3 qb1 qb2 qb3 qa1_correct qa2_correct qa3_correct qb1_correct
#> 1   1   3   1   2   4   4           1           1           0           1
#> 2   1   3   2   2   1   4           1           1           1           1
#> 3   2   3   1   2   1   4           0           1           0           1
#> 4   1   3   2   1   1   3           1           1           1           0
#>   qb2_correct qb3_correct
#> 1           0           1
#> 2           1           1
#> 3           1           1
#> 4           1           0

Created on 2021-07-23 by the reprex package (v2.0.0)

【Discussion】:

【Solution 3】:

You can also use the following solution in base R:

cbind(df, 
      do.call(cbind, mapply(function(x, y) as.data.frame({+(x == y)}), 
                            df, correct_answer, SIMPLIFY = FALSE)) |>
        setNames(paste0(names(df), "_corr")))

  qa1 qa2 qa3 qb1 qb2 qb3 qa1_corr qa2_corr qa3_corr qb1_corr qb2_corr qb3_corr
1   1   3   1   2   4   4        1        1        0        1        0        1
2   1   3   2   2   1   4        1        1        1        1        1        1
3   2   3   1   2   1   4        0        1        0        1        1        1
4   1   3   2   1   1   3        1        1        1        0        1        0

Or a potential tidyverse solution could be:

library(dplyr)
library(tidyr)
library(purrr)

# Compare each row with correct_answer, then spread the named result into columns
df %>%
  mutate(output = pmap(df, ~ setNames(+(c(...) == correct_answer),
                                      paste0(names(df), "_corr")))) %>%
  unnest_wider(output)

  qa1 qa2 qa3 qb1 qb2 qb3 qa1_corr qa2_corr qa3_corr qb1_corr qb2_corr qb3_corr
1   1   3   1   2   4   4        1        1        0        1        0        1
2   1   3   2   2   1   4        1        1        1        1        1        1
3   2   3   1   2   1   4        0        1        0        1        1        1
4   1   3   2   1   1   3        1        1        1        0        1        0

【Discussion】:

Thank you very much! How would I adapt this when df contains other column variables besides qa/qb?
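One possible adaptation, sketched here in base R, is to select the question columns first and compare only those with correct_answer. The id column, the regular expression, and the names q_cols, scored and result are assumptions made for illustration; the pattern presumes that question columns are named like qa1 or qb3.

```r
# Data from the question plus a hypothetical unrelated column `id`
df <- data.frame(
  id  = c("p1", "p2", "p3", "p4"),
  qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2),
  qb1 = c(2, 2, 2, 1), qb2 = c(4, 1, 1, 1), qb3 = c(4, 4, 4, 3)
)
correct_answer <- c(1, 3, 2, 2, 1, 4)

# Pick out only the question columns (assumed naming: qa1, qa2, ..., qb3)
q_cols <- grep("^q[ab][0-9]+$", names(df), value = TRUE)

# Compare each question column with its correct answer; + turns TRUE/FALSE into 1/0
scored <- mapply(function(x, y) +(x == y), df[q_cols], correct_answer)
colnames(scored) <- paste0(q_cols, "_corr")

result <- cbind(df, scored)
```

This relies on correct_answer being in the same order as the columns matched by q_cols.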

【Solution 4】:

Try this:

df_new <- cbind(df, t(apply(df, 1, function(x) as.numeric(x == correct_answer))))

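【Discussion】:

No, that does not work; it just generated blank columns with 0s
@CocoNewton, which R version are you using?

【Solution 5】:

EDIT: sym() can be added.
I found a related solution here, Paste variable name in mutate (dplyr), but it only pasted 0s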

# index <- c(1:3), as defined in the question
for (i in index) {
  df <- df %>%
    mutate(!!paste0("qa", i, "_correct") :=
             case_when(!!sym(paste0("qa", i)) == correct_answer[i] ~ 1,
                       !!sym(paste0("qa", i)) != correct_answer[i] ~ 0))
}
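
A likely reason for the 0s: without sym(), !!paste0("qa", i) injects the string "qa1" rather than the column qa1, so the comparison becomes "qa1" == correct_answer[i], which is FALSE for every row. As a sketch of an alternative spelling of the same loop, the .data pronoun and glue-style names on the left of := avoid !! altogether. The three-column df and the col variable below are assumptions made for illustration.

```r
library(dplyr)

# The A questions from the example data
df <- data.frame(qa1 = c(1, 1, 2, 1), qa2 = c(3, 3, 3, 3), qa3 = c(1, 2, 1, 2))
correct_answer <- c(1, 3, 2)

for (i in seq_along(correct_answer)) {
  col <- paste0("qa", i)  # column name built as a string
  df <- df %>%
    # .data[[col]] looks the column up by name; "{col}_correct" names the new column
    mutate("{col}_correct" := as.numeric(.data[[col]] == correct_answer[i]))
}
```

For more than a handful of columns, the across() approach in Solution 1 does the same in a single mutate() call.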

【Discussion】: