How do I create a new column with values that depend on the values in other columns? (Answers)

【Question title】: How do I create a new column with values that depend on the values in other columns?
【Posted】: 2020-01-24 01:10:14
【Question description】:

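This is a very simple toy data set that illustrates a problem I am currently having with another data set.

Suppose we tested 4 participants on a math test, and each of them answered 4 questions. Two of the questions are easy and two are difficult. The questions were presented in random order, however, so some people started with an easy question and others with a difficult one. The experiment has a binary response variable: each answer is classified as "correct" or "incorrect".

Here is the fake data: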

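my_matrix <- matrix(
  c(rep(1:4, each = 4),                          # Participant
    rep(1:4, 4),                                 # ItemNumber
    rep(c("difficult", "easy"), times = 4),      # QuestionDifficulty, participants 1-2
    rep(c("easy", "difficult"), times = 4),      # QuestionDifficulty, participants 3-4
    rep(c("correct", "incorrect"), times = 8)),  # Answer
  nrow = 16, ncol = 4, byrow = FALSE)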

my_matrix

my_data_frame <- as.data.frame(my_matrix)

colnames(my_data_frame) <- c("Participant", "ItemNumber", "QuestionDifficulty", "Answer")

my_data_frame$Participant <- as.numeric(my_data_frame$Participant)

my_data_frame

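Now I would like to create a new column whose value is "DifficultFirst" for people who started with a difficult question and "EasyFirst" for people who started with an easy question. I tried the following code.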

for (i in 1:16) {
  ifelse(my_data_frame$Participant == i & my_data_frame$ItemNumber == 1 & my_data_frame$QuestionDifficulty =="difficult",
         my_data_frame$FirstQuestion[((i*4)-3):(i*4)] <- "DifficultFirst",
         my_data_frame$FirstQuestion[((i*4)-3):(i*4)] <- "EasyFirst")}

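But it did not work. Specifically, I get an error message about the replacement, saying that the number of rows does not match the data, and I don't know why.

It's late and my brain may be too tired, so apologies if this is a silly question. Any help would be greatly appreciated. Thanks!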

【Question discussion】:

Tags: r dataframe if-statement

【Solution 1】:

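You don't need a loop; a grouped operation will do. The code below sorts the data by Participant and ItemNumber, groups it by Participant, and takes the first value of QuestionDifficulty within each group.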

library(dplyr)

my_data_frame %>%
  arrange(Participant, ItemNumber) %>%
  group_by(Participant) %>%
  mutate(FirstQuestion = paste0(first(QuestionDifficulty), "first"))

#   Participant ItemNumber QuestionDifficulty Answer    FirstQuestion 
#         <dbl> <fct>      <fct>              <fct>     <chr>         
# 1           1 1          difficult          correct   difficultfirst
# 2           1 2          easy               incorrect difficultfirst
# 3           1 3          difficult          correct   difficultfirst
# 4           1 4          easy               incorrect difficultfirst
# 5           2 1          difficult          correct   difficultfirst
# 6           2 2          easy               incorrect difficultfirst
# 7           2 3          difficult          correct   difficultfirst
# 8           2 4          easy               incorrect difficultfirst
# 9           3 1          easy               correct   easyfirst     
#10           3 2          difficult          incorrect easyfirst     
#11           3 3          easy               correct   easyfirst     
#12           3 4          difficult          incorrect easyfirst     
#13           4 1          easy               correct   easyfirst     
#14           4 2          difficult          incorrect easyfirst     
#15           4 3          easy               correct   easyfirst     
#16           4 4          difficult          incorrect easyfirst
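
The `paste0()` call above yields lowercase labels ("difficultfirst", "easyfirst"), while the question asked for "DifficultFirst" and "EasyFirst". A possible variant, sketched below, writes the labels out explicitly and picks the row where ItemNumber is 1 instead of relying on the sort order. The data frame is rebuilt here with `stringsAsFactors = FALSE` (an assumption, so the columns are plain character vectors), the name `result` is chosen for illustration, and the code assumes every participant has exactly one row with ItemNumber 1.

```r
library(dplyr)

# Toy data equivalent to the question's data frame
# (assumption: built directly as a data frame with character columns).
my_data_frame <- data.frame(
  Participant = rep(1:4, each = 4),
  ItemNumber = rep(1:4, times = 4),
  QuestionDifficulty = c(rep(c("difficult", "easy"), times = 4),
                         rep(c("easy", "difficult"), times = 4)),
  Answer = rep(c("correct", "incorrect"), times = 8),
  stringsAsFactors = FALSE
)

result <- my_data_frame %>%
  group_by(Participant) %>%
  # Look up the difficulty of item 1 within each participant and map it
  # to the label requested in the question; the single value is recycled
  # across all rows of the group.
  mutate(FirstQuestion = if_else(
    QuestionDifficulty[ItemNumber == 1] == "difficult",
    "DifficultFirst",
    "EasyFirst"
  )) %>%
  ungroup()
```

Because the lookup is by ItemNumber rather than by position, the rows do not need to be sorted first.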

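【Discussion】:

Wow, thank you so much! Just out of curiosity, do you think there is a way to solve this with a loop?
Yes, there would be, but it is not ideal in this case, because a for loop would be very inefficient.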
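
For completeness, here is a rough sketch of what a loop-based version might look like. The loop in the question appears to fail for two reasons. It iterates over `1:16` although there are only 4 participants, so from `i = 5` onward it tries to assign to rows 17:20 of a 16-row data frame. In addition, `ifelse()` is vectorized: when its test contains both TRUE and FALSE values, both branch expressions are evaluated, so the "EasyFirst" assignment always runs last. The version below loops over participants and uses a plain `if`/`else` instead. The data construction and the names `p`, `rows`, and `first_difficulty` are illustrative assumptions, and the code assumes each participant has exactly one row with ItemNumber 1.

```r
# Toy data equivalent to the question's data frame
# (assumption: built directly as a data frame with character columns).
my_data_frame <- data.frame(
  Participant = rep(1:4, each = 4),
  ItemNumber = rep(1:4, times = 4),
  QuestionDifficulty = c(rep(c("difficult", "easy"), times = 4),
                         rep(c("easy", "difficult"), times = 4)),
  Answer = rep(c("correct", "incorrect"), times = 8),
  stringsAsFactors = FALSE
)

# Create the full-length column up front so later assignments
# only fill in existing rows.
my_data_frame$FirstQuestion <- NA_character_

# Loop over participants, not over rows.
for (p in unique(my_data_frame$Participant)) {
  rows <- my_data_frame$Participant == p
  # Difficulty of this participant's first item (a single value).
  first_difficulty <-
    my_data_frame$QuestionDifficulty[rows & my_data_frame$ItemNumber == 1]
  my_data_frame$FirstQuestion[rows] <-
    if (first_difficulty == "difficult") "DifficultFirst" else "EasyFirst"
}
```

On 16 rows the speed difference from the grouped version is unlikely to matter; the grouped approach mainly pays off in brevity and on larger data.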