Elegant way to change a variable according to several conditions? (Answers)

【Question title】: Elegant way to change a variable according to several conditions?
【Posted】: 2021-05-04 01:55:49
【Question】:

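I am trying to change the variable "exposure" according to several conditions.

For example: if stimulus_content is "neg", condition is "neg", and set is "A", then "exposure" should be set to "long" for rows where stimulus_no is X1, X2, ... or X5. For rows where stimulus_no is X6, X7, ... or X10, "exposure" should be set to "short". And so on...

I hope the code below makes the problem clearer.

First, here is an approximation of the dataset: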

n <- 6
dataset <- data.frame(
participant = rep(1:n, each=40),
condition = rep(c("pos","neg"), each=40),
set = rep(c("A","B"), each=40),
stimulus_content = rep(c("pos","neg"), each=2),
stimulus_no = rep(c("X1","X10","X11","X12","X13","X14","X15","X16","X17","X18","X19","X2","X20","X3","X4","X5","X6","X7","X8","X9"), each=2),
exposure = NA)

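The first thing we tried was a loop. For simplicity, only part of the loop is included. It does not return an error, but it does not do anything either.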

for (i in 1:length(longdat[,1])){
  if(longdat[i,"stimulus_content"] == "pos") { 
    if(longdat[i,"condition"] == "pos") {
      if(longdat[i,"set"] == "A") {     
        for(stimulus_no in c("X1","X2","X3","X4","X5")){longdat[i,"exposure"] == "long"}
        for(stimulus_no in c("X6","X7","X8","X9","X10")){longdat[i,"exposure"] == "short"}
        for(stimulus_no in c("X11","X12","X13","X14","X15","X16","X17","X18","X19","X20")){longdat[i,"exposure"] == "none"}
      } else { #for condition = pos and set != A            
        for(stimulus_no in c("X11","X12","X13","X14","X15")){longdat[i,"exposure"] == "long"}
        for(stimulus_no in c("X16","X17","X18","X19","X20")){longdat[i,"exposure"] == "short"}
        for(stimulus_no in c("X1","X2","X3","X4","X5","X6","X7","X8","X9","X10")){longdat[i,"exposure"] == "none"}
      }
    }
  }
}

Next, we tried mutate and case_when. This code does exactly what it is supposed to do, but it is almost 100 lines long! Please find an excerpt below.

longdat2 <- longdat %>%
  mutate(exposure = case_when(
    # Condition pos, set A
    stimulus_no=="X1" & stimulus_content=="pos" & condition=="pos" & set=="A" ~ "long",
    stimulus_no=="X2" & stimulus_content=="pos" & condition=="pos" & set=="A" ~ "long",
    # ...
    stimulus_no=="X9" & stimulus_content=="pos" & condition=="pos" & set=="A" ~ "short",
    stimulus_no=="X10" & stimulus_content=="pos" & condition=="pos" & set=="A" ~ "short",
    stimulus_no=="X11" & stimulus_content=="pos" & condition=="pos" & set=="A" ~ "none",
    # ... accordingly for condition pos and set B, and for condition neg and set A
    # and eventually for condition neg and set B
    stimulus_no=="X18" & stimulus_content=="neg" & condition=="neg" & set=="B" ~ "short",
    stimulus_no=="X19" & stimulus_content=="neg" & condition=="neg" & set=="B" ~ "short",
    stimulus_no=="X20" & stimulus_content=="neg" & condition=="neg" & set=="B" ~ "short",
  )
)

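If anyone manages to spot the error in the loop, or can show me a more concise version of the second (or first) option, I would be very grateful!

Thanks a lot in advance!

【Comments】:

What language is this? Adding it as a tag would be really helpful, more so than loops and for-loop or if-statement and conditional-statements.
for(stimulus_no in c("X1","X2","X3","X4","X5")){longdat[i,"exposure"] == "long"} This looks wrong. Shouldn't it be longdat[i, "exposure"] = "long" or longdat[i, "exposure"] <- "long" (assignment, not comparison)?
Also, the loop body does not use the loop variable (stimulus_no).
I'm not sure what the loop is supposed to do. If it is meant to check a value against a set of values, you need a contains function or an in operator in the condition, not a loop that executes its body several times.
There are two levels of loops in the first solution. As @knittl said, you should replace the second level with a condition that uses the in operator. You should also replace stimulus_no with longdat[i,"stimulus_no"]: if(longdat[i,"stimulus_no"] %in% c("X1","X2","X3","X4","X5")){longdat[i,"exposure"] <- "long"}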
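
To illustrate the points made in the comments above, here is a minimal sketch on a made-up three-row data frame (the name `toy` and its values are assumptions for illustration only, not the asker's data). It shows that `==` merely compares and leaves the data untouched, while `<-` guarded by `%in%` actually fills in the column.

```r
# Hypothetical toy data, not the asker's real dataset
toy <- data.frame(
  stimulus_no = c("X1", "X7", "X12"),
  exposure = NA_character_,
  stringsAsFactors = FALSE
)

# Comparison only: the result is discarded and nothing is stored
toy[1, "exposure"] == "long"
all(is.na(toy$exposure))  # exposure is still entirely NA

# Assignment guarded by %in%: each row's own stimulus_no is tested
for (i in seq_len(nrow(toy))) {
  if (toy[i, "stimulus_no"] %in% c("X1", "X2", "X3", "X4", "X5")) {
    toy[i, "exposure"] <- "long"
  }
  if (toy[i, "stimulus_no"] %in% c("X6", "X7", "X8", "X9", "X10")) {
    toy[i, "exposure"] <- "short"
  }
}
toy$exposure  # "long" "short" NA
```

The third row stays NA because "X12" is in neither set; the full rule set would need a branch for it as well.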

Tags: r loops for-loop if-statement case-when

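【Solution 1】:

1) grep Create a code column by pasting together the columns to be matched, then match it against regular expressions to get a compact expression. No packages are used. Note that [^A] matches any single character that is not A. If you only have A and B, you could use B instead. X1[1-5] matches X11, ..., X15. X[6-9]|X10 matches X6, ..., X10. $ matches the end of the string. If you want to keep the code column, omit the code <- NULL line.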

dataset2 <- within(dataset, {
  code <- paste(stimulus_content, condition, set, stimulus_no)
  exposure[grep("pos pos A X[1-5]$", code)] <- "long"
  exposure[grep("pos pos A (X[6-9]|X10)$", code)] <- "short"
  exposure[grep("pos pos A (X1[1-9]|X20)$", code)] <- "none"
  exposure[grep("pos pos [^A] X1[1-5]$", code)] <- "long"
  exposure[grep("pos pos [^A] (X1[6-9]|X20)$", code)] <- "short"
  exposure[grep("pos pos [^A] (X[1-9]|X10)$", code)] <- "none"
  code <- NULL
})
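
As a quick sanity check of the patterns described above, the following sketch runs three of them with `grepl` on a handful of hand-written codes. The vector `code` and the result names are made up for illustration; the patterns are copied from the answer.

```r
# Hypothetical sample of pasted codes, for illustration only
code <- c("pos pos A X1", "pos pos A X10", "pos pos A X15",
          "pos pos B X15", "pos pos B X2", "neg pos A X1")

# The trailing $ stops X[1-5] from also matching X10 or X15
long_A  <- grepl("pos pos A X[1-5]$", code)
short_A <- grepl("pos pos A (X[6-9]|X10)$", code)

# [^A] accepts any set that is not A (here: B)
long_notA <- grepl("pos pos [^A] X1[1-5]$", code)

code[long_A]     # "pos pos A X1"
code[short_A]    # "pos pos A X10"
code[long_notA]  # "pos pos B X15"
```

Without the `$` anchor, `X[1-5]` would also match the leading `X1` of `X10` to `X15`, which is why the end-of-string anchor matters here.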

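2) Between Another approach, also using only base R, is to define a Between function that checks the non-numeric and numeric parts of its first argument separately. It requires the numeric part to lie within the specified range and the non-numeric part to equal the fourth argument (which defaults to "X", so we can omit it in calls for brevity). Then use within as shown: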

Between <- function(x, lo, hi, alpha = "X") {
  nonno <- gsub("\\d", "", x)            # non-numeric part, e.g. "X"
  no <- as.numeric(gsub("\\D", "", x))   # numeric part, e.g. 12
  no >= lo & no <= hi & nonno == alpha
}

dataset3 <- within(dataset, {

  cond1 <- stimulus_content == "pos" & condition == "pos" & set == "A"
  exposure[cond1 & Between(stimulus_no, 1, 5)] <- "long"
  exposure[cond1 & Between(stimulus_no, 6, 10)] <- "short"
  exposure[cond1 & Between(stimulus_no, 11, 20)] <- "none"

  cond2 <- stimulus_content == "pos" & condition == "pos" & set != "A"
  exposure[cond2 & Between(stimulus_no, 11, 15)] <- "long"
  exposure[cond2 & Between(stimulus_no, 16, 20)] <- "short"
  exposure[cond2 & Between(stimulus_no, 1, 10)] <- "none"

  cond1 <- cond2 <- NULL
})
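
To see what Between returns on its own, here is a small usage sketch. The function is repeated so the block runs by itself; the vector `ids` (including the deliberately mismatching "Y3") is made up for illustration.

```r
# Same helper as in the answer, repeated so this block is self-contained
Between <- function(x, lo, hi, alpha = "X") {
  nonno <- gsub("\\d", "", x)            # non-numeric part
  no <- as.numeric(gsub("\\D", "", x))   # numeric part
  no >= lo & no <= hi & nonno == alpha
}

# Hypothetical identifiers; "Y3" has the wrong prefix on purpose
ids <- c("X1", "X5", "X10", "X11", "Y3")

in_1_5  <- Between(ids, 1, 5)    # TRUE TRUE FALSE FALSE FALSE
in_6_10 <- Between(ids, 6, 10)   # FALSE FALSE TRUE FALSE FALSE
```

Comparing the numeric part as a number sidesteps the string ordering visible in the question's dataset, where "X10" sorts before "X2".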

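【Comments】:

Thank you for the edit, and thank you for taking the time to answer my question! I did not know about the between function!

【Solution 2】:

You can simplify your second solution by using the %in% operator, and the negated condition for the else part: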

library(dplyr)  # provides %>%, mutate and case_when

dataset2 <- dataset %>%
  mutate(exposure = case_when(
    # Condition pos, set A
    (stimulus_content=="pos" & condition=="pos" & set=="A") & stimulus_no %in% c("X1","X2","X3","X4","X5") ~ "long",
    (stimulus_content=="pos" & condition=="pos" & set=="A") & stimulus_no %in% c("X6","X7","X8","X9","X10") ~ "short",
    (stimulus_content=="pos" & condition=="pos" & set=="A") & stimulus_no %in% c("X11","X12","X13","X14","X15","X16","X17","X18","X19","X20") ~ "none",
    # else
    !(stimulus_content=="pos" & condition=="pos" & set=="A") & stimulus_no %in% c("X11","X12","X13","X14","X15") ~ "long",
    !(stimulus_content=="pos" & condition=="pos" & set=="A") & stimulus_no %in% c("X16","X17","X18","X19","X20") ~ "short",
    !(stimulus_content=="pos" & condition=="pos" & set=="A") & stimulus_no %in% c("X1","X2","X3","X4","X5","X6","X7","X8","X9","X10") ~ "none"
  )
)

Edit

For a solution with a loop:

dataset3 <- dataset
for (i in seq_len(nrow(dataset3))) {
  if (dataset3[i,"stimulus_content"] == "pos" && dataset3[i,"condition"] == "pos" && dataset3[i,"set"] == "A") {
    if(dataset3[i,"stimulus_no"] %in% c("X1","X2","X3","X4","X5")) {dataset3[i,"exposure"] <- "long"}
    if(dataset3[i,"stimulus_no"] %in% c("X6","X7","X8","X9","X10")) {dataset3[i,"exposure"] <- "short"}
    if(dataset3[i,"stimulus_no"] %in% c("X11","X12","X13","X14","X15","X16","X17","X18","X19","X20")){dataset3[i,"exposure"] <- "none"}
  } else {       
    if(dataset3[i,"stimulus_no"] %in% c("X11","X12","X13","X14","X15")) {dataset3[i,"exposure"] <- "long"}
    if(dataset3[i,"stimulus_no"] %in% c("X16","X17","X18","X19","X20")) {dataset3[i,"exposure"] <- "short"}
    if(dataset3[i,"stimulus_no"] %in% c("X1","X2","X3","X4","X5","X6","X7","X8","X9","X10")) {dataset3[i,"exposure"] <- "none"}
  }
}

compareDF::compare_df(dataset3, dataset2, rownames)
#> Error in stop_or_warn("The two data frames are the same!", stop_on_error): The two data frames are the same!

And to avoid the loop, as @g-grothendieck does, but staying closer to your code:

dataset4 <- within(dataset, {
  # Condition pos, set A
  exposure[(stimulus_content == "pos" & condition == "pos" & set == "A") & stimulus_no %in% c("X1","X2","X3","X4","X5")] <- "long"
  exposure[(stimulus_content == "pos" & condition == "pos" & set == "A") & stimulus_no %in% c("X6","X7","X8","X9","X10")] <- "short"
  exposure[(stimulus_content == "pos" & condition == "pos" & set == "A") & stimulus_no %in% c("X11","X12","X13","X14","X15","X16","X17","X18","X19","X20")] <- "none"
  
  # else     
  exposure[!(stimulus_content == "pos" & condition == "pos" & set == "A") & stimulus_no %in% c("X11","X12","X13","X14","X15")] <- "long"
  exposure[!(stimulus_content == "pos" & condition == "pos" & set == "A") & stimulus_no %in% c("X16","X17","X18","X19","X20")] <- "short"
  exposure[!(stimulus_content == "pos" & condition == "pos" & set == "A") & stimulus_no %in% c("X1","X2","X3","X4","X5","X6","X7","X8","X9","X10")] <- "none"
})

compareDF::compare_df(dataset4, dataset2, rownames)
#> Error in stop_or_warn("The two data frames are the same!", stop_on_error): The two data frames are the same!

Or

dataset5 <- within(dataset, {
  # Condition pos, set A
  exposure <- ifelse((stimulus_content == "pos" & condition == "pos" & set == "A") & stimulus_no %in% c("X1","X2","X3","X4","X5"), "long", exposure)
  exposure <- ifelse((stimulus_content == "pos" & condition == "pos" & set == "A") & stimulus_no %in% c("X6","X7","X8","X9","X10"), "short", exposure)
  exposure <- ifelse((stimulus_content == "pos" & condition == "pos" & set == "A") & stimulus_no %in% c("X11","X12","X13","X14","X15","X16","X17","X18","X19","X20"), "none", exposure)
  
  # else     
  exposure <- ifelse(!(stimulus_content == "pos" & condition == "pos" & set == "A") & stimulus_no %in% c("X11","X12","X13","X14","X15"), "long", exposure)
  exposure <- ifelse(!(stimulus_content == "pos" & condition == "pos" & set == "A") & stimulus_no %in% c("X16","X17","X18","X19","X20"), "short", exposure)
  exposure <- ifelse(!(stimulus_content == "pos" & condition == "pos" & set == "A") & stimulus_no %in% c("X1","X2","X3","X4","X5","X6","X7","X8","X9","X10"), "none", exposure)
})

compareDF::compare_df(dataset5, dataset2, rownames)
#> Error in stop_or_warn("The two data frames are the same!", stop_on_error): The two data frames are the same!
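
As the output above suggests, compare_df signals identical data frames through an error. A base R alternative is sketched below on a made-up four-row frame with a reduced rule set (the names `mini`, `is_pos_A`, `by_index` and `by_ifelse` are assumptions for illustration). It also factors the repeated three-part condition into one logical vector.

```r
# Hypothetical miniature dataset, for illustration only
mini <- data.frame(
  stimulus_content = c("pos", "pos", "neg", "neg"),
  condition        = c("pos", "pos", "neg", "neg"),
  set              = c("A", "A", "B", "B"),
  stimulus_no      = c("X1", "X7", "X12", "X3"),
  exposure         = NA_character_,
  stringsAsFactors = FALSE
)

# Compute the shared condition once instead of repeating it on every line
is_pos_A <- with(mini, stimulus_content == "pos" & condition == "pos" & set == "A")

# Variant 1: subset assignment
by_index <- within(mini, {
  exposure[is_pos_A & stimulus_no %in% paste0("X", 1:5)]  <- "long"
  exposure[is_pos_A & stimulus_no %in% paste0("X", 6:10)] <- "short"
})

# Variant 2: ifelse, keeping the old value where the test is FALSE
by_ifelse <- within(mini, {
  exposure <- ifelse(is_pos_A & stimulus_no %in% paste0("X", 1:5), "long", exposure)
  exposure <- ifelse(is_pos_A & stimulus_no %in% paste0("X", 6:10), "short", exposure)
})

# Base R checks that return a value instead of raising an error
same_column <- identical(by_index$exposure, by_ifelse$exposure)
same_frame  <- isTRUE(all.equal(by_index, by_ifelse))
```

Both checks yield a plain logical value, which is easier to use in a script than catching an error.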

Regards,

【Comments】:

Thank you for the edit! I learned a lot thanks to you! Thank you for taking the time!