Answers: “the condition has length > 1 and only the first element will be used” warning from nested `if elses' over a dataframe

【Question title】: “the condition has length > 1 and only the first element will be used” warning from nested `if elses' over a dataframe
【Posted】: 2021-01-06 16:11:14
【Question description】:

I have a data frame with two columns, Q10_headache_tibble:

structure(list(df_questionaire.headaches = c(0L, 2L, 2L, 2L, 
0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 0L, 2L, 0L, 2L, 2L, 2L, 2L, 2L, 
2L, 0L, 2L, 0L, 2L, 0L, 2L, NA, 2L, 2L, 0L, 2L, 0L, 2L, 2L, 0L, 
0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 
0L, 2L, 0L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 0L, 
0L, 0L, 2L, 0L, 2L, 0L, 2L, 0L, 0L, 2L, 2L, 0L, 0L, 2L, 2L, 2L, 
0L, 0L, 0L, 0L, 2L, 0L, 2L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 
2L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 0L, 2L, 
0L, 2L, 2L, 0L, 0L, 2L, 0L, 2L, 2L, 0L, 2L, 2L, 2L, 2L, 0L, 0L, 
0L, 0L, 2L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 2L, 
2L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 0L, 2L, 0L, 0L, 
0L, 0L, 2L, 2L, 2L, 2L, 2L, 0L, 2L, 0L, 0L), df_questionaire.headaches_covid = c(0L, 
0L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 0L, 0L, 0L, 0L, 2L, 
2L, 2L, 2L, 2L, 0L, 2L, 0L, 2L, 2L, 0L, NA, 2L, 2L, 0L, 0L, 0L, 
2L, 2L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 0L, 
0L, 2L, 0L, 0L, 2L, 0L, 2L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 2L, 
0L, 0L, 774L, 0L, 0L, 0L, 2L, 2L, 774L, 0L, 0L, 0L, 2L, 0L, 2L, 
0L, 2L, 0L, 2L, 0L, 0L, 2L, 0L, 2L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 
0L, 2L, 2L, 0L, 2L, 0L, 2L, 2L, 0L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 
2L, 2L, 0L, 2L, 0L, 2L, 0L, 0L, 2L, 2L, 0L, 2L, 0L, 0L, 0L, 2L, 
2L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 
2L, 0L, 0L, 2L, 2L, 0L, 774L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 
2L, 0L, 2L, 774L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 774L, 0L, 0L, 
774L)), row.names = c(NA, -175L), class = c("tbl_df", "tbl", 
"data.frame"))

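I created a function that should return a character vector (Q10_incidence) of the same length as nrow(df_headache_tibble), based on nested conditions applied to the data frame row by row. Q10_incidence[i] should be the result of applying the function to df_headache_tibble[i,1] and df_headache_tibble[i,2], for which I intend to use mapply.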

incidence_headaches<-function(x,y){
        if (is.na(x)|is.na(y)){
                        output<-NA
                }
        else if (x==2){
                if (y==2){
                        output<-'previous_headache_maintained'
                }else if(y==0){
                        output<-'previous_headache_ceased'
                }
        }else if(x %in% c(0,774,775,776)){
                if (y==2){
                        output<-'new_onset_headache'
                }else if (y %in% c(0, 774, 775, 776)){
                        output<-'no_headache'
                }
        }
}

Q10_incidence<-mapply(incidence_headaches, Q10_headache_tibble[,1], Q10_headache_tibble[,2])

When I call

mapply(incidence_headaches, Q10_headache_tibble[,1], Q10_headache_tibble[,2])

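I get, among several warnings, the dreaded "the condition has length > 1 and only the first element will be used". How can I deal with this? Although I have found several questions about the same "the condition has length (...)" warning, I still find the topic confusing. A "for dummies" walkthrough would be welcome.

It seems to be related to vectorization and could be solved by replacing the function with a nested ifelse() structure, which could get messy.

I will probably need similar functions on many occasions, and I am not sure what the best approach is.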

【Comments】:

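If either input x or y of the function is a vector of length > 1, then is.na(x) or is.na(y) is a vector of the same length, hence the warning. The solution is to use ifelse. Can you take it from here?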
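The cause described in this comment can be reproduced with a minimal sketch. The vector `x` and the labels are made up for illustration. Note that from R 4.2.0 onwards a condition of length > 1 in `if` is an error rather than a warning, so the sketch captures either one.

```r
x <- c(0L, 2L, NA)

# `if` expects a single TRUE/FALSE. A longer condition triggers the
# warning (an error from R 4.2.0 onwards), so capture either one.
msg <- tryCatch(
  {
    if (is.na(x)) "missing" else "present"
  },
  warning = function(w) conditionMessage(w),
  error   = function(e) conditionMessage(e)
)
print(msg)

# ifelse() evaluates the condition element by element instead.
labels <- ifelse(is.na(x), "missing", "present")
print(labels)
```

Here `is.na(x)` has length 3, so `if` cannot decide on a single branch, whereas `ifelse()` returns one label per element.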

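Tags: r if-statement mapply

【Solution 1】:

1) Personally, I try to get by with as small a subset of R's many commands as possible. Perhaps a plain apply is an easier way to manage this. apply with MARGIN = 1 feeds each row of your data.frame to a function. So I made a small change to your function (only the first 3 lines are of interest here; the rest is copied and pasted):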

incidence_headaches<-function(row){
  x <- row[1]
  y <- row[2]
  if (is.na(x)|is.na(y)){
    output<-NA
  }
  else if (x==2){
    if (y==2){
      output<-'previous_headache_maintained'
    }else if(y==0){
      output<-'previous_headache_ceased'
    }
  }else if(x %in% c(0,774,775,776)){
    if (y==2){
      output<-'new_onset_headache'
    }else if (y %in% c(0, 774, 775, 776)){
      output<-'no_headache'
    }
  }
}

You can then use a plain apply like this:

apply(df_headache_tibble, MARGIN = 1, incidence_headaches)

to get something like this:

> apply(df_headache_tibble, MARGIN = 1, incidence_headaches)
  [1] "no_headache"                  "previous_headache_ceased"     "previous_headache_maintained"
  [4] "previous_headache_maintained" "new_onset_headache"           "no_headache"                 
  [7] "no_headache"                  "no_headache"                  "previous_headache_ceased"    
 [10] "new_onset_headache"           "previous_headache_ceased"     "previous_headache_maintained"
 [13] "no_headache"                  "previous_headache_ceased"     "no_headache" 
...

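2) mapply is obviously a perfectly good function, and there is no reason not to use it. Your problem is that tibbles are data.frames, but they do not behave like data.frames. This works fine: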

mapply(incidence_headaches, 
       as.data.frame(df_headache_tibble)[,1],
       as.data.frame(df_headache_tibble)[,2])

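When you subset a single column from a data.frame with [, j], you get a vector; when you subset a single column from a tibble the same way, you get a tibble. Hadley has a different view of how things should work than the people who invented R's data.frame. There are ways around this: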

mapply(incidence_headaches, 
       df_headache_tibble[,1, drop = TRUE],
       df_headache_tibble[,2, drop = TRUE])

Read the details here, but always keep in mind that although tibbles are data.frames, they do not behave like data.frames: https://tibble.tidyverse.org/reference/subsetting.html
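The difference can be seen in a small sketch, assuming the tibble package is installed. The toy data and variable names below are made up for illustration. It also hints at why mapply misbehaves: a one-column tibble is a list of length 1, so mapply would call the function once with the whole column as its argument.

```r
library(tibble)  # assumed to be installed

df  <- data.frame(a = c(0L, 2L, NA), b = c(2L, 0L, 2L))
tbl <- as_tibble(df)

# Base data.frame: a single column selected with [, j] drops to a vector.
df_col <- df[, 1]
# tibble: the same call keeps a one-column tibble.
tbl_col <- tbl[, 1]

print(is.vector(df_col))
print(is.data.frame(tbl_col))

# mapply iterates over list elements, so the lengths matter:
print(length(tbl_col))   # number of columns, not number of rows
print(length(df_col))    # number of rows

# Three ways to get a plain vector out of a tibble.
v1 <- tbl[[1]]
v2 <- tbl$a
v3 <- tbl[, 1, drop = TRUE]
```

Any of the three extraction forms hands mapply a vector with one element per row, which is what the original function expects.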

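【Discussion】:

Bernhard, although you get the desired output, don't you get the same warning?
I do not get any warnings, neither with my 1), which uses the adapted function with apply, nor with my 2), which uses mapply just as you wanted but with proper data.frames instead of tibbles.
As @Bernhard wrote, this is all about the odd way tibbles handle a single-column subset, and the solution is simple. I am not quite sure why I would want to use tibbles.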

【Solution 2】:

Here is a fully vectorized solution that needs no *apply loop.

incidence_headaches <- function(x, y){
  # create the return vector
  output <- rep('no_headache', NROW(x))
  # conditions for 'x'
  x_2 <- x == 2
  x_vec <- x %in% c(0, 774, 775, 776)
  # conditions for 'y'
  y_2 <- y == 2
  y_vec <- y %in% c(0, 774, 775, 776)
  # assign the return values given a combination
  # of the conditions above. Note that the
  # condition y == 0 is only used once and
  # therefore a logical vector is not needed
  output[is.na(x) | is.na(y)] <- NA_character_
  output[x_2 & y_2] <- 'previous_headache_maintained'
  output[x_2 & y == 0] <- 'previous_headache_ceased'
  output[x_vec & y_2] <- 'new_onset_headache'
  output[x_vec & y_vec] <- 'no_headache'
  # return to caller
  output
}

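# Pull the columns out as plain vectors with [[ ]]. A one-column tibble
# from [, 1] would make `%in%` compare the column as a whole, so the
# 'new_onset_headache' rows would never be assigned.
Q10_incidence <- incidence_headaches(Q10_headache_tibble[[1]], Q10_headache_tibble[[2]])
head(Q10_incidence)
#[1] "no_headache"                  "previous_headache_ceased"    
#[3] "previous_headache_maintained" "previous_headache_maintained"
#[5] "new_onset_headache"           "no_headache"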
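
For comparison, the nested ifelse() structure mentioned in the question could be sketched as follows. The function name `incidence_headaches_ifelse` and the toy vectors are hypothetical and not part of the original answers. As an assumption of this sketch, combinations that match none of the rules (for example x = 2 with y = 774) yield NA.

```r
# Hypothetical name; a sketch of the nested ifelse() variant.
incidence_headaches_ifelse <- function(x, y) {
  none <- c(0, 774, 775, 776)  # codes treated as "no headache"
  ifelse(is.na(x) | is.na(y), NA_character_,
  ifelse(x == 2 & y == 2, "previous_headache_maintained",
  ifelse(x == 2 & y == 0, "previous_headache_ceased",
  ifelse(x %in% none & y == 2, "new_onset_headache",
  ifelse(x %in% none & y %in% none, "no_headache",
         NA_character_)))))  # assumption: unmatched combinations give NA
}

# Toy input: plain vectors, one element per respondent.
x <- c(0L, 2L, 2L, 0L, NA, 2L)
y <- c(0L, 0L, 2L, 2L, 2L, 774L)
res <- incidence_headaches_ifelse(x, y)
print(res)
```

This works on whole vectors just like the indexing approach above, but the nesting grows with every added rule, which is the messiness the question anticipates.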

【Discussion】: