【Question title】: Logical function across multiple columns using "any" function
【Posted】: 2021-07-30 12:22:05
【Question】:

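I want to run logical operations (multiple conditions) across multiple columns. I wrote a query that works fine. However, I want to shorten my code because I have to write several such queries.

I tried to use "any" and brackets to shorten the query. The second query runs, but it gives me a different answer. Does the "any" function work across multiple columns?

Here are my conditions:

  1. Any of the columns (B2 to B5) has 1 & B1
  2. Any of the columns (B2 to B5) has -99 & B1
  3. B1 == 3, then "Noissue"
  4. Everything else is "Issue"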
Participate  B1   B2   B3   B4   B5   Query1   Query2
3            -1   -1   -1   -1   -1   Noissue  Noissue
1            -1    1   -1   -1    1   Noissue  Noissue
1            -1   -1   -1   -1   -1   Issue    Noissue
2            -1    1    1   -1    1   Noissue  Noissue
2             1    1    1    1   -1   Noissue  Noissue
1           -99  -99  -99  -99  -99   Noissue  Noissue

I would appreciate it if someone could help me reduce the number of lines of code by using a different function.

 mutate(Batch_v1, 
               case_when (
                 ((Batch_v1$B1 == 1 |  Batch_v1$B2 == 1 | Batch_v1$B3 == 1 | Batch_v1$B4 == 1 | Batch_v1$B5 == 1| Batch_v1$B6 == 1| Batch_v1$B7 == 1|Batch_v1$B8 == 1|Batch_v1$B9 == 1|Batch_v1$B10 == 1|Batch_v1$BOth == 1) & 
                    Batch_v1$Participate %in% c(1,2,-99))~"Noissue",
                 ((Batch_v1$B1 == -99 |  Batch_v1$B2 == -99 | Batch_v1$B3 == -99|Batch_v1$B4 == -99 |Batch_v1$B5 == -99|Batch_v1$B6 == -99|Batch_v1$B7 == -99|Batch_v1$B8 == 1|Batch_v1$B9 == -99|Batch_v1$B10 == -99|Batch_v1$BOth == -99) & 
                    Batch_v1$Participate %in% c(1,2,-99))~"Noissue",
                 Batch_v1$Participate ==3 ~ "Noissue",
                 TRUE ~ "Issue"))





mutate(Batch_v1, 
   case_when (
     ((any(Batch_v1[,2:6] == 1)) & Batch_v1$Participate %in% c(1,2,-99))~ "Noissue",
     ((any(Batch_v1[,2:6] == -99)) & Batch_v1$Participate %in% c(1,2,-99))~ "Noissue",
     Batch_v1$Participate ==3 ~ "Noissue",
     TRUE ~ "Issue"))

【Comments】:

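  • What are Query1 and Query2 in your example? Why do the first and third rows have different Query1 values although both consist only of -1?
  • any(Batch_v1[,2:6] probably does not do what you think it does. It operates on all values in those columns at once, not row by row as you probably want.
  • In the description you say that B1 == 3 gives "Noissue", but in your code the condition is Participate == 3. Please clarify.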

Tags: r dplyr


【Solution 1】:

We can use across() with case_when():

library(dplyr)
df %>% 
    mutate(across(B2:B5, ~case_when(. == 1 & B1 <=2 ~ "Noissue",
                                    . == -99 & B1 <=2 ~ "Noissue",
                                    B1 == 3 ~ "Noissue",
                                    TRUE ~ "issue")
                  )
           )

Output:

  Participate    B1 B2      B3      B4      B5      Query1  Query2 
        <dbl> <dbl> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
1           3    -1 issue   issue   issue   issue   Noissue Noissue
2           1    -1 Noissue issue   issue   Noissue Noissue Noissue
3           1    -1 issue   issue   issue   issue   Issue   Noissue
4           2    -1 Noissue Noissue issue   Noissue Noissue Noissue
5           2     1 Noissue Noissue Noissue issue   Noissue Noissue
6           1   -99 Noissue Noissue Noissue Noissue Noissue Noissue

Data:

df <- structure(list(Participate = c(3, 1, 1, 2, 2, 1), B1 = c(-1, 
-1, -1, -1, 1, -99), B2 = c(-1, 1, -1, 1, 1, -99), B3 = c(-1, 
-1, -1, 1, 1, -99), B4 = c(-1, -1, -1, -1, 1, -99), B5 = c(-1, 
1, -1, 1, -1, -99), Query1 = c("Noissue", "Noissue", "Issue", 
"Noissue", "Noissue", "Noissue"), Query2 = c("Noissue", "Noissue", 
"Noissue", "Noissue", "Noissue", "Noissue")), problems = structure(list(
row = 6L, col = "Query2", expected = "", actual = "embedded null", 
file = "'test'"), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame")), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -6L))

【Comments】:

  • This is the error I get: Error in eval(substitute(expr), envir, enclos): could not find function "across". I am trying to install the development version (tidyverse) following the link [link]stackoverflow.com/questions/61615137/…
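
A "could not find function" error for across() usually points to an older dplyr. As a rough check (a sketch, not a full diagnosis), the installed version can be compared against the releases that introduced these helpers: across() arrived in dplyr 1.0.0 and if_any()/if_all() in dplyr 1.0.4. The variable names below are illustrative only.

```r
# Compare the installed dplyr version with the releases that added the helpers.
installed <- packageVersion("dplyr")

has_across <- installed >= "1.0.0"  # across() was added in dplyr 1.0.0
has_if_any <- installed >= "1.0.4"  # if_any()/if_all() were added in dplyr 1.0.4

if (!has_across) {
  message("dplyr ", as.character(installed),
          " predates across(); consider install.packages(\"dplyr\")")
}
```

Updating from CRAN is normally enough; a development version should not be needed for these functions.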
【Solution 2】:

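When logical conditions have to be applied row-wise across multiple columns, there are two main approaches that should generally be considered. They remove the need for rowwise(), for Reduce() combined with lapply/map %>% Reduce/reduce, or for complicated case_when() statements.

1) rowSums(condition)
2) if_any() / if_all()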

This question is best suited to the if_any() solution.

if_any()

Batch_v1 %>% mutate(query3 = ifelse(if_any(B2:B5, ~.x %in% c(-99, 1)) & B1<=2,
                                    "Noissue", "Issue"))

rowSums()

Batch_v1 %>% mutate(query3 = ifelse(rowSums(across(B2:B5, ~.x %in% c(-99, 1)))>0 & B1<=2,
                                    "Noissue", "Issue"))

Output

# A tibble: 6 x 9
  Participate    B1    B2    B3    B4    B5 Query1  Query2  query3 
        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>   <chr>   <chr>  
1           3    -1    -1    -1    -1    -1 Noissue Noissue Issue  
2           1    -1     1    -1    -1     1 Noissue Noissue Noissue
3           1    -1    -1    -1    -1    -1 Issue   Noissue Issue  
4           2    -1     1     1    -1     1 Noissue Noissue Noissue
5           2     1     1     1     1    -1 Noissue Noissue Noissue
6           1   -99   -99   -99   -99   -99 Noissue Noissue Noissue
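
The question's own code also treats Participate == 3 as "Noissue". A minimal sketch of how if_any() could be combined with case_when() to cover that branch as well is shown here. It assumes the sample rows from the question, that Participate (not B1) is the column holding the 1/2/3 codes, and that the checked columns are B1:B5 as in the question's second query; the column name query4 is made up for illustration.

```r
library(dplyr)

# Sample rows copied from the table in the question.
Batch_v1 <- tibble(
  Participate = c(3, 1, 1, 2, 2, 1),
  B1 = c(-1, -1, -1, -1, 1, -99),
  B2 = c(-1, 1, -1, 1, 1, -99),
  B3 = c(-1, -1, -1, 1, 1, -99),
  B4 = c(-1, -1, -1, -1, 1, -99),
  B5 = c(-1, 1, -1, 1, -1, -99)
)

result <- Batch_v1 %>%
  mutate(
    query4 = case_when(
      # TRUE for a row if at least one of B1..B5 is 1 or -99
      if_any(B1:B5, ~ .x %in% c(1, -99)) &
        Participate %in% c(1, 2, -99) ~ "Noissue",
      Participate == 3 ~ "Noissue",
      TRUE ~ "Issue"
    )
  )

print(result$query4)
```

On these six rows the result agrees with the Query1 column of the question.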

Here are some good answers to similar questions:
Rowwise logical operations with mutate() and filter() in R, and here:
R - Remove rows from dataframe that contain only zeros in numeric columns, base R and pipe-friendly methods? Disclaimer: I asked or answered these questions.

【Comments】:

    【Solution 3】:

    You can use

    library(dplyr)
    
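    Batch_v1 %>% 
      rowwise() %>%
      mutate(
        # c_across() collects the values of B1..B5 for the current row;
        # a bare B1:B5 here would build the numeric sequence from B1 to B5.
        Query3 = case_when(
          any(c_across(B1:B5) == 1)   & Participate %in% c(1,2,-99) ~ "Noissue",
          any(c_across(B1:B5) == -99) & Participate %in% c(1,2,-99) ~ "Noissue",
          Participate == 3                                          ~ "Noissue",
          TRUE                                                      ~ "Issue"
          )
        )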
    

    This returns

    # A tibble: 6 x 9
    # Rowwise: 
      Participate    B1    B2    B3    B4    B5 Query1  Query2  Query3 
            <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>   <chr>   <chr>  
    1           3    -1    -1    -1    -1    -1 Noissue Noissue Noissue
    2           1    -1     1    -1    -1     1 Noissue Noissue Noissue
    3           1    -1    -1    -1    -1    -1 Issue   Noissue Issue  
    4           2    -1     1     1    -1     1 Noissue Noissue Noissue
    5           2     1     1     1     1    -1 Noissue Noissue Noissue
    6           1   -99   -99   -99   -99   -99 Noissue Noissue Noissue
    

    The main problem with your second code is the function

    any(Batch_v1[,2:6] == 1)
    

    Let's take a look at

    Batch_v1[,2:6] == 1
    
    #>         B1    B2    B3    B4    B5
    #> [1,] FALSE FALSE FALSE FALSE FALSE
    #> [2,] FALSE  TRUE FALSE FALSE  TRUE
    #> [3,] FALSE FALSE FALSE FALSE FALSE
    #> [4,] FALSE  TRUE  TRUE FALSE  TRUE
    #> [5,]  TRUE  TRUE  TRUE  TRUE FALSE
    #> [6,] FALSE FALSE FALSE FALSE FALSE
    

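    So Batch_v1[,2:6] == 1 returns a logical matrix with one cell per value. Applying any to it returns a single TRUE if any value anywhere in that matrix is TRUE. case_when then recycles that single value to every row, which is clearly not the behaviour you want. Using rowwise() forces any to be applied... well... per row.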
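
The difference between the whole-table and the per-row result can be reproduced in base R. This is a small sketch on a stand-in data frame holding the B1 to B5 values of the first three rows of the question; the object names (B, hits and so on) are illustrative only.

```r
# Stand-in for columns B1..B5, first three rows of the question's table.
B <- data.frame(
  B1 = c(-1, -1, -1),
  B2 = c(-1,  1, -1),
  B3 = c(-1, -1, -1),
  B4 = c(-1, -1, -1),
  B5 = c(-1,  1, -1)
)

hits <- B == 1                         # logical matrix, one cell per value

whole_table   <- any(hits)             # one value for the entire matrix: TRUE
per_row       <- rowSums(hits) > 0     # one value per row: FALSE TRUE FALSE
per_row_apply <- apply(hits, 1, any)   # same per-row result via apply()

print(whole_table)
print(unname(per_row))
```

rowSums() counts the TRUE cells in each row, so comparing with 0 gives a vectorised row-wise "any" without rowwise().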

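    Note: in a tidyverse pipe you should not use Batch_v1$B1 when you mean the current state of the object being processed. Batch_v1$B1 refers to the original Batch_v1, without any of the transformations applied earlier in the pipe. In this case it makes no real difference, but in general you should not rely on it.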
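
A small sketch of where the two spellings diverge, using a made-up tibble df with columns id and B1 (not the question's data): once an earlier step changes the number of rows, df$B1 no longer lines up with the data inside the pipe.

```r
library(dplyr)

# Hypothetical data for illustration.
df <- tibble(id = 1:4, B1 = c(-1, 1, -99, 1))

# Bare column name: refers to the data as it is at this step of the pipe.
masked <- df %>%
  filter(B1 != -99) %>%
  mutate(flag = B1 == 1)

# df$B1 still has all 4 values while the filtered data has 3 rows,
# so mutate() cannot recycle it and signals an error.
dollar <- tryCatch(
  df %>% filter(B1 != -99) %>% mutate(flag = df$B1 == 1),
  error = function(e) "error"
)

print(masked$flag)
print(dollar)
```

When the row count happens to stay the same, the df$ form fails silently instead: it reads the untransformed values without any error, which is harder to notice.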

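    【Comments】:

    • Thanks for your code. However, there is a problem on my side. The code gives an error: object "B2" not found. I also cannot run the other code given by @TarJae: Error in eval(substitute(expr), envir, enclos): could not find function "across".
    • Updating your tidyverse package should solve this. Are your columns named B1 to B5?
    • Yes, the columns have the same names. I had a problem with Rtools. I have reinstalled everything and the code works fine. Thanks.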