R conditional evaluation when using the pipe operator %>%: answers

【Question title】: R Conditional evaluation when using the pipe operator %>%
【Posted】: 2015-08-16 17:15:54
【Question】:

When using the pipe operator %>% with packages such as dplyr, ggvis, dycharts, etc., how do I perform a step conditionally? For example:

step_1 %>%
step_2 %>%

if(condition)
step_3

These approaches don't seem to work:

step_1 %>%
step_2 
if(condition) %>% step_3

step_1 %>%
step_2 %>%
if(condition) step_3

There is the long way:

if(condition)
{
step_1 %>%
step_2 %>%
step_3
}else{
step_1 %>%
step_2
}

Is there a better way, without all the redundancy?

【Comments】:

An example to work with (like the one Ben provided) would be better, FYI.

Tags: r dplyr ggvis magrittr

【Solution 1】:

To me, it seems easiest to back off from the pipes a little (although I would be interested in seeing other solutions), e.g.:

library("dplyr")
z <- data.frame(a=1:2)
z %>% mutate(b=a^2) -> z2
if (z2$b[1]>1) {
    z2 %>% mutate(b=b^2) -> z2
}
z2 %>% mutate(b=b^2) -> z3

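This is a slight modification of @JohnPaul's answer (you might not really want ifelse, which is vectorized and can evaluate both of its arguments). It would be nice to modify this to return . automatically if the condition is false ... (caution: I think this works, but I haven't really tested it or thought about it too much ...)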

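# Return x when cond is TRUE, otherwise y.
iff <- function(cond,x,y) {
    if(cond) return(x) else return(y)
}

# Relies on z2 from the previous snippet for the condition.
z %>% mutate(b=a^2) %>%
    iff(cond=z2$b[1]>1,mutate(.,b=b^2),.) %>%
    mutate(b=b^2) -> z4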

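【Comments】:

Just want to point out that iff() returns an error when y is anything other than . (the dot then appears only inside a nested call, so magrittr also inserts the left-hand side as an extra first argument).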
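The wish above, returning the input automatically when the condition is false, can be sketched with a small helper. This is only an illustration: pipe_if is a hypothetical name, not part of magrittr or dplyr, and base transform() stands in for mutate() so the snippet needs nothing beyond magrittr.

```r
library(magrittr)

# Hypothetical helper (not from any package): returns `yes` when `cond`
# is TRUE, otherwise the piped-in value `x`, unchanged.
pipe_if <- function(x, cond, yes) {
  if (cond) yes else x
}

z <- data.frame(a = 1:2)
z$b <- z$a^2

# The dot appears only inside a nested call, so magrittr also passes the
# left-hand side as the first argument `x`.
unchanged <- z %>% pipe_if(FALSE, transform(., b = b^2))
squared   <- z %>% pipe_if(TRUE,  transform(., b = b^2))
```

Because R evaluates arguments lazily, the transform() call is never run when cond is FALSE.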

【Solution 2】:

Here is a quick example that takes advantage of the . and ifelse:

library(magrittr)  # provides %>% and the add() alias

X <- 1
Y <- TRUE

X %>% add(1) %>% { ifelse(Y, add(., 1), .) }

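In the ifelse, if Y is TRUE it adds 1; otherwise it just returns the last value of X. The . is a stand-in that tells the function where the output from the previous step of the chain goes, so I can use it on both branches.

Edit: As @BenBolker pointed out, you might not want ifelse, so here is an if version.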

X %>%
  add(1) %>%
  { if (Y) add(., 1) else . }

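Thanks to @Frank for pointing out that I should use { braces around my if and ifelse statements to continue the chain.

【Comments】:

I like the post-edit version. ifelse seems unnatural for control flow.
One thing to note: if there is a later step in the chain, use {}. For example, if the braces are dropped from the following chain, bad things happen (for some reason it just prints Y): X %>% "+"(1) %>% {if(Y) "+"(1) else .} %>% "*"(5)
Using the magrittr alias add would make the example clearer.
In code-golfing terms, this specific example could be written as X %>% add(1*Y), but of course that doesn't answer the original question.
An important thing about the conditional block between {} is that you must reference the preceding argument of the dplyr pipe (also called the LHS) with the dot (.) - otherwise the conditional block does not receive the . argument!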
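To make the point about braces and later steps concrete, here is a minimal sketch of the if version followed by one more step. run_chain is a hypothetical wrapper added only so both values of the condition can be tried; add and multiply_by are the magrittr aliases.

```r
library(magrittr)

X <- 1

# Hypothetical wrapper so both values of the condition can be tried.
run_chain <- function(Y) {
  X %>%
    add(1) %>%
    { if (Y) add(., 1) else . } %>%  # inside braces, `.` must be written out
    multiply_by(5)
}

with_step    <- run_chain(TRUE)   # (1 + 1 + 1) * 5
without_step <- run_chain(FALSE)  # (1 + 1) * 5
```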

【Solution 3】:

This is a variation on the answer provided by @JohnPaul. It uses the `if` function instead of a compound if ... else ... statement.

library(magrittr)

X <- 1
Y <- TRUE

X %>% `if`(Y, . + 1, .) %>% multiply_by(2)
# [1] 4

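Note that in this case curly braces are needed neither around the `if` function nor around an ifelse function, only around the if ... else ... statement. However, if the dot placeholder appears only in a nested function call, magrittr will by default pipe the left-hand side into the first argument of the right-hand side. This behavior is overridden by enclosing the expression in curly braces. Note the difference between these two chains: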

X %>% `if`(Y, . + 1, . + 2)
# [1] TRUE
X %>% {`if`(Y, . + 1, . + 2)}
# [1] 2

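The dot placeholder is nested within a function call, since . + 1 and . + 2 are interpreted as `+`(., 1) and `+`(., 2), respectively. So the first expression returns the result of `if`(1, TRUE, 1 + 1, 1 + 2) (oddly enough, `if` does not complain about extra unused arguments), and the second expression returns the result of `if`(TRUE, 1 + 1, 1 + 2), which is the desired behavior in this case.

For more information on how the magrittr pipe operator treats the dot placeholder, see the help file for %>%, specifically the section on "Using the dot for secondary purposes".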
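The first-argument rule can be seen directly by counting the arguments a function receives. In this small sketch, collect is a hypothetical helper that simply returns its arguments as a list.

```r
library(magrittr)

# Hypothetical helper: returns whatever arguments it was called with.
collect <- function(...) list(...)

# Dot used as a direct argument: the left-hand side goes only there.
n_direct <- 1 %>% collect(2, .) %>% length()

# Dot used only inside a nested call: the left-hand side is also
# inserted as the first argument, i.e. collect(., 2, . + 1).
n_nested <- 1 %>% collect(2, . + 1) %>% length()

# Braces switch off the first-argument insertion.
n_braced <- 1 %>% { collect(2, . + 1) } %>% length()
```

Here n_direct and n_braced come out as 2, while n_nested is 3 because of the extra inserted argument.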

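【Comments】:

What is the difference between using `if` and ifelse? Do they behave identically?
@AgileBean The if and ifelse functions do not behave identically. The ifelse function is a vectorized if. If you give the if function a logical vector, it prints a warning and uses only the first element of that vector (from R 4.2.0 onward this is an error instead of a warning). Compare `if`(c(T, F), 1:2, 3:4) with ifelse(c(T, F), 1:2, 3:4).
Great, thanks for clarifying! So, since the problem above is non-vectorized, you could also have written your solution as X %>% { ifelse(Y, .+1, .+2) }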
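One practical consequence of the difference, sketched in base R: ifelse shapes its result like the test, so with a length-one condition it keeps only the first element of the chosen branch, whereas if ... else returns the branch whole. That matters as soon as the piped value is longer than one element.

```r
# ifelse() shapes its result like the test, so a length-one test
# returns only the first element of the chosen branch.
via_ifelse <- ifelse(TRUE, 1:3, 0)

# if ... else returns the chosen branch as a whole.
via_if <- if (TRUE) 1:3 else 0

# With a vector test, ifelse() picks element by element.
elementwise <- ifelse(c(TRUE, FALSE), 1:2, 3:4)
```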

【Solution 4】:

I think this is a case for purrr::when(). Let's sum a few numbers if their sum is below 25, and otherwise return 0.

library("magrittr")
1:3 %>% 
  purrr::when(sum(.) < 25 ~ sum(.), 
              ~0
  )
#> [1] 6

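when returns the value resulting from the action of the first valid condition. Put the condition on the left of ~ and the action on its right. Above, we used only one condition (followed by an else case), but you can have many conditions.

You can easily integrate that into a longer pipe.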
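As far as I know, recent purrr releases (1.0.0 onward) mark when() as deprecated, so here is a hedged sketch of the same idea using only magrittr: several conditions handled by an if / else if chain inside braces, and the single-condition case placed in a longer pipe. label_sum is a hypothetical helper and the thresholds are made up for illustration.

```r
library(magrittr)

# Hypothetical helper: label a numeric vector by the size of its sum.
label_sum <- function(v) {
  v %>%
    {
      if (sum(.) < 10) "small"
      else if (sum(.) < 25) "medium"
      else "large"
    }
}

# The single-condition case from above, followed by a further step.
total <- 1:3 %>%
  { if (sum(.) < 25) sum(.) else 0 } %>%
  multiply_by(2)
```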

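【Comments】:

Nice! This also provides a more intuitive alternative to 'switch'.

【Solution 5】:

I like purrr::when, and the other base solutions provided here are all great, but I wanted something more compact and flexible, so I designed the function pif (pipe if); see the code and documentation at the end of the answer.

Arguments can be either functions (formula notation is supported) or expressions, and by default the input is returned unchanged if the condition is FALSE.

Used on examples from the other answers: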

## from Ben Bolker
data.frame(a=1:2) %>% 
  mutate(b=a^2) %>%
  pif(~b[1]>1, ~mutate(.,b=b^2)) %>%
  mutate(b=b^2)
#   a  b
# 1 1  1
# 2 2 16

## from Lorenz Walthert
1:3 %>% pif(sum(.) < 25,sum,0)
# [1] 6

## from clbieganek 
1 %>% pif(TRUE,~. + 1) %>% `*`(2)
# [1] 4

# from theforestecologist
1 %>% `+`(1) %>% pif(TRUE ,~ .+1)
# [1] 3

Other examples:

## using functions
iris %>% pif(is.data.frame, dim, nrow)
# [1] 150   5

## using formulas
iris %>% pif(~is.numeric(Species), 
             ~"numeric :)",
             ~paste(class(Species)[1],":("))
# [1] "factor :("

## using expressions
iris %>% pif(nrow(.) > 2, head(.,2))
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa

## careful with expressions
iris %>% pif(TRUE, dim,  warning("this will be evaluated"))
# [1] 150   5
# Warning message:
# In inherits(false, "formula") : this will be evaluated
iris %>% pif(TRUE, dim, ~warning("this won't be evaluated"))
# [1] 150   5

Function

#' Pipe friendly conditional operation
#'
#' Apply a transformation on the data only if a condition is met, 
#' by default if condition is not met the input is returned unchanged.
#' 
#' The use of formula or functions is recommended over the use of expressions
#' for the following reasons :
#' 
#' \itemize{
#'   \item If \code{true} and/or \code{false} are provided as expressions they 
#'   will be evaluated whether the condition is \code{TRUE} or \code{FALSE}.
#'   Functions or formulas on the other hand will be applied on the data only if
#'   the relevant condition is met
#'   \item Formulas support calling directly a column of the data by its name 
#'   without \code{x$foo} notation.
#'   \item Dot notation will work in expressions only if `pif` is used in a pipe
#'   chain
#' }
#' 
#' @param x An object
#' @param p A predicate function, a formula describing such a predicate function, or an expression.
#' @param true,false Functions to apply to the data, formulas describing such functions, or expressions.
#'
#' @return The output of \code{true} or \code{false}, either as expressions or applied on data as functions
#' @export
#'
#' @examples
#'# using functions
#'pif(iris, is.data.frame, dim, nrow)
#'# using formulas
#'pif(iris, ~is.numeric(Species), ~"numeric :)",~paste(class(Species)[1],":("))
#'# using expressions
#'pif(iris, nrow(iris) > 2, head(iris,2))
#'# careful with expressions
#'pif(iris, TRUE, dim,  warning("this will be evaluated"))
#'pif(iris, TRUE, dim, ~warning("this won't be evaluated"))
pif <- function(x, p, true, false = identity){
  if(!requireNamespace("purrr")) 
    stop("Package 'purrr' needs to be installed to use function 'pif'")

  if(inherits(p,     "formula"))
    p     <- purrr::as_mapper(
      if(!is.list(x)) p else update(p,~with(...,.)))
  if(inherits(true,  "formula"))
    true  <- purrr::as_mapper(
      if(!is.list(x)) true else update(true,~with(...,.)))
  if(inherits(false, "formula"))
    false <- purrr::as_mapper(
      if(!is.list(x)) false else update(false,~with(...,.)))

  if ( (is.function(p) && p(x)) || (!is.function(p) && p)){
    if(is.function(true)) true(x) else true
  }  else {
    if(is.function(false)) false(x) else false
  }
}

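【Comments】:

"Functions or formulas on the other hand will be applied on the data only if the relevant condition is met." Can you explain why you decided to do it this way?
So that I calculate only what I need to calculate, but I wonder why I didn't do the same with expressions. For some reason it seems I didn't want to use non-standard evaluation. I think I have a modified version among my custom functions; I'll update when I get the chance.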
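A possible explanation for why expressions are always evaluated in pif, sketched in base R: function arguments are lazy promises, but inspecting one (as inherits(false, "formula") does) forces it. Both helpers below are hypothetical and exist only for this comparison.

```r
# Both helpers are hypothetical, written only for this comparison.
lazy_choice <- function(cond, yes, no) {
  if (cond) yes else no                  # only the chosen promise is evaluated
}

eager_choice <- function(cond, yes, no) {
  is_formula <- inherits(no, "formula")  # inspecting `no` forces it
  if (cond) yes else no
}

# The unused branch is never evaluated, so stop() is not triggered.
lazy_result <- lazy_choice(TRUE, "kept", stop("never reached"))

# Here the unused branch is forced by inherits(), so stop() fires.
eager_result <- tryCatch(
  eager_choice(TRUE, "kept", stop("forced by inherits()")),
  error = function(e) conditionMessage(e)
)
```

So a variant that checks the condition first and only then inspects the chosen branch would leave the other expression unevaluated without needing non-standard evaluation.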