R - Using purrr::pmap() for row-wise iteration

[Question title]: R - Using purrr::pmap() for row-wise iteration
[Posted]: 2019-03-02 15:06:46
[Question description]:

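I am trying to understand how pmap works. The tibble below contains a list-column, values. I want to create a new column, New, whose content depends on whether the corresponding element of values is NULL. Since is.null is not vectorised, I initially planned to use rowwise(), until I came across pmap().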

Using rowwise() before mutate() gives the desired result, as shown below:

tbl = as.data.frame(do.call(rbind, pars)) %>%
  rowwise() %>%
  mutate(New = ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", ")))

> tbl
Source: local data frame [2 x 6]
Groups: <by row>

# A tibble: 2 x 6
  id        lower     upper     values     default   New        
  <list>    <list>    <list>    <list>     <list>    <chr>        
1 <chr [1]> <dbl [1]> <dbl [1]> <NULL>     <dbl [1]> a 5          
2 <chr [1]> <NULL>    <NULL>    <list [3]> <chr [1]> b 0, b 1, b 2

However, pmap() does not:

tbl = as.data.frame(do.call(rbind, pars)) %>%
      mutate(New = pmap(., ~ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", "))))

> tbl
  id lower upper  values default                         New
1  a     1    10    NULL       5 a NULL, b list("0", "1", "2")
2  b  NULL  NULL 0, 1, 2       1 a NULL, b list("0", "1", "2")

If I use an anonymous function instead of the tilde, it seems to work:

tbl = as.data.frame(do.call(rbind, pars)) %>%
  mutate(Value = pmap(., function(values, default, id, ...) ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", "))))

> tbl
  id lower upper  values default         Value
1  a     1    10    NULL       5           a 5
2  b  NULL  NULL 0, 1, 2       1 b 0, b 1, b 2

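But I don't understand why the tilde version fails. I would rather not have to spell out the arguments in full, because I need to map the function over many columns. Where am I going wrong?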

[Discussion]:

Could you add a reproducible example for the above, to make it easier to help? We don't have the pars object to start from.

Tags: r vectorization data-manipulation purrr pmap

[Solution 1]:

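I was about to ask a question very similar to this one: essentially, how to use pmap inside mutate without having to repeat the variable names several times. Instead, I'll post it here as an "answer", because it includes a reprex and a few options I have found, none of which fully satisfies me. Hopefully someone else can answer how to do this properly.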

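When working with a data.frame that has list-columns, I often want to use purrr::pmap inside dplyr::mutate. Sometimes this involves a lot of repetition of variable names. I would like to do this more concisely, using an anonymous function, so that each variable is written only once when passed to the .f argument of pmap.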

Take this small dataset as an example:

library('dplyr')
library('purrr')

df <- tribble(
  ~x,   ~y,      ~z,         
  c(1), c(1,10), c(1, 10, 100),
  c(2), c(2,20), c(2, 20, 200),
)

Say the function I want to apply to each row is

func <- function(x, y, z){c(sum(x), sum(y), sum(z))}

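In practice the function would be more complicated, with many variables. The function is only needed once, so I would rather not name it explicitly and clutter my script and my working environment.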
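For reference, here is a minimal sketch of the baseline being avoided: passing the named func straight to pmap. It relies on the data frame having exactly the columns x, y and z, so that pmap can match them to the function's arguments by name. The name res is introduced here only for illustration.

```r
library(dplyr)
library(purrr)

# Same toy data as above: x is a plain numeric column, y and z are list-columns.
df <- tribble(
  ~x,   ~y,       ~z,
  c(1), c(1, 10), c(1, 10, 100),
  c(2), c(2, 20), c(2, 20, 200)
)

func <- function(x, y, z) c(sum(x), sum(y), sum(z))

# pmap() walks over the rows of df and calls func(x = ..., y = ..., z = ...)
# once per row, matching columns to arguments by name.
res <- pmap(df, func)
res[[1]]  # 1 11 111
res[[2]]  # 2 22 222
```

This is short, but it needs a named function in the environment, which is exactly what the options that follow try to avoid.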

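Here are the options. Each one creates exactly the same data.frame, but in a different way. The reason for including avg will become clear. Note that I'm not considering positional matching using ..1, ..2, etc., because that is easy to mess up.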

# Explicitly create a function for `.f`.
# This requires using the variable names (x, y, z) three times.
# It's completely clear what it's doing, but needs a lot of typing.
# It might sometimes fail - see https://github.com/tidyverse/purrr/issues/280

df_explicit <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = list(x, y, z), .f = function(x, y, z){ c(sum(x), sum(y), sum(z)) })
  )

# Pass the whole of `df` to `.l` and add `...` in an explicit function to deal with any unused columns.
# Variable names are used twice.
# `df` will have to be passed explicitly if not using pipes (eg, `mutate(.data = df, a = pmap(.l = df, ...`).
# This is probably inefficient for large datasets.

df_dots <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = ., .f = function(x, y, z, ...){ c(sum(x), sum(y), sum(z)) })
  )

# Use `pryr::f` (as discussed in https://stackoverflow.com/a/51123520/4269699).
# Variable names are used twice.
# Potentially unexpected behaviour.
# Not obvious to the casual reader why the extra `pryr::f` is needed and what it's doing

df_pryrf <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = list(x,y,z), .f = pryr::f({c(sum(x), sum(y), sum(z))} ))
  )

# Use `rowwise()` similar to this: https://stackoverflow.com/a/47734073/4269699
# Variable names are used once.
# It will mess up any vectorised functions used elsewhere in mutate, hence the two `mutate()`s

df_rowwise <- df %>%
  mutate( avg = x - mean(x) ) %>%
  rowwise() %>%
  mutate( a = list( {c(sum(x), sum(y), sum(z))} ) ) %>%
  ungroup()

# Use Romain Francois' neat {rap} package.
# Variable names used once.
# Like `rowwise()` it will mess up any vectorised functions so it needs two `mutate()`s for this particular problem
#

library('rap') #devtools::install_github("romainfrancois/rap")
df_rap <- df %>%
  mutate( avg = x - mean(x) ) %>%
  rap( a = ~ c(sum(x), sum(y), sum(z)) )

# Another solution discussed here https://stackoverflow.com/a/51123520/4269699 doesn't seem to work inside `mutate()`, but maybe could be tweaked?
# Like the `pryr::f` solution, it's not immediately obvious what the purpose of the `with(list(...` bit is.

df_with <- df %>%
  mutate(
    avg = x-mean(x),
    a = pmap(.l = list(x,y,z), .f = ~with(list(...), { c(sum(x), sum(y), sum(z))} ))
  )
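A possible tweak, offered as a sketch rather than a confirmed fix: one likely reason the version above misbehaves is that list(x, y, z) is unnamed, so list(...) inside the formula carries no names for with() to look up. Naming the elements of .l appears to be enough. The name df_with_named is introduced here only for illustration.

```r
library(dplyr)
library(purrr)

df <- tribble(
  ~x,   ~y,       ~z,
  c(1), c(1, 10), c(1, 10, 100),
  c(2), c(2, 20), c(2, 20, 200)
)

df_with_named <- df %>%
  mutate(
    avg = x - mean(x),
    # Naming the list elements means list(...) is a named list per row,
    # so with() can resolve x, y and z to the row's values.
    a = pmap(
      .l = list(x = x, y = y, z = z),
      .f = ~ with(list(...), c(sum(x), sum(y), sum(z)))
    )
  )

df_with_named$a[[1]]  # 1 11 111
```

The cost is that the variable names are now written three times, so this does not meet the goal of using them only once.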

As far as I know, these are the options, excluding positional matching.
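For completeness, the positional form that was set aside looks roughly like the sketch below. It also hints at why the tilde version in the question fails: a formula passed to pmap only binds its inputs positionally (..1, ..2, and so on), so bare names such as values or id are not the current row's elements. They are looked up in the surrounding environment, which inside mutate() means the whole columns. The name df_positional is introduced here only for illustration.

```r
library(dplyr)
library(purrr)

df <- tribble(
  ~x,   ~y,       ~z,
  c(1), c(1, 10), c(1, 10, 100),
  c(2), c(2, 20), c(2, 20, 200)
)

df_positional <- df %>%
  mutate(
    avg = x - mean(x),
    # ..1, ..2 and ..3 refer to the per-row elements of the first, second
    # and third entries of .l, in that order.
    a = pmap(list(x, y, z), ~ c(sum(..1), sum(..2), sum(..3)))
  )

df_positional$a[[2]]  # 2 22 222
```

Reordering the entries of .l silently changes what ..1, ..2 and ..3 mean, which is the fragility mentioned earlier.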

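Ideally, something like the following would be possible, where a function qmap knows to look up (row by row) the variables x, y and z in the object passed to mutate's .data argument.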

df_new <- df %>%
  mutate(
    avg = x-mean(x),
    a = qmap( ~c(sum(x), sum(y), sum(z)) )
  )

But I don't know how to do that, so this is only a partial answer.
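One way to get close to this, as a rough sketch only: a small helper that takes the data frame explicitly and evaluates the right-hand side of a formula once per row, with that row's values available by name. The helper qmap_df is hypothetical (it is not part of purrr or dplyr), and unlike the imagined qmap it still needs the data passed in rather than finding it from mutate's .data.

```r
library(dplyr)
library(purrr)
library(rlang)

df <- tribble(
  ~x,   ~y,       ~z,
  c(1), c(1, 10), c(1, 10, 100),
  c(2), c(2, 20), c(2, 20, 200)
)

# Hypothetical helper: evaluate the formula's right-hand side once per row
# of .data, looking names up in the row's values first and in the formula's
# environment second.
qmap_df <- function(.data, .f) {
  expr <- f_rhs(.f)
  env <- f_env(.f)
  pmap(.data, function(...) eval_tidy(expr, data = list(...), env = env))
}

df_new <- df %>%
  mutate(
    avg = x - mean(x),
    a = qmap_df(df, ~ c(sum(x), sum(y), sum(z)))
  )

df_new$a[[1]]  # 1 11 111
```

Each variable name appears only once in the formula, and vectorised expressions such as avg can stay in the same mutate() call. Like the pmap(.l = ., ...) option, it passes every column to every call, so it may be inefficient on large data.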