【问题标题】:Method dispatch for functions inside dplyr::dodplyr::do 内部函数的方法调度
【发布时间】:2019-01-07 23:18:28
【问题描述】:

如何为dplyr::do 内部的函数实现方法分派?

我已经阅读了 GitHub 问题 #719#3558#3429,它们提供了有关如何为 dplyr 动词创建方法的有用信息,但没有什么特别适用于 dplyr::do - 这是排序“特殊”的意义在于调度不仅需要发生在dplyr:do 本身,还需要发生在dplyr::do 内部调用的函数(或者至少这是我所追求的)

这是我尝试过的:

预赛

library(dplyr)
#> 
#> Attache Paket: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Example data ------------------------------------------------------------

df <- tibble::tibble(
  id = c(rep("A", 5), rep("B", 5)),
  x = 1:10
)

df_custom <- df
class(df_custom) <- c("tbl_df_custom", class(df_custom))

# Reclass function --------------------------------------------------------

reclass <- function(x, result) {
  UseMethod('reclass')
}

reclass.default <- function(x, result) {
  class(result) <- unique(c(class(x)[[1]], class(result)))
  attr(result, class(x)[[1]]) <- attr(x, class(x)[[1]])
  result
}

第 1 步:尝试为 dplyr 动词定义方法

# Custom method for summarize ---------------------------------------------

summarise.tbl_df_custom <- function (.data, ...) {
  message("Custom method for `summarise`")
  result <- NextMethod("summarise")
  ret <- reclass(.data, result)
  print(class(ret))
  ret
}

ret <- df_custom %>%
  summarise(y = mean(x))
#> Custom method for `summarise`
#> [1] "tbl_df_custom" "tbl_df"        "tbl"           "data.frame"
ret %>% class()
#> [1] "tbl_df_custom" "tbl_df"        "tbl"           "data.frame"

第 2 步:尝试为另一个 dplyr 动词定义一个方法来测试更长的管道

# Custom method for group_by ----------------------------------------------

group_by.tbl_df_custom <- function (.data, ..., add = FALSE) {
  message("Custom method for `group_by`")
  result <- NextMethod("group_by")
  ret <- reclass(.data, result)
  print(class(ret))
  ret
}

ret <- df_custom %>%
  group_by(id) %>%
  summarise(y = mean(x))
#> Custom method for `group_by`
#> [1] "tbl_df_custom" "grouped_df"    "tbl_df"        "tbl"          
#> [5] "data.frame"
#> Custom method for `summarise`
#> [1] "tbl_df_custom" "tbl_df"        "tbl"           "data.frame"
ret %>% class()
#> [1] "tbl_df_custom" "tbl_df"        "tbl"           "data.frame"

第 3 步:对 do 尝试相同的操作

# Custom method for do ----------------------------------------------------

do.tbl_df_custom <- function (.data, ...) {
  message("custom method for `do`")
  result <- NextMethod("do")
  ret <- reclass(.data, result)
  print(class(ret))
  ret
}

foo <- function(df) {
  UseMethod("foo")
}

foo.default <- function(df) {
  message("Default method for `foo`")
  df %>%
    summarise(y = mean(x))
}

foo.tbl_df_custom <- function(df) {
  message("Custom method for `foo`")
  df %>%
    summarise(y = mean(x) * 100)
}

ret <- df_custom %>%
  group_by(id) %>%
  do(foo(.))
#> Custom method for `group_by`
#> [1] "tbl_df_custom" "grouped_df"    "tbl_df"        "tbl"          
#> [5] "data.frame"
#> custom method for `do`
#> Default method for `foo`
#> Default method for `foo`
#> [1] "tbl_df_custom" "grouped_df"    "tbl_df"        "tbl"          
#> [5] "data.frame"
ret
#> # A tibble: 2 x 2
#> # Groups:   id [2]
#>   id        y
#>   <chr> <dbl>
#> 1 A         3
#> 2 B         8
ret %>% class()
#> [1] "tbl_df_custom" "grouped_df"    "tbl_df"        "tbl"          
#> [5] "data.frame"

虽然这看起来不错,但问题是调用了 default 而不是 foocustom 方法。

reprex package (v0.2.1) 于 2019 年 1 月 8 日创建

【问题讨论】:

  • 令人着迷的问题,我无法找到问题所在,但这里有一些可能有用的信息。自定义 do 函数适用于未分组的数据帧:df_custom %&gt;% do(foo(.))。只有当您group_by 时才会出现问题。 dplyr 内部对分组数据帧的处理方式不同。请注意代码如何两次打印“foo 的默认方法”。这是因为do.grouped_df 函数内部有一个双for 循环,它按组获取数据帧的切片。当这些切片被取走时,你定义的特殊类就丢失了。查看class(df_custom[1,])
  • group_by 调用 ungroup,另一个泛型,剥离自定义类。也许需要一个ungroup.tbl_df_custom 方法。

标签: r methods dplyr r-s3


【解决方案1】:

所以问题与this question I just asked 有关。我可以通过定义 3 个新函数来解决这个问题:ungroup.tbl_df_custom,一个类构造函数和[.tbl_df_custom

ungroup.tbl_df_custom <- function (.data, ...) {
  message("custom method for `ungroup`")
  result <- NextMethod("ungroup")
  ret <- reclass(.data, result)
  ret
}


new_custom <- function(x, ...) {

  structure(x, class = c("tbl_df_custom", class(x)))
}

`[.tbl_df_custom` <- function(x, ...) {
  new_custom(NextMethod())
}



df_custom2 <- new_custom(df)


df_custom2 %>%
  group_by(id) %>%
  do(foo(.))

Custom method for `group_by`
[1] "tbl_df_custom" "grouped_df"    "tbl_df"        "tbl"           "data.frame"   
custom method for `do`
custom method for `ungroup`
Custom method for `foo`
Custom method for `summarise`
[1] "tbl_df_custom" "tbl_df"        "tbl"           "data.frame"   
Custom method for `foo`
Custom method for `summarise`
[1] "tbl_df_custom" "tbl_df"        "tbl"           "data.frame"   
[1] "tbl_df_custom" "grouped_df"    "tbl_df"        "tbl"           "data.frame"   
custom method for `ungroup`
# A tibble: 2 x 2
# Groups:   id [2]
  id        y
  <chr> <dbl>
1 A       300
2 B       800

【讨论】:

  • 谢谢,完全错过了ungroup 方面!我会接受你的回答。为了提供完整且独立的参考,我将发布一个自己的答案,其中包含我的特定示例从头到尾的所有代码。
  • 在阅读S3 inheritance in advanced R 章节时,我注意到我可以将reclass 函数替换为vctrs::vec_restore,它也有data.frames 的方法(请参阅library(vctrs); sloop::s3_methods_generic("vec_restore")。显然,这也使 [.tbl_df_custom 方法过时,从而简化了一些事情。
  • 我错了能够删除[.tbl_df_custom,这实际上是至关重要的。再次感谢您提出答案,今天学到了很多:-)
【解决方案2】:

为了获得一个完整且独立的示例,其中包含我的特定示例从头到尾的所有代码,我还将在此处发布自己的答案。

有几点需要强调:

  1. 除了group_by() 的自定义方法之外,我可以将reclass() 换成更好的vctrs::vec_restore(),它也恰好有一个data.frame 方法(请参阅library(vctrs); sloop::s3_methods_generic("vec_restore"))。

    您可以在章节S3 inheritance of Advanced R 以及S3 vectors article 上的https://vctrs.r-lib.org/ 中找到有关vctrs::vec_restore() 的更多信息

    如果在vctrs::vec_restore() 中有类似combine 的参数来考虑通过调用group_by() 的默认方法添加的grouped_df() 类属性,那就太好了,但这是另一回事(为此我提交了一个好奇的GitHub issue)。

    目前,由于vctrs::vec_restore() 的实现方式,我们的自定义类信息将被删除(请参阅下面的“测试结果”)。

  2. 我发现非常有用的 GitHub 问题:#3429 尤其是#3923

代码

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Constructor for tbl_df_custom class -------------------------------------

new_df_custom <- function(x = tibble()) {
  stopifnot(tibble::is_tibble(x))
  structure(x, class = c("tbl_df_custom", class(x)))
}

# Example data ------------------------------------------------------------

df_custom <- new_df_custom(
  x = tibble::tibble(
    id = c(rep("A", 3), rep("B", 3)),
    x = 1:6
  )
)

df_custom
#> # A tibble: 6 x 2
#>   id        x
#> * <chr> <int>
#> 1 A         1
#> 2 A         2
#> 3 A         3
#> 4 B         4
#> 5 B         5
#> 6 B         6
df_custom %>% class()
#> [1] "tbl_df_custom" "tbl_df"        "tbl"           "data.frame"

# Reclass function for preserving custom class attribute ------------------

reclass <- function(x, to) {
  UseMethod('reclass')
}

reclass.default <- function(x, to) {
  class(x) <- unique(c(class(to)[[1]], class(x)))
  attr(x, class(to)[[1]]) <- attr(to, class(to)[[1]])
  x
}

# Custom method for summarise ---------------------------------------------

summarise.tbl_df_custom <- function (.data, ...) {
  message("Custom method for `summarise`")
  vctrs::vec_restore(NextMethod(), .data)
}

# Custom method for group_by ----------------------------------------------

group_by.tbl_df_custom <- function (.data, ..., add = FALSE, 
  use_vec_restore = FALSE
) {
  message("Custom method for `group_by`")
  retval <- reclass(NextMethod(), .data)
  print(class(retval))
  retval
}

# Custom method for ungroup ----------------------------------------------

ungroup.tbl_df_custom <- function (.data, ...) {
  message("custom method for `ungroup`")
  vctrs::vec_restore(NextMethod(), .data)
}

# Custom method for do ----------------------------------------------------

do.tbl_df_custom <- function (.data, ...) {
  message("custom method for `do`")
  vctrs::vec_restore(NextMethod(), .data)
}

# Custom extraction method ------------------------------------------------

`[.tbl_df_custom` <- function(x, ...) {
  message("custom method for `[`")
  new_df_custom(NextMethod())
}

# Create custom methods for foo -------------------------------------------

foo <- function(df) {
  UseMethod("foo")
}

foo.default <- function(df) {
  message("Default method for `foo`")
  df %>%
    summarise(y = mean(x))
}

foo.tbl_df_custom <- function(df) {
  message("Custom method for `foo`")
  df %>%
    summarise(y = mean(x) * 100)
}

# Testing things out ------------------------------------------------------

retval <- df_custom %>%
  group_by(id) %>%
  do(foo(.))
#> Custom method for `group_by`
#> [1] "tbl_df_custom" "grouped_df"    "tbl_df"        "tbl"          
#> [5] "data.frame"
#> custom method for `do`
#> custom method for `ungroup`
#> custom method for `[`
#> Custom method for `foo`
#> Custom method for `summarise`
#> custom method for `[`
#> Custom method for `foo`
#> Custom method for `summarise`

retval
#> custom method for `[`
#> custom method for `ungroup`
#> # A tibble: 2 x 2
#> # Groups:   id [2]
#>   id        y
#>   <chr> <dbl>
#> 1 A       200
#> 2 B       500
retval %>% class()
#> [1] "tbl_df_custom" "grouped_df"    "tbl_df"        "tbl"          
#> [5] "data.frame"

reprex package (v0.2.1) 于 2019 年 1 月 8 日创建

替代reclass(): vctrs::vec_restore()

# Alternative version for group_by that uses vctrs::vec_restore -----------

group_by.tbl_df_custom <- function (.data, ..., add = FALSE) {
  message("Custom method for `group_by`")
  retval <- vctrs::vec_restore(NextMethod(), .data)
  print(class(retval))
  retval
}

retval <- df_custom %>%
  group_by(id) %>%
  do(foo(.))
#> Custom method for `group_by`
#> [1] "tbl_df_custom" "tbl_df"        "tbl"           "data.frame"
#> custom method for `do`
#> Custom method for `foo`
#> Custom method for `summarise`

retval
#> custom method for `[`
#> # A tibble: 1 x 1
#>       y
#>   <dbl>
#> 1   350
retval %>% class()
#> [1] "tbl_df_custom" "tbl_df"        "tbl"           "data.frame"

reprex package (v0.2.1) 于 2019 年 1 月 8 日创建

如上所述,请注意,当使用group_by() 的替代版本时,使用vctrs::vec_restore() 而不是reclass(),类属性grouped_df 将被删除。

替代reclass(): vec_restore_inclusive()

这是一个自己的实现,它试图利用vctrs::vec_restore() 的工作方式,同时在决定如何执行“重置”时考虑to 的属性。可以说,“组合”或“对齐”会更好地命名该函数的组件。

vec_restore_inclusive <- function(x, to) {
  UseMethod('vec_restore_inclusive')
}

vec_restore_inclusive.data.frame <- function (x, to) {
  attr_to <- attributes(to)
  attr_x <- attributes(x)
  attr_use <- if (
    length(classes_preserve <- setdiff(attr_to[["class"]], attr_x[["class"]]))
  ) {
    attr_x
  } else {
    attr_to
  }

  attr_use[["names"]] <- attr_x[["names"]]
  attr_use[["row.names"]] <- .set_row_names(vctrs:::df_length(x))
  attr_use[["class"]] <- unique(c(classes_preserve, attr_x[["class"]]))
  attributes(x) <- attr_use
  x
}

group_by.tbl_df_custom <- function (.data, ..., add = FALSE) {
  message("Custom method for `group_by`")
  retval <- vec_restore_inclusive(NextMethod(), .data)
  print(class(retval))
  retval
}

retval <- df_custom %>%
  group_by(id) %>%
  do(foo(.))
#> Custom method for `group_by`
#> [1] "tbl_df_custom" "grouped_df"    "tbl_df"        "tbl"          
#> [5] "data.frame"
#> custom method for `do`
#> custom method for `ungroup`
#> custom method for `[`
#> Custom method for `foo`
#> Custom method for `summarise`
#> custom method for `[`
#> Custom method for `foo`
#> Custom method for `summarise`

retval
#> custom method for `[`
#> custom method for `ungroup`
#> # A tibble: 2 x 2
#> # Groups:   id [2]
#>   id        y
#>   <chr> <dbl>
#> 1 A       200
#> 2 B       500
retval %>% class()
#> [1] "tbl_df_custom" "grouped_df"    "tbl_df"        "tbl"          
#> [5] "data.frame"

reprex package (v0.2.1) 于 2019 年 1 月 8 日创建

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2018-04-21
    • 1970-01-01
    • 1970-01-01
    • 2018-12-15
    • 2019-01-15
    • 1970-01-01
    相关资源
    最近更新 更多