Warning Message: pairwise_count Function (Answers)

【Question Title】: Warning Message: pairwise_count Function
【Posted】: 2020-09-20 09:31:38
【Question】:

I am trying to follow this tutorial, which uses the pairwise_count function from the widyr package.

In particular, consider this line of code, where data is a tibble with "word" and "section" columns:

data %>% pairwise_count(word, section, sort = TRUE)

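However, I get the following warning messages:

`distinct_()` is deprecated as of dplyr 0.7.0. Please use `distinct()` instead.
`tbl_df()` is deprecated as of dplyr 1.0.0. Please use `tibble::as_tibble()` instead.

I suspect that the pairwise_count function in widyr relies on some outdated functions, and that this is what triggers the warnings. Is there a newer package or function in the tidyverse that can be used as a replacement? If not, is there a way to use the function without triggering these warnings?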

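【Question Comments】:

It would help if you included the data and code in this post rather than sending people to another site.
@RonakShah I have updated the question with the relevant line of code.
What @RonakShah means is that you should include a minimal reproducible example in the question, not just the line of code that generates the warning. That said, I have included a minimal reproducible example in my answer.

Tags: r tidyverse tidytext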

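【Solution 1】:

The code from the widyr section of Chapter 4 of Text Mining with R generates deprecation messages for the distinct_() and tbl_df() functions. Because Chapter 4 of the book contains more than 100 lines of code, we reduce it here to the relevant part and the minimum set of packages needed to reproduce the warnings.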

library(dplyr)
library(janeaustenr)
library(tidytext)
austen_section_words <- austen_books() %>%
     filter(book == "Pride & Prejudice") %>%
     mutate(section = row_number() %/% 10) %>%
     filter(section > 0) %>%
     unnest_tokens(word, text) %>%
     filter(!word %in% stop_words$word)

austen_section_words

library(widyr)

# count words co-occurring within sections
word_pairs <- austen_section_words %>%
     pairwise_count(word, section, sort = TRUE)

word_pairs

...which produces the following:

> # count words co-occurring within sections
> word_pairs <- austen_section_words %>%
+      pairwise_count(word, section, sort = TRUE)
Warning messages:
1: `distinct_()` is deprecated as of dplyr 0.7.0.
Please use `distinct()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
2: `tbl_df()` is deprecated as of dplyr 1.0.0.
Please use `tibble::as_tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated. 
> 
> word_pairs
# A tibble: 796,008 x 3
   item1     item2         n
   <chr>     <chr>     <dbl>
 1 darcy     elizabeth   144
 2 elizabeth darcy       144
 3 miss      elizabeth   110
 4 elizabeth miss        110
 5 elizabeth jane        106
 6 jane      elizabeth   106
 7 miss      darcy        92
 8 darcy     miss         92
 9 elizabeth bingley      91
10 bingley   elizabeth    91
# … with 795,998 more rows

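These messages are generated because widyr::pairwise_count() calls pairwise_count_(), which uses dplyr::distinct_() and then, further down the call chain in widyr:::custom_melt(), calls dplyr::tbl_df(). The source of pairwise_count_() is: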

#' @rdname pairwise_count
#' @export
pairwise_count_ <- function(tbl, item, feature, wt = NULL, ...) {
  if (is.null(wt)) {
    func <- squarely_(function(m) m %*% t(m), sparse = TRUE, ...)
    wt <- "..value"
  } else {
    func <- squarely_(function(m) m %*% t(m > 0), sparse = TRUE, ...)
  }

  tbl %>%
    distinct_(.dots = c(item, feature), .keep_all = TRUE) %>%
    mutate(..value = 1) %>%
    func(item, feature, wt) %>%
    rename(n = value)
}
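The m %*% t(m) product in the source above is what does the counting. As a rough base R sketch of that idea, assume m is a 0/1 matrix with one row per item and one column per feature (this is inferred from the transposition; the internals of squarely_() are not shown here). The toy data and all variable names below are hypothetical.

```r
# Toy data: one row per (word, section) occurrence, with one duplicate row.
toy <- data.frame(
  word    = c("darcy", "elizabeth", "darcy", "darcy", "elizabeth", "elizabeth", "jane"),
  section = c(1, 1, 1, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

items <- sort(unique(toy$word))
feats <- sort(unique(toy$section))

# Binary item-by-feature matrix. Assigning 1 at each (word, section) position
# makes the duplicate row harmless, which is the role of the distinct step.
m <- matrix(0, nrow = length(items), ncol = length(feats),
            dimnames = list(items, as.character(feats)))
m[cbind(match(toy$word, items), match(toy$section, feats))] <- 1

# Entry [i, j] is the number of sections in which words i and j both appear.
cooc <- m %*% t(m)
cooc
```

The diagonal of cooc holds each word's own section count; the off-diagonal entries are the pairwise counts, and they are symmetric, which matches the mirrored rows in the word_pairs output.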

Printing the warnings with lifecycle::last_warnings() shows where they originate.

<deprecated>
message: `tbl_df()` is deprecated as of dplyr 1.0.0.
Please use `tibble::as_tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
backtrace:
  9. widyr::pairwise_count(., word, section, sort = TRUE)
 10. widyr::pairwise_count_(...)
  3. dplyr::distinct_(., .dots = c(item, feature), .keep_all = TRUE)
  3. dplyr::mutate(., ..value = 1)
 10. widyr:::func(., item, feature, wt)
 19. widyr:::new_f(tbl, item, feature, value, ...)
  7. widyr:::custom_melt(.)
 15. dplyr::tbl_df(.)

>

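At the time of writing, version 0.1.3 is the current release of widyr. Resolving these warnings requires replacing the calls to dplyr::distinct_() and dplyr::tbl_df() inside widyr itself. Because widyr is an actively supported R package, the way to start this process is to report the problem on the widyr GitHub Issues page.

As the warning messages state, distinct_() has been replaced by dplyr::distinct(), and tbl_df() has been replaced by tibble::as_tibble().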
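For illustration only, here is one way the two deprecated calls could be written with their replacements. This is a sketch on a hypothetical toy data frame, not the fix adopted by the widyr maintainers; it assumes column names arrive as character strings (as in pairwise_count_()) and uses rlang::syms() with !!! to pass them to distinct().

```r
library(dplyr)

# Hypothetical toy data with one duplicated (word, section) pair.
toy <- data.frame(
  word    = c("darcy", "darcy", "elizabeth", "darcy"),
  section = c(1, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Column names held as strings, as they are in pairwise_count_().
cols <- c("word", "section")

# Deprecated form: distinct_(toy, .dots = cols, .keep_all = TRUE)
# Replacement: turn the strings into symbols and splice them into distinct().
deduped <- toy %>%
  distinct(!!!rlang::syms(cols), .keep_all = TRUE)

# Deprecated form: tbl_df(deduped)
deduped_tbl <- tibble::as_tibble(deduped)
deduped_tbl
```

Neither call emits a deprecation warning, and the duplicated darcy/1 row is dropped.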

Suppressing the warnings

The warnings produced by pairwise_count() can be suppressed by wrapping the call in suppressWarnings().

library(widyr)
suppressWarnings(
# count words co-occurring within sections
word_pairs <- austen_section_words %>%
     pairwise_count(word, section, sort = TRUE))

...and the output:

> suppressWarnings(
+ # count words co-occurring within sections
+ word_pairs <- austen_section_words %>%
+      pairwise_count(word, section, sort = TRUE))
> 
> word_pairs
# A tibble: 796,008 x 3
   item1     item2         n
   <chr>     <chr>     <dbl>
 1 darcy     elizabeth   144
 2 elizabeth darcy       144
 3 miss      elizabeth   110
 4 elizabeth miss        110
 5 elizabeth jane        106
 6 jane      elizabeth   106
 7 miss      darcy        92
 8 darcy     miss         92
 9 elizabeth bingley      91
10 bingley   elizabeth    91
# … with 795,998 more rows
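Note that suppressWarnings() hides every warning raised inside the call, not only the deprecation notices. A narrower option is sketched below in base R: a small wrapper that muffles only warnings whose message contains the word "deprecated" and lets everything else through. The helper name muffle_deprecations() and the stand-in function noisy_count() are hypothetical; in practice the wrapped expression would be the pairwise_count() pipeline.

```r
# Evaluate expr, muffling only warnings whose message mentions "deprecated".
muffle_deprecations <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("deprecated", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Stand-in for a function that raises a deprecation warning and returns a value.
noisy_count <- function() {
  warning("`distinct_()` is deprecated as of dplyr 0.7.0.")
  42
}

# The deprecation warning is muffled and the value is returned as usual.
result <- muffle_deprecations(noisy_count())

# An unrelated warning is not muffled, so an outer handler still sees it.
other <- tryCatch(
  muffle_deprecations({ warning("something else went wrong"); 1 }),
  warning = function(w) "warning passed through"
)
```

Matching on the message text is a simplification; it relies on the wording shown in the warnings above.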

Appendix

This code was run on R version 4.0.2 with the following packages, as reported by sessionInfo():

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidytext_0.2.5    janeaustenr_0.1.5 widyr_0.1.3       tidyr_1.1.1      
[5] dplyr_1.0.2      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5       rstudioapi_0.11  magrittr_1.5     tidyselect_1.1.0
 [5] lattice_0.20-41  R6_2.4.1         rlang_0.4.7      fansi_0.4.1     
 [9] stringr_1.4.0    tools_4.0.2      grid_4.0.2       packrat_0.5.0   
[13] broom_0.7.0      utf8_1.1.4       cli_2.0.2        ellipsis_0.3.1  
[17] assertthat_0.2.1 tibble_3.0.3     lifecycle_0.2.0  crayon_1.3.4    
[21] Matrix_1.2-18    purrr_0.3.4      vctrs_0.3.2      tokenizers_0.2.1
[25] SnowballC_0.7.0  glue_1.4.1       stringi_1.4.6    compiler_4.0.2  
[29] pillar_1.4.6     generics_0.0.2   backports_1.1.8  pkgconfig_2.0.3

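【Discussion】:

See Julia Silge's comment on this GitHub issue.
@phiver - If I understand the GitHub issue correctly, the linked issue concerns an error raised when trying to compute distinct() on a variable not found in the data, which was resolved by tidytext 0.1.9.9. As of September 20, 2020, the latest source code for pairwise_count() on GitHub still uses distinct_(), as described in my answer, so updating to the development version of widyr will not eliminate the warnings.
Correct. Julia says you will still see the warning about distinct_(). The tidyverse packages are nice, but there are too many interdependencies :-( .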