Error in applying summarise() function with across() in dplyr

【Question title】: Error in applying summarise() function with across() in dplyr
【Posted】: 2021-06-01 15:13:38
【Question】:

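I have two datasets: one consists of data I collected myself for individual specimens, and the other consists of mean values reported in previous studies in the literature. What I want to do is re-average the data, combining the individual measurements with the reported means. For example, if I have 10 individual specimens and a reported mean for 10 individuals of the same species from a different study, I want to produce the mean of all 20 specimens. An example dataset is attached. There are no overlapping taxa between df and df2 here, but there are in the real dataset.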

df<-data.frame(taxon=c("Abrocoma_bennettii","Abrocoma_bennettii","Abrocoma_bennettii",
                   "Sylvisorex_johnstoni","Abrocoma_bennettii","Abrocoma_bennettii",
                   "Abrocoma_bennettii","Blarina_carolinensis","Abrocoma_cinerea",
                   "Sorex_hoyi","Abrocoma_cinerea","Sorex_cinereus",
                   "Cryptotis_parva","Sorex_cinereus","Sorex_nanus",
                   "Sorex_nanus","Sorex_vagrans","Peromyscus_leucopus",
                   "Sorex_cinereus","Sorex_nanus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Cryptotis_parva",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_nanus","Sorex_nanus","Sorex_vagrans",
                   "Sorex_cinereus","Sorex_nanus","Sorex_nanus",
                   "Sorex_arcticus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_fumeus",
                   "Sorex_haydeni","Sorex_haydeni","Sorex_nanus",
                   "Blarina_brevicauda","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Abrothrix_longipilis","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_monticolus","Sorex_monticolus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_haydeni","Sorex_haydeni",
                   "Sorex_haydeni","Sorex_hoyi","Sorex_hoyi",
                   "Sorex_nanus","Sorex_nanus","Cryptotis_parva",
                   "Cryptotis_parva","Cryptotis_parva","Cryptotis_parva",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus","Sorex_cinereus","Sorex_cinereus",
                   "Sorex_cinereus"),
           x=c(159.0,221.0,184.0,55.0,163.0,214.0,232.0,67.0,198.0,55.0,150.0,55.0,57.0,56.5,56.0,55.0,61.0,67.0,55.0,56.0,62.0,58.0,58.0,55.0,57.0,55.0,55.0,57.5,55.0,55.0,55.0,61.0,60.0,64.0,55.0,56.0,56.0,55.5,58.0,56.0,61.0,63.0,60.0,58.5,55.0,56.0,60.0,55.0,70.0,55.0,55.0,59.0,70.0,65.0,88.0,56.0,63.0,55.0,55.0,56.0,55.0,58.0,57.0,65.0,55.0,55.0,59.0,55.0,60.0,57.0,66.0,65.0,60.0,60.0,62.0,56.5,58.0,58.0,56.0,57.0,55.0,55.0,57.0,63.0,58.0,57.0,59.0,55.0,55.0,56.0,57.0,58.0,60.0,55.0,59.0,55.5,55.0,68.0,66.0,64.0),
y=c(115.00, 286.00, 222.00,   1.00, 109.00, 224.00, 317.00,   1.40, 144.00,   1.75,
105.00,   1.85,   1.90,   2.00,   2.00,   2.00,   2.00,   2.10,   2.10,   2.20,
2.30,   2.30,   2.40,   2.50,   2.50,   2.50,   2.50,   2.50,   2.50,   2.50,
2.50,   2.50,   2.50,   2.60,   2.60,   2.60,   2.70,   2.70,   2.70,   2.70,
2.70,   2.70,   2.70,   2.70,   2.70,   2.70,   2.70,   2.70,   2.80,   2.80,
2.80,   2.80,   2.80,   2.80, 222.00,   2.80,   2.80,   2.80,   2.80,   2.80,
2.80,   2.86,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,
2.90,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,   2.90,
2.90,   2.90,   2.90,   3.00,   3.00,   3.00,   3.00,   3.00,   3.00,   3.00,
3.00,   3.00,   3.00,   3.00,   3.00,   3.00,   3.00,   3.00,   3.00,   3.00))
df2<-data.frame(taxon=c("Eulemur_collaris","Leopardus_colocolo",
"Leopardus_colocolo","Vicugna_vicugna","Vicugna_vicugna","Equus_quagga",
"Equus_quagga","Priodontes_maximus","Priodontes_maximus","Crocuta_crocuta"),
N=c(11,10,2,50,50,13,8,9 ,9,5),
x=c(461.0,565.0,505.0,1107.0,963.0,2046.0,2050.0,929.1,926.9,1236.0),
y=c(2150,3900,4000,36200,33200,247830,219050,31680,34690,47400))
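
To make the goal concrete, here is a minimal base-R sketch of the pooling arithmetic on its own. The numbers are hypothetical and are not taken from df or df2. Each individually measured specimen gets a weight of 1, and a literature mean gets a weight equal to its reported N.

```r
# Hypothetical numbers: three individually measured specimens plus one
# literature mean that was itself computed from two individuals.
values  <- c(10, 12, 14, 20)  # three raw measurements, then one reported mean
weights <- c(1, 1, 1, 2)      # N = 1 per raw specimen, N = 2 for the reported mean

# weighted.mean() computes sum(values * weights) / sum(weights)
pooled <- weighted.mean(values, weights)

# This matches the plain mean of all five underlying individuals, assuming
# the two literature specimens average to 20 (for example 19 and 21).
all_individuals <- c(10, 12, 14, 19, 21)
stopifnot(isTRUE(all.equal(pooled, mean(all_individuals))))
```

This is why the pipeline in the question adds N = 1 to the individual records before binding them to the literature means.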

Previously, I had been doing this with the following code, following the answer to one of my previous questions.

df3<-df%>%
  mutate(N = 1) %>%
  bind_rows(df2) %>%
  group_by(taxon) %>%
  summarise(across(c(x,y), weighted.mean, N), 
            N = sum(N))

This worked for a long time. However, when I recently tried to rerun the code, I got the following error.

Error: Problem with `summarise()` input `..1`.
x object 'N' not found
i Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
i The error occurred in group 1: taxon = "Abrocoma_bennettii".

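I have not been able to figure out what is causing this error. I have not changed anything in the code I use to process the data. I went back and loaded an old version of the database on which I know the code previously ran successfully, and I got the same error, which I did not get before. As you can see from the dataset, the error is reproducible with just this small subsample of my data. R returns the error even on datasets where the code previously worked and on highly simplified datasets, which makes me wonder whether this is a bug in dplyr rather than something related to the structure of the data frames. But I do not know exactly what the bug is or how to correct it.

I ran each line of the code individually and found that the summarise(across(c(x,y), weighted.mean, N), N = sum(N)) line causes the error, but I still cannot figure out what specifically is going wrong.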

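【Comments】:

There is an extra %>% in the code. Apart from that, this code works for me without any error on the data you shared. What is your packageVersion('dplyr')? I am on '1.0.3'.
This seems to be a bug introduced in v1.0.4. It looks like the next version will fix it: the changelog at dplyr.tidyverse.org/news/index.html mentions fixing errors in across() that were introduced in the previous version.
In the meantime, summarise(across(c(x,y), ~ weighted.mean(., N))) works.
@RonakShah I am on '1.0.4'. I also fixed the issue with the extra %>%.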
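
A quick way to check whether an installation is the one the comments point to is to compare the installed version against 1.0.4. This is only a sketch based on what the commenters report (1.0.3 works, 1.0.4 fails); it is not a statement about which other releases are affected, and the variable names are illustrative.

```r
# Compare the installed dplyr version with the release named in the comments.
installed <- utils::packageVersion("dplyr")
affected  <- installed == "1.0.4"  # package_version objects compare against strings

if (affected) {
  message("dplyr ", installed, " is the version reported as failing in the comments.")
} else {
  message("dplyr ", installed, " is installed.")
}
```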

Tags: r dplyr summary summarize

【Solution 1】:

Try:

df3<-df %>%
  mutate(N = 1) %>%
  bind_rows(df2) %>%
  group_by(taxon) %>%
  dplyr::summarise(across(c(x,y), weighted.mean, N), 
            N = sum(N))
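
For comparison, here is a minimal sketch of the lambda form suggested in the comments, run on a small made-up dataset rather than on df and df2. The data frames `individuals` and `reported` and their values are hypothetical. Writing the weights inside the lambda means N is referenced in the function body instead of being passed through the extra arguments of across(), which is the part that raised "object 'N' not found".

```r
library(dplyr)

# Hypothetical data: individual specimens (no N column) and literature means
# (with N giving the number of individuals behind each mean).
individuals <- data.frame(taxon = c("A", "A", "B"),
                          x = c(10, 12, 50),
                          y = c(1, 3, 7))
reported <- data.frame(taxon = c("A", "B"),
                       N = c(2, 3),
                       x = c(20, 60),
                       y = c(5, 9))

pooled <- individuals %>%
  mutate(N = 1) %>%                    # each individual specimen counts once
  bind_rows(reported) %>%
  group_by(taxon) %>%
  summarise(across(c(x, y), ~ weighted.mean(.x, N)),  # N is referenced inside the lambda
            N = sum(N))                # total sample size per taxon, computed last

print(pooled)
```

N = sum(N) is placed after the across() call so that the weights used for x and y are still the per-row values, not the group total.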

【Comments】: