Using R's dcast to aggregate by mean with missing entries

【Question title】: Using R's dcast to aggregate by mean with missing entries
【Posted】: 2017-06-09 10:16:58
【Question】:

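I'm new to reshape2 and its functions. I have a data table d in which I'm trying to summarise year-station-species counts, to get the mean count of each species across all stations in each year: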

library(data.table)
d <- data.table(station = c(1, 1, 4, 3), year = c(2000, 2000, 2001, 2000),
                species = c("cat", "dog", "dog", "owl"), abundance = c(10, 20, 30, 10))
d

>   station year species abundance
 1:       1 2000     cat        10
 2:       1 2000     dog        20
 3:       4 2001     dog        30
 4:       3 2000     owl        10

I use dcast to aggregate abundance, but instead of a mean I seem to get a sum that ignores the resulting NaN values:

dm <- dcast(d, year ~ species, value.var = "abundance", fun.aggregate = mean)
dm
>   year cat dog owl
 1: 2000  10  20  10
 2: 2001 NaN  30 NaN
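A quick diagnostic, sketched under the assumption that data.table is installed (data.table provides its own dcast method for data.table input): counting the records in each year-species cell with fun.aggregate = length shows that every occupied cell holds exactly one record. So mean() just returns that single value, which looks like a sum, and empty cells become NaN because mean(numeric(0)) is NaN.

```r
library(data.table)

d <- data.table(station = c(1, 1, 4, 3), year = c(2000, 2000, 2001, 2000),
                species = c("cat", "dog", "dog", "owl"), abundance = c(10, 20, 30, 10))

# Count how many rows fall into each year-species cell;
# empty cells get length(numeric(0)), i.e. 0
counts <- dcast(d, year ~ species, value.var = "abundance", fun.aggregate = length)
print(counts)
```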

What I want is:

>   year  cat   dog   owl
 1: 2000  3.33  6.67  3.33
 2: 2001  0     30    0

Using the argument fill=0 only results in the NaN values being replaced with 0.

Any suggestions would be greatly appreciated. I have read the documentation and looked for tutorials, but I couldn't solve this.

【Question discussion】:

Tags: r reshape2 dcast

【Solution 1】:

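Your use of the word "mean" is not quite standard. Each year-species cell contains only one record, so dcast's mean simply returns that record's value. What you actually want is each abundance divided by the number of records in its year, so I think creating a new variable called mean_abundance is the best solution: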
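Written out, the desired value for year $y$ and species $s$ is the following (assuming, as in the example data, that each species appears at most once per year):

$$\text{mean\_abundance}_{y,s} = \frac{1}{n_y} \sum_{i \,:\, \text{year}_i = y,\ \text{species}_i = s} \text{abundance}_i$$

where $n_y$ is the number of rows with year $y$. For 2000, $n_{2000} = 3$, giving $10/3 \approx 3.33$ for cat, $20/3 \approx 6.67$ for dog and $10/3 \approx 3.33$ for owl; for 2001, $n_{2001} = 1$, so dog is $30$ and the other species are $0$.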

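# Divide each abundance by the number of records in its year
d[, mean_abundance := abundance/length(abundance), by = year]

# One column per species; year-species combinations without a record become NA
dm <- dcast(d, year ~ species, value.var = "mean_abundance")
# Replace those NAs with 0
dm[is.na(dm)] <- 0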
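A self-contained sketch of the same approach, assuming data.table's own dcast method (not reshape2's). Here the `fill = 0` argument fills the missing year-species cells directly, so the separate NA-replacement step is not needed, and `.N` (data.table's per-group row count) replaces `length(abundance)`:

```r
library(data.table)

d <- data.table(station = c(1, 1, 4, 3), year = c(2000, 2000, 2001, 2000),
                species = c("cat", "dog", "dog", "owl"), abundance = c(10, 20, 30, 10))

# Divide each abundance by the number of rows (.N) in its year
d[, mean_abundance := abundance / .N, by = year]

# One column per species; cells with no record are filled with 0
dm <- dcast(d, year ~ species, value.var = "mean_abundance", fill = 0)
print(dm)
```

If the same species can appear at several stations in one year, adding `fun.aggregate = sum` keeps one value per cell instead of falling back to counting records.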

【Discussion】:

【Solution 2】:

We can do this with the tidyverse:

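library(tidyverse)
d %>%
    group_by(year) %>%
    mutate(mean_abundance = abundance/n()) %>%   # divide by the number of records per year
    ungroup() %>%
    select(year, species, mean_abundance) %>%    # drop station/abundance so rows are keyed by year only
    spread(species, mean_abundance, fill = 0)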
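spread() has since been superseded in tidyr by pivot_wider(). A minimal sketch of the same pipeline with it, assuming tidyr >= 1.1.0 (where `values_fill` accepts a single scalar) and building d as a plain data.frame so data.table is not required:

```r
library(dplyr)
library(tidyr)

d <- data.frame(station = c(1, 1, 4, 3), year = c(2000, 2000, 2001, 2000),
                species = c("cat", "dog", "dog", "owl"), abundance = c(10, 20, 30, 10))

res <- d %>%
  group_by(year) %>%
  mutate(mean_abundance = abundance / n()) %>%   # divide by the number of records per year
  ungroup() %>%
  select(year, species, mean_abundance) %>%      # keep only id, key and value columns
  pivot_wider(names_from = species, values_from = mean_abundance,
              values_fill = 0)                   # missing year-species cells become 0
print(res)
```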

【Discussion】: