R: Reorganize a data frame with many paired species and abundance columns

【Question title】: R: Reorganize data frame with many paired species and abundance columns
【Posted】: 2016-02-10 19:59:57
【Question description】:

I have a data frame of ecological data that contains several pairs of species and abundance columns, like this:

df <- data.frame(site = 1:3,
                 sp1 = c("A","A","X"), abund1 = c(10,20,30),
                 sp2 = c("B","B","Y"), abund2 = c(10,20,30),
                 sp3 = c("C","Y","Z"), abund3 = c(10,20,30))

   site sp1 abund1 sp2 abund2 sp3 abund3
1     1   A     10   B     10   C     10
2     2   A     20   B     20   Y     20
3     3   X     30   Y     30   Z     30

(The actual data I'm using has 6 pairs of species and abundance columns.)

Before doing any further analysis I need to convert it to a site-by-species format, like this:

    site   A    B    C    X    Y    Z
1      1  10   10   10    0    0    0
2      2  20   20    0    0   20    0
3      3   0    0    0   30   30   30

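The only approach I can think of is to first convert it into a 3-column data frame with "site", "species" and "abundance" columns, and then use the reshape package. To build that, I was considering a for loop that goes through each row of the original data frame, turns each row into a new data frame, and then combines them all with rbind. But that seems clumsy, so is there a better way?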
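The long-format idea does not actually need a loop or repeated rbind calls: each species column lines up row for row with its abundance column, so unlisting the two groups side by side gives the three-column site/species/abundance table directly. Below is a minimal base-R sketch, not the asker's code. It assumes the pair columns are named sp1, sp2, ... and abund1, abund2, ..., and it swaps in base R's xtabs() for the cast step instead of the reshape package.

```r
# Example data from the question (stringsAsFactors = FALSE keeps species as character)
df <- data.frame(site = 1:3,
                 sp1 = c("A","A","X"), abund1 = c(10,20,30),
                 sp2 = c("B","B","Y"), abund2 = c(10,20,30),
                 sp3 = c("C","Y","Z"), abund3 = c(10,20,30),
                 stringsAsFactors = FALSE)

# Assumed naming scheme: species columns start with "sp", abundances with "abund"
sp_cols <- grep("^sp", names(df), value = TRUE)
ab_cols <- grep("^abund", names(df), value = TRUE)

# unlist() walks the columns in order, so species and abundances stay paired;
# site is repeated once per pair to match
long <- data.frame(site      = rep(df$site, times = length(sp_cols)),
                   species   = unlist(df[sp_cols], use.names = FALSE),
                   abundance = unlist(df[ab_cols], use.names = FALSE),
                   stringsAsFactors = FALSE)

# Cross-tabulate: sums abundance in each site/species cell, 0 where a species is absent
tab <- xtabs(abundance ~ site + species, data = long)
wide <- data.frame(site = as.integer(rownames(tab)),
                   as.data.frame.matrix(tab),
                   row.names = NULL)
wide
#   site  A  B  C  X  Y  Z
# 1    1 10 10 10  0  0  0
# 2    2 20 20  0  0 20  0
# 3    3  0  0  0 30 30 30
```

Because xtabs() sums within each cell, a species listed in two different pairs at the same site would be added up rather than overwritten.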

【Question discussion】:

Tags: r

【Solution 1】:

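We can use recast from reshape2 to melt the data frame first and then cast it wide. measure.var=c(2,4,6) picks out the species (label) columns. recast combines the two functions by passing id.var and measure.var to melt and all other arguments to dcast. Note that value.var="site" fills each cell with the site number, which matched the abundances only in the question's original example data: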

library(reshape2)
recast(df, id.var="site",measure.var=c(2,4,6), site~value,value.var="site",fill=0)
#   site A B C X Y Z
# 1    1 1 1 1 0 0 0
# 2    2 2 2 0 0 2 0
# 3    3 0 0 0 3 3 3

Update

With the new data, where the abundances differ from the site numbers:

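s <- stack(df[-1])
newdf <- cbind(site=df[,1],as.data.frame(lapply(split(s, as.numeric(grepl("sp", s$ind))),'[',1)))
# stack() put species and abundances into one column, so the abundances are character; convert them back
newdf$values <- as.numeric(newdf$values)
dcast(newdf, site~values.1, fill=0, value.var="values")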
#   site  A  B  C  X  Y  Z
# 1    1 10 10 10  0  0  0
# 2    2 20 20  0  0 20  0
# 3    3  0  0  0 30 30 30
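One detail behind the stack() version: stack() puts every selected column into a single values column, so mixing the character species columns with the numeric abundance columns coerces everything to character, and the abundances may need as.numeric() before casting. The short check below illustrates this; it uses the question's data with stringsAsFactors = FALSE set explicitly (an assumption, so the result does not depend on the R version's default).

```r
df <- data.frame(site = 1:3,
                 sp1 = c("A","A","X"), abund1 = c(10,20,30),
                 sp2 = c("B","B","Y"), abund2 = c(10,20,30),
                 sp3 = c("C","Y","Z"), abund3 = c(10,20,30),
                 stringsAsFactors = FALSE)

s <- stack(df[-1])

# One 'values' column holds both species codes and abundances, so it is character
class(s$values)            # "character"

# 'ind' records the source column; stack() works column by column,
# which is why recycling df$site (1, 2, 3, 1, 2, 3, ...) lines up with each row
as.character(s$ind[1:4])   # "sp1" "sp1" "sp1" "abund1"

# Abundance rows converted back to numbers
abund <- as.numeric(s$values[!grepl("sp", s$ind)])
abund                      # 10 20 30 10 20 30 10 20 30
```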

Or even:

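x1 <- unlist(df[-1][c(TRUE,FALSE)], use.names=FALSE)  # species: sp1, sp2, sp3
x2 <- unlist(df[-1][c(FALSE,TRUE)], use.names=FALSE)  # abundances: abund1, abund2, abund3
df2 <- cbind.data.frame(site=df$site, x1, x2)          # site is recycled once per pair
dcast(df2, site~x1, fill=0, value.var="x2")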
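The logical index (c(T,F), i.e. c(TRUE, FALSE)) works for any number of pairs because R recycles a short logical index along the columns, keeping every other one. A quick check on a hypothetical six-pair layout like the asker's real data, assuming the column names follow the same sp/abund pattern:

```r
# Hypothetical names for 6 species/abundance pairs (site column already dropped)
pair_names <- as.vector(rbind(paste0("sp", 1:6), paste0("abund", 1:6)))
# -> "sp1" "abund1" "sp2" "abund2" ... "sp6" "abund6"

# Logical indices are recycled, so c(TRUE, FALSE) keeps positions 1, 3, 5, ...
species_cols   <- pair_names[c(TRUE, FALSE)]
abundance_cols <- pair_names[c(FALSE, TRUE)]
species_cols    # "sp1" "sp2" "sp3" "sp4" "sp5" "sp6"
abundance_cols  # "abund1" "abund2" "abund3" "abund4" "abund5" "abund6"
```

This relies only on the columns alternating species/abundance after site is dropped, not on their exact names.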

This should also work:

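m1 <- melt(df, id.vars="site", measure.vars=c(2,4,6))  # species columns
m2 <- melt(df, id.vars="site", measure.vars=c(3,5,7))  # abundance columns
# melt() stacks the measure columns in order, so row i of m1 and row i of m2 belong to
# the same pair; bind them side by side (merging on site would pair every species with
# every abundance at that site)
m3 <- data.frame(site=m1$site, species=m1$value, abund=m2$value)
dcast(m3, site~species, fill=0, value.var="abund")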
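The two melt() calls above amount to one wide-to-long step on two parallel column groups. Base R's stats::reshape() can do that in a single call by passing both groups in varying. The sketch below is an alternative, not part of the answer: species, abundance and pair are made-up output column names, and xtabs() is swapped in for the final cast.

```r
df <- data.frame(site = 1:3,
                 sp1 = c("A","A","X"), abund1 = c(10,20,30),
                 sp2 = c("B","B","Y"), abund2 = c(10,20,30),
                 sp3 = c("C","Y","Z"), abund3 = c(10,20,30),
                 stringsAsFactors = FALSE)

# Each list element is one group of columns to stack; element i of each group
# belongs to the same pair, so species and abundance stay aligned
long <- reshape(df,
                direction = "long",
                varying   = list(c("sp1", "sp2", "sp3"),
                                 c("abund1", "abund2", "abund3")),
                v.names   = c("species", "abundance"),
                timevar   = "pair",
                idvar     = "site")

# 'long' has one row per site and pair, with columns site, pair, species, abundance
tab <- xtabs(abundance ~ site + species, data = long)
tab
```

For six pairs, the two vectors in varying would just list the six species and six abundance column names.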

【Discussion】:

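I tried this on my real data and it still doesn't work: I lose the abundance values and only get the site numbers. It turns out my example data was poorly chosen, since the abundance values were the same as the site numbers. I'll edit it so you can see.
Thanks, this is what I was looking for! I like your solution using unlist and then cbind. The last solution also works if you replace merge with cbind.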

【Solution 2】:

Consider column-binding several reshape2 dcasts and then selecting the final columns:

library(reshape2)

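wide <- cbind(dcast(df[c('site', 'sp1', 'abund1')],
                    site~sp1, sum, value.var="abund1"),
              dcast(df[c('site', 'sp2', 'abund2')],
                    site~sp2, sum, value.var="abund2"),
              dcast(df[c('site', 'sp3', 'abund3')],
                    site~sp3, sum, value.var="abund3"))

# A species can appear in more than one pair (Y is in both sp2 and sp3), so the cbind
# result has duplicate column names; add those columns up instead of selecting by name,
# which would keep only the first one (and give Y = 0 for site 2)
spp <- sort(setdiff(unique(names(wide)), "site"))
reshapedf <- data.frame(site = wide[[1]],
                        sapply(spp, function(s) rowSums(wide[names(wide) == s])))

#   site  A  B  C  X  Y  Z
# 1    1 10 10 10  0  0  0
# 2    2 20 20  0  0 20  0
# 3    3  0  0  0 30 30 30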
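Writing one dcast() per pair gets long with six pairs, and a species that occurs in more than one pair has to be added up rather than picked once. The same per-pair idea can be written as a loop that adds each pair's abundances into one site-by-species matrix. This is a base-R sketch rather than the answer's reshape2 code, and it assumes the pairs are named sp1/abund1, sp2/abund2, and so on.

```r
df <- data.frame(site = 1:3,
                 sp1 = c("A","A","X"), abund1 = c(10,20,30),
                 sp2 = c("B","B","Y"), abund2 = c(10,20,30),
                 sp3 = c("C","Y","Z"), abund3 = c(10,20,30),
                 stringsAsFactors = FALSE)

n_pairs <- (ncol(df) - 1) / 2                    # site + 2 columns per pair
sp_cols <- paste0("sp", seq_len(n_pairs))        # assumed column naming
ab_cols <- paste0("abund", seq_len(n_pairs))

# Every species seen in any pair becomes a matrix column, starting at 0
spp <- sort(unique(unlist(df[sp_cols], use.names = FALSE)))
mat <- matrix(0, nrow = nrow(df), ncol = length(spp),
              dimnames = list(df$site, spp))

for (i in seq_len(n_pairs)) {
  # (row, species-column) position for each site in this pair
  idx <- cbind(seq_len(nrow(df)), match(df[[sp_cols[i]]], spp))
  # add rather than assign, so a species listed in two pairs at one site is summed
  mat[idx] <- mat[idx] + df[[ab_cols[i]]]
}

reshapedf <- data.frame(site = df$site, mat, row.names = NULL)
reshapedf
```

Each pair touches every row exactly once, so the matrix index within one iteration never repeats a cell.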

【Discussion】: