
【Question title】: Split Data Frame into a List of Subsets of Columns in R
【Posted】: 2017-06-24 00:11:34
【Question】:

I have a data frame like the following:

> set.seed(123)
> dat <- data.frame(samples = c("a.1","a.2","a.3","b.1","b.2","b.3"), ID = c(rep("A",3),rep("B",3)))
> dat
  samples ID
1     a.1  A
2     a.2  A
3     a.3  A
4     b.1  B
5     b.2  B
6     b.3  B
> practice.data <- data.frame(a.1 = round(runif(5)), a.2=round(runif(5)),
  a.3=round(runif(5)),b.1=round(runif(5)),b.2=round(runif(5)),b.3=round(runif(5)))

> practice.data
  a.1 a.2 a.3 b.1 b.2 b.3
1   0   0   1   1   1   1
2   1   1   0   0   1   1
3   0   1   1   0   1   1
4   1   1   1   0   1   0
5   1   0   0   1   1   0

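In the example above, I want to figure out how to separate the first three columns from the last three (that is, grouped by ID in dat). Once practice.data is in a list, I plan to use lapply to sum the rows of each list element, returning one vector per ID.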

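I have already tried a for loop, but it was very inefficient and caused too many problems, so if I can work out how to do it, a list combined with the apply functions seems like the best approach.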

The final output I want looks like this:

【Comments】:

Tags: r list apply

【Solution 1】:

# map column names to the ID 
g <- dat$ID[match(names(practice.data), dat$samples)]
g

#[1] A A A B B B
#Levels: A B

# split the practice data into smaller data frames based on the map and call rowSums
as.data.frame(lapply(split.default(practice.data, g), rowSums))

#  A B
#1 1 3
#2 2 2
#3 2 2
#4 3 1
#5 1 2
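
To see what the one-liner above does at each step, here is a minimal sketch that inspects the intermediate list. It assumes the example data are typed in by hand from the printed output in the question (so it does not depend on the random seed), and it sets stringsAsFactors = TRUE so that g is a factor as in the printout above. The names parts and sums are illustrative only.

```r
# Example data, typed in from the printed output in the question
practice.data <- data.frame(
  a.1 = c(0, 1, 0, 1, 1), a.2 = c(0, 1, 1, 1, 0), a.3 = c(1, 0, 1, 1, 0),
  b.1 = c(1, 0, 0, 0, 1), b.2 = c(1, 1, 1, 1, 1), b.3 = c(1, 1, 1, 0, 0)
)
dat <- data.frame(
  samples = c("a.1", "a.2", "a.3", "b.1", "b.2", "b.3"),
  ID = rep(c("A", "B"), each = 3),
  stringsAsFactors = TRUE
)

# Group label for each column of practice.data, in column order
g <- dat$ID[match(names(practice.data), dat$samples)]

# split.default() treats the data frame as a list of columns,
# so the result is a list holding one smaller data frame per ID
parts <- split.default(practice.data, g)
names(parts)    # "A" "B"
names(parts$A)  # "a.1" "a.2" "a.3"

# Row sums within each group, collected into one data frame
sums <- as.data.frame(lapply(parts, rowSums))
sums$A          # 1 2 2 3 1
sums$B          # 3 2 2 1 2
```

Plain split() would dispatch to the data frame method and split by rows, which is why the answer calls split.default() explicitly to split by columns.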

【Comments】:

【Solution 2】:

Here is a melt/dcast option:

library(data.table)
dcast(melt(setDT(practice.data, keep.rownames = TRUE), id.var = 'rn', 
  variable.name = 'samples')[, sum(value), .(rn, samples)
  ][dat, on = .(samples)], rn~ID, value.var = 'V1', sum)[, rn := NULL][]
#   A B
#1: 1 3
#2: 2 2
#3: 2 2
#4: 3 1
#5: 1 2
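
The chained call above is dense, so here is a step-by-step sketch of the same melt, join, dcast idea. It is a restructuring for readability, not the answerer's exact code: it works on a copy (setDT() modifies practice.data in place), uses an integer row index instead of character row names, and skips the intermediate sum. The names dt, long and wide are illustrative, and the data are typed in from the printed output in the question.

```r
library(data.table)

# Example data, typed in from the printed output in the question
practice.data <- data.frame(
  a.1 = c(0, 1, 0, 1, 1), a.2 = c(0, 1, 1, 1, 0), a.3 = c(1, 0, 1, 1, 0),
  b.1 = c(1, 0, 0, 0, 1), b.2 = c(1, 1, 1, 1, 1), b.3 = c(1, 1, 1, 0, 0)
)
dat <- data.frame(
  samples = c("a.1", "a.2", "a.3", "b.1", "b.2", "b.3"),
  ID = rep(c("A", "B"), each = 3),
  stringsAsFactors = FALSE
)

# Copy into a data.table and add an integer row index
dt <- as.data.table(practice.data)
dt[, rn := seq_len(.N)]

# Wide -> long: one row per (rn, samples) pair
long <- melt(dt, id.vars = "rn", variable.name = "samples",
             value.name = "value", variable.factor = FALSE)

# Join against dat to attach the ID that belongs to each sample
long <- long[as.data.table(dat), on = "samples"]

# Long -> wide: one column per ID, summing the values in each cell
wide <- dcast(long, rn ~ ID, value.var = "value", fun.aggregate = sum)
wide[, rn := NULL]
wide$A  # 1 2 2 3 1
wide$B  # 3 2 2 1 2
```

The integer index matters once there are ten or more rows: dcast() orders its output by rn, and character row names would sort as "1", "10", "2", and so on.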

【Comments】: