Conditional sum in R with 2 conditions: question and answers

【Question Title】: Conditional sum in R and 2 conditions
【Posted】: 2014-01-02 14:15:16
【Question】:

I have a data frame in R that looks like this:

species  sampletype  content
P1       O1          10
P1       O2          12
P1       O3          9
P1       A           4
P1       A           3
P1       A           4
P2       O1          21
P2       O1          12
P2       O2          4
P2       O3          6
P2       A           7
P2       A           7
P2       A           3
P3       O1          15
P3       O1          13
P3       O1          5
P3       O1          12
P3       A           5
P3       A           7
P3       A           8
P4       O1          12
P4       O1          11
P4       O2          8
P4       O2          2
P4       A           4
P4       A           3
P4       A           4

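Now I need, for each species, the mean content of the O* samples. O1, O2 and O3 are separate samples, but repeated occurrences count only once: several O1 rows count as a single O1 (and likewise for O2 and O3). In other words, the total O* content of a species is divided by the number of distinct O* sample types it has. So the result should look like this: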

P1 = (10+12+9)/3
P2 = (21+12+4+6)/3   (since O1, O2 and O3 all occur)
P3 = (15+13+5+12)/1  (since only O1 occurs)
P4 = (12+11+8+2)/2   (since only O1 and O2 occur)

I have tried merge, aggregate and grep, but I am struggling with the syntax and the complexity.
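Since merge, aggregate and grep were mentioned, here is a minimal base-R sketch that combines them, under a few assumptions: the data frame is called `x` (the name used in the answers below), every sample type to be averaged starts with "O", and the column names `total` and `n_types` are purely illustrative. It uses `grepl()` (the logical variant of `grep()`) and first re-creates the data from the question so that it runs on its own.

```r
# Re-create the example data from the question (name `x` assumed).
x <- data.frame(
  species    = rep(c("P1", "P2", "P3", "P4"), times = c(6, 7, 7, 7)),
  sampletype = c("O1", "O2", "O3", "A", "A", "A",
                 "O1", "O1", "O2", "O3", "A", "A", "A",
                 "O1", "O1", "O1", "O1", "A", "A", "A",
                 "O1", "O1", "O2", "O2", "A", "A", "A"),
  content    = c(10, 12, 9, 4, 3, 4,
                 21, 12, 4, 6, 7, 7, 3,
                 15, 13, 5, 12, 5, 7, 8,
                 12, 11, 8, 2, 4, 3, 4),
  stringsAsFactors = FALSE
)

# Keep only the O* samples (sampletype starting with "O").
o <- x[grepl("^O", x$sampletype), ]

# Per species: total O* content.
total <- aggregate(list(total = o$content),
                   by = list(species = o$species), FUN = sum)

# Per species: number of distinct O* sample types (repeats count once).
ntypes <- aggregate(list(n_types = o$sampletype),
                    by = list(species = o$species),
                    FUN = function(s) length(unique(s)))

# Join the two summaries and divide.
res <- merge(total, ntypes, by = "species")
res$avg <- res$total / res$n_types
res
```

The `avg` column should come out as 31/3, 43/3, 45/1 and 33/2, i.e. the same values as the expected result in the question.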

【Question Discussion】:

Tags: r sum data-manipulation

【Solution 1】:

If I understand correctly, you don't need the rows where sampletype equals A. Given that, you can do:

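library(plyr)  # provides ddply() and summarise()
# Drop the "A" samples, keeping only O1, O2 and O3
d <- subset(x, sampletype != "A")
# Per species: sum of content divided by the number of distinct O* sample types
ddply(d, .(species), summarise,
      avg = sum(content) / length(unique(sampletype)))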

  species      avg
1      P1 10.33333
2      P2 14.33333
3      P3 45.00000
4      P4 16.50000

【Discussion】:

This should also work, where x is the name of the data frame:

df <- subset(x, sampletype != "A")
by(df, df$species, function(grp) {
  sum(as.numeric(grp[["content"]])) / length(unique(grp[["sampletype"]]))
})