[Question title]: R programming: plyr how to count values from a column with ddply [duplicate]
[Posted]: 2013-12-04 21:09:00
[Question]:

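I want to summarize the pass/fail status of my data as shown below. In other words, I want to report the number of pass and fail cases for each product/type combination.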

library(ggplot2)
library(plyr)
product=c("p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2")
type=c("t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2","t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2")
skew=c("s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2")
color=c("c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3")
result=c("pass","pass","fail","pass","pass","pass","fail","pass","fail","pass","fail","pass","fail","pass","fail","pass","pass","pass","pass","fail","fail","pass","pass","fail")
df = data.frame(product, type, skew, color, result)

The following command returns the combined number of pass and fail cases, but I want separate columns for pass and fail:

dfSummary <- ddply(df, c("product", "type"), summarise, N=length(result))

The result is:

        product type N
 1      p1      t1   6
 2      p1      t2   6
 3      p2      t1   6
 4      p2      t2   6

The desired result is:

         product type Pass Fail
 1       p1      t1   5    1
 2       p1      t2   3    3
 3       p2      t1   4    2
 4       p2      t2   3    3

I tried something like:

 dfSummary <- ddply(df, c("product", "type"), summarise, Pass=length(df$product[df$result=="pass"]), Fail=length(df$product[df$result=="fail"]) )

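But this is obviously wrong: every row shows the pass and fail totals for the whole data set instead of the counts for that product/type group.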

Thanks in advance for any advice! Regards, Riad.

[Discussion]:

    Tags: r plyr


    [Solution 1]:

    You can also use reshape2::dcast:

    library(reshape2)
    dcast(product + type~result,data=df, fun.aggregate= length,value.var = 'result')
    ##   product type fail pass
    ## 1      p1   t1    1    5
    ## 2      p1   t2    3    3
    ## 3      p2   t1    2    4
    ## 4      p2   t2    3    3
    

    [Comments]:

    • Thank you so much! This works too.
    • Much faster than ddply. Thanks :)
    [Solution 2]:

    Try:

    dfSummary <- ddply(df, c("product", "type"), summarise, 
                       Pass=sum(result=="pass"), Fail=sum(result=="fail") )
    

    This gives me the result:

      product type Pass Fail
    1      p1   t1    5    1
    2      p1   t2    3    3
    3      p2   t1    4    2
    4      p2   t2    3    3
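
A note on why the attempt in the question returned whole-data totals: inside summarise, a bare column name such as result is looked up in the current piece, whereas df$result explicitly refers to the full data frame. The sketch below illustrates the difference in base R for a single group. It rebuilds only the product, type and result columns from the question, and the names piece, whole_pass and group_pass are made up for illustration.

```r
# Same product/type/result values as in the question, built more compactly.
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass", "pass", "fail", "pass", "pass", "pass",
              "fail", "pass", "fail", "pass", "fail", "pass",
              "fail", "pass", "fail", "pass", "pass", "pass",
              "pass", "fail", "fail", "pass", "pass", "fail"),
  stringsAsFactors = FALSE
)

# One piece, as ddply would see it for product "p1" and type "t1".
piece <- df[df$product == "p1" & df$type == "t1", ]

# The question's expression indexes the full data frame, so it counts all rows.
whole_pass <- length(df$product[df$result == "pass"])

# Referring to the piece's own column counts only that group's rows.
group_pass <- sum(piece$result == "pass")

print(c(whole = whole_pass, group = group_pass))
```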
    

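    Explanation:

    1. You pass the data set df to the ddply function.
    2. ddply splits it on the variables "product" and "type".
      • This yields up to length(unique(product)) * length(unique(type)) pieces (subsets of df), one for each combination of the two variables that occurs in the data.
    3. To each piece, ddply applies the function you supply. Here, you count the rows where result=="pass" and the rows where result=="fail".
    4. ddply now holds a result for each piece: the variables you split on (product and type) and the values you requested (Pass and Fail).
    5. It combines all the pieces and returns them as one data frame.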
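
The steps above can be sketched by hand in base R. This is only an illustration of the split/apply/combine idea, not how plyr is implemented internally; the helper summarise_piece and the variable names are hypothetical, and the data frame repeats just the product, type and result columns from the question.

```r
# Same product/type/result values as in the question, built more compactly.
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass", "pass", "fail", "pass", "pass", "pass",
              "fail", "pass", "fail", "pass", "fail", "pass",
              "fail", "pass", "fail", "pass", "pass", "pass",
              "pass", "fail", "fail", "pass", "pass", "fail"),
  stringsAsFactors = FALSE
)

# Steps 1-2: split the data frame into one piece per product/type combination.
pieces <- split(df, list(df$product, df$type), drop = TRUE)

# Step 3: apply the counting function to each piece.
# sum() over a logical vector counts its TRUE values.
summarise_piece <- function(piece) {
  data.frame(
    product = piece$product[1],
    type    = piece$type[1],
    Pass    = sum(piece$result == "pass"),
    Fail    = sum(piece$result == "fail")
  )
}
results <- lapply(pieces, summarise_piece)

# Steps 4-5: combine the per-piece results into a single data frame.
manual_summary <- do.call(rbind, results)
manual_summary <- manual_summary[order(manual_summary$product,
                                       manual_summary$type), ]
rownames(manual_summary) <- NULL
print(manual_summary)
```

The explicit ordering at the end is there only so that the rows line up with the ddply output shown in the answer.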

    [Comments]:

    • Perfect, this is exactly what I needed! Thanks for the quick answer!