【Question title】: How to subset on multiple column conditions in R?
【Posted】: 2019-02-02 13:43:23
【Question】:

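All,

My dataset looks like the one below. I am trying to answer the following question.

Question:

Based on the drawing paper data only, does the store sell more units (units.sold column) of one paper subtype (paper.type) than the others?

To answer this, I used the tapply function, which gives me the totals for both papers. Now I am not sure how to go further and get the drawing paper data only. Any help is appreciated!

My code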

tapply(df$units.sold,list(df$paper,df$paper.type,df$store),sum)

Dataset

            date year     rep     store      paper paper.type unit.price units.sold total.sale
9991  12/30/2015 2015     Ran    Dublin watercolor      sheet       0.77          5       3.85
9992  12/30/2015 2015     Ran    Dublin    drawing       pads      10.26          1      10.26
9993  12/30/2015 2015  Arijit  Syracuse watercolor        pad      12.15          2      24.30
9994  12/30/2015 2015  Thomas Davenport    drawing       roll      20.99          1      20.99
9995  12/31/2015 2015   Ruisi    Dublin watercolor      sheet       0.77          7       5.39
9996  12/31/2015 2015   Mohit Davenport    drawing       roll      20.99          1      20.99
9997  12/31/2015 2015    Aman  Portland    drawing       pads      10.26          1      10.26
9998  12/31/2015 2015 Barakat  Portland watercolor      block      19.34          1      19.34
9999  12/31/2015 2015  Yunzhu  Syracuse    drawing    journal      24.94          1      24.94
10000 12/31/2015 2015    Aman  Portland watercolor      block      19.34          1      19.34

Note: I am new to R. Please provide an explanation along with your code.

【Comments】:

    Tags: r dataset subset tapply


    【Answer 1】:

    Start with dplyr from the tidyverse and its filter function. You can chain functions together with the %>% pipe operator.

    library(dplyr)
    # Keep only the drawing rows, then total units.sold per store and paper.type
    df2 <- df %>%
      filter(paper == "drawing") %>%
      group_by(store, paper.type) %>%
      summarise(units.sold = sum(units.sold))
    
      store     paper.type units.sold
      <chr>     <chr>           <dbl>
    1 Davenport roll                2
    2 Dublin    pads                1
    3 Portland  pads                1
    4 Syracuse  journal             1
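
The grouped result above is in long form. As a rough sketch of another way to make the comparison, the same totals can be laid out as a store by paper.type table with base R's xtabs and then collapsed to one total per subtype. The names `wide`, `totals` and `top_types` are made up for illustration, and the data frame is a cut-down copy of the sample rows from the question.

```r
# Cut-down copy of the question's sample data (only the columns used here).
df <- data.frame(
  store = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
            "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
            "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Sum units.sold into a store x paper.type table, using drawing rows only.
# Combinations with no rows show up as 0.
wide <- xtabs(units.sold ~ store + paper.type, data = df,
              subset = paper == "drawing")

# Collapse over stores: one total per paper subtype.
totals <- colSums(wide)

# Subtype(s) with the highest total; ties are all returned.
top_types <- names(which(totals == max(totals)))

print(wide)
print(totals)
print(top_types)
```

On the ten sample rows, pads and roll tie at 2 units each, so no single subtype comes out ahead; the full dataset may give a different answer.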
    

    【Comments】:

    • Thanks! I discovered dplyr as another way to filter my dataset!
    【Answer 2】:

    You can start with aggregate on the units.sold column, grouped by store and paper.type.

    aggregate(units.sold~store+paper.type, df[df$paper == "drawing", ], sum)
    
    #      store paper.type units.sold
    #1  Syracuse    journal          1
    #2    Dublin       pads          1
    #3  Portland       pads          1
    #4 Davenport       roll          2
    

    Here we filter the data down to the "drawing" type of paper only. From this output we can compare the number of units.sold for each store and paper.type.
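
Since the question started from tapply, the same idea (subset the rows first, then summarise) can be sketched with tapply as well. This is only an illustration on a cut-down copy of the sample data; `drawing`, `by_type` and `by_store_type` are hypothetical names, not part of the original answer.

```r
# Cut-down copy of the question's sample data (only the columns used here).
df <- data.frame(
  store = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
            "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
            "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Keep only the drawing rows before summarising.
drawing <- df[df$paper == "drawing", ]

# Units sold per paper subtype, across all stores.
by_type <- tapply(drawing$units.sold, drawing$paper.type, sum)

# Units sold per store and paper subtype.
# NA marks store/subtype combinations that have no rows.
by_store_type <- tapply(drawing$units.sold,
                        list(drawing$store, drawing$paper.type), sum)

print(by_type)
print(by_store_type)
```

Subsetting before calling tapply drops the paper dimension entirely, so the result is a plain vector or a two-way table instead of the three-way array produced by the call in the question.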

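    【Comments】:

    • Thanks for the reply! How would I summarise store and paper type based on units sold?
    • @Data_is_Power That is what the code does. For each store and paper.type, it sums the units sold, for "drawing" paper only.
    • Looking for the same solution.
    【Answer 3】: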

    We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT, specify the i expression (paper == 'drawing') to subset the rows, group by 'store' and 'paper.type', and summarise 'units.sold' by taking its sum.

    library(data.table)
    setDT(df)[paper == "drawing", .(units.sold = sum(units.sold)), .(store, paper.type)]
    #       store paper.type units.sold
    #1:    Dublin       pads          1
    #2: Davenport       roll          2
    #3:  Portland       pads          1
    #4:  Syracuse    journal          1
    

    Data

    df <-  structure(list(date = c("12/30/2015", "12/30/2015", "12/30/2015", 
    "12/30/2015", "12/31/2015", "12/31/2015", "12/31/2015", "12/31/2015", 
    "12/31/2015", "12/31/2015"), year = c(2015L, 2015L, 2015L, 2015L, 
    2015L, 2015L, 2015L, 2015L, 2015L, 2015L), rep = c("Ran", "Ran", 
    "Arijit", "Thomas", "Ruisi", "Mohit", "Aman", "Barakat", "Yunzhu", 
    "Aman"), store = c("Dublin", "Dublin", "Syracuse", "Davenport", 
    "Dublin", "Davenport", "Portland", "Portland", "Syracuse", "Portland"
    ), paper = c("watercolor", "drawing", "watercolor", "drawing", 
    "watercolor", "drawing", "drawing", "watercolor", "drawing", 
    "watercolor"), paper.type = c("sheet", "pads", "pad", "roll", 
    "sheet", "roll", "pads", "block", "journal", "block"), unit.price = c(0.77, 
    10.26, 12.15, 20.99, 0.77, 20.99, 10.26, 19.34, 24.94, 19.34), 
        units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L), total.sale = c(3.85, 
        10.26, 24.3, 20.99, 5.39, 20.99, 10.26, 19.34, 24.94, 19.34
        )), class = "data.frame", row.names = c("9991", "9992", "9993", 
    "9994", "9995", "9996", "9997", "9998", "9999", "10000"))
    

    【Comments】:
