Question title: How to remove N rows of a data frame according to conditions
Posted: 2016-02-22 00:56:00
Question:

My question follows on from "How to find tail rows of a data frame that satisfy set criteria?", so my (updated) sample data are structured as follows:

Individ <- data.frame(Participant = c("Bill", "Bill", "Bill", "Bill", "Bill", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Bill", "Bill", "Bill", "Bill"),  
                      Time = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4),
                      Condition = c("Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr"),
                      Location = c("Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Away", "Away", "Away", "Away"),
                      Power = c(400, 250, 180, 500, 300, 600, 512, 300, 500, 450, 200, 402, 210, 130, 520, 310, 451, 608, 582, 390, 570))

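I have learned how to find the tail rows for each Participant, based on the last occurrence of Power within each Condition plus Location. I now want to remove the last 3 rows of each Participant for every Condition and Location. However, the number of Time points collected differs between each Participant and Condition, so I can't simply remove rows based on a standardized Time.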
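To see why a fixed Time cutoff cannot work, a quick base-R count of rows per Participant/Condition/Location combination helps. The data frame below is a compact re-creation of the sample Individ data (same values, built with rep() so the block runs on its own); the name group_sizes is illustrative only.

```r
# Compact re-creation of the sample data from the question
Individ <- data.frame(
  Participant = rep(c("Bill", "Jane", "Jane", "Bill"), times = c(5, 6, 6, 4)),
  Time        = c(1:5, 1:6, 1:6, 1:4),
  Condition   = rep(c("Placebo", "Expr", "Placebo", "Expr"), times = c(5, 6, 6, 4)),
  Location    = rep(c("Home", "Home", "Home", "Away"), times = c(5, 6, 6, 4)),
  Power       = c(400, 250, 180, 500, 300, 600, 512, 300, 500, 450, 200,
                  402, 210, 130, 520, 310, 451, 608, 582, 390, 570)
)

# Rows (time points) per combination: one group has 4, one has 5 and two have 6,
# so "drop the last 3" corresponds to a different Time cutoff in each group
group_sizes <- aggregate(Time ~ Participant + Condition + Location,
                         data = Individ, FUN = length)
print(group_sizes)
```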

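How can I quickly iterate over each Participant and their respective Condition plus Location and remove the last 3 rows? My actual data frame has more than 4 million rows and over 50 participants, so ideally I need a solution that iterates over each Participant and Condition.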

My expected output is:

Output <- data.frame(Participant = c("Bill", "Bill", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Bill"),
                     Time = c(1, 2, 1, 2, 3, 1, 2, 3, 1),
                     Condition = c("Placebo", "Placebo", "Expr", "Expr", "Expr", "Placebo", "Placebo", "Placebo", "Expr"),
                     Location = c("Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Away"),
                     Power = c(400, 250, 600, 512, 300, 402, 210, 130, 608))
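For comparison with the package-based answers below, here is a minimal base-R sketch that produces the expected output using ave() to compute each row's position within its group and the size of that group. The data are again re-created compactly so the block is self-contained; pos, size and Output_base are illustrative names, not part of the question.

```r
# Compact re-creation of the sample data (same values as Individ above)
Individ <- data.frame(
  Participant = rep(c("Bill", "Jane", "Jane", "Bill"), times = c(5, 6, 6, 4)),
  Time        = c(1:5, 1:6, 1:6, 1:4),
  Condition   = rep(c("Placebo", "Expr", "Placebo", "Expr"), times = c(5, 6, 6, 4)),
  Location    = rep(c("Home", "Home", "Home", "Away"), times = c(5, 6, 6, 4)),
  Power       = c(400, 250, 180, 500, 300, 600, 512, 300, 500, 450, 200,
                  402, 210, 130, 520, 310, 451, 608, 582, 390, 570)
)

# Position of each row within its Participant/Condition/Location group
# (in the current row order) and the size of that group
pos  <- ave(Individ$Power, Individ$Participant, Individ$Condition,
            Individ$Location, FUN = seq_along)
size <- ave(Individ$Power, Individ$Participant, Individ$Condition,
            Individ$Location, FUN = length)

# Keep everything except the last 3 rows of each group;
# groups with 3 or fewer rows disappear entirely
Output_base <- Individ[pos <= size - 3, ]
print(Output_base)
```

Because ave() works on the rows in their existing order, this assumes each group's rows are already sorted by Time, which holds for the sample data.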

Question discussion:

    Tags: r


    Solution 1:

    If you use dplyr, you can do this with row_number() and n():

    library(dplyr)
    Individ %>%
      group_by(Participant, Condition, Location) %>%
      # row_number() < n() - 2 keeps every row except the last 3 of each group
      filter(row_number() < n() - 2)
    

    which returns:

    Source: local data frame [9 x 5]
    Groups: Participant, Condition, Location [4]
    
      Participant  Time Condition Location Power
           (fctr) (dbl)    (fctr)   (fctr) (dbl)
    1        Bill     1   Placebo     Home   400
    2        Bill     2   Placebo     Home   250
    3        Jane     1      Expr     Home   600
    4        Jane     2      Expr     Home   512
    5        Jane     3      Expr     Home   300
    6        Jane     1   Placebo     Home   402
    7        Jane     2   Placebo     Home   210
    8        Jane     3   Placebo     Home   130
    9        Bill     1      Expr     Away   608
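The condition row_number() < n() - 2 is the same as row_number() <= n() - 3, so within each group it keeps every row except the last three. A minimal base-R sketch of that test for a single group of size n makes the edge case visible: groups with 3 or fewer rows are removed entirely. keep_mask is a hypothetical helper for illustration, not part of dplyr.

```r
# Mimics dplyr's row_number() < n() - 2 for one group of size n:
# row_number() runs 1..n and n() is the group size
keep_mask <- function(n) {
  seq_len(n) < n - 2
}

keep_mask(5)  # Bill / Placebo / Home (5 rows): TRUE TRUE FALSE FALSE FALSE
keep_mask(2)  # a group with 3 or fewer rows: all FALSE, so it is dropped
```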
    

    Comments:

    • Who knew it would be that easy?! Thanks for the dplyr solution.
    Solution 2:

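    An option using data.table. We convert the 'data.frame' to a 'data.table' (setDT(Individ)), group by 'Participant', 'Condition' and 'Location', and use head to remove the last 3 observations of each combination.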

    library(data.table)
    # head(.SD, -3) returns all but the last 3 rows of each group's subset
    setDT(Individ)[, head(.SD, -3), by = .(Participant, Condition, Location)]
    #   Participant Condition Location Time Power
    #1:        Bill   Placebo     Home    1   400
    #2:        Bill   Placebo     Home    2   250
    #3:        Jane      Expr     Home    1   600
    #4:        Jane      Expr     Home    2   512
    #5:        Jane      Expr     Home    3   300
    #6:        Jane   Placebo     Home    1   402
    #7:        Jane   Placebo     Home    2   210
    #8:        Jane   Placebo     Home    3   130
    #9:        Bill      Expr     Away    1   608
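With a negative n, head() returns all but the last |n| elements or rows, which is why head(.SD, -3) trims each group. The same idea can be sketched in base R by splitting the data frame into one piece per combination and calling head() on each piece. This is an illustration only, not how data.table evaluates the expression internally; the data are again re-created compactly, and groups and trimmed are illustrative names.

```r
# A negative n drops elements from the end
head(c(400, 250, 180, 500, 300), -3)  # 400 250

Individ <- data.frame(
  Participant = rep(c("Bill", "Jane", "Jane", "Bill"), times = c(5, 6, 6, 4)),
  Time        = c(1:5, 1:6, 1:6, 1:4),
  Condition   = rep(c("Placebo", "Expr", "Placebo", "Expr"), times = c(5, 6, 6, 4)),
  Location    = rep(c("Home", "Home", "Home", "Away"), times = c(5, 6, 6, 4)),
  Power       = c(400, 250, 180, 500, 300, 600, 512, 300, 500, 450, 200,
                  402, 210, 130, 520, 310, 451, 608, 582, 390, 570)
)

# One data frame per Participant/Condition/Location combination
# (drop = TRUE discards combinations that never occur)
groups <- split(Individ, Individ[c("Participant", "Condition", "Location")], drop = TRUE)

# head(df, -3) keeps all but the last 3 rows of each group; then recombine
trimmed <- do.call(rbind, lapply(groups, head, -3))
rownames(trimmed) <- NULL
print(trimmed)
```

Because the pieces are recombined group by group, the rows come back ordered by group rather than in their original order.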
    

    An equivalent option in dplyr is:

    library(dplyr)
    Individ %>% 
         group_by(Participant, Condition, Location) %>% 
         do(head(., -3))
    #  Participant  Time Condition Location Power
    #       (fctr) (dbl)    (fctr)   (fctr) (dbl)
    #1        Bill     1      Expr     Away   608
    #2        Bill     1   Placebo     Home   400
    #3        Bill     2   Placebo     Home   250
    #4        Jane     1      Expr     Home   600
    #5        Jane     2      Expr     Home   512
    #6        Jane     3      Expr     Home   300
    #7        Jane     1   Placebo     Home   402
    #8        Jane     2   Placebo     Home   210
    #9        Jane     3   Placebo     Home   130
    

    Comments:
