【Question title】: Select first 80 observations for each level in R
【Posted】: 2013-05-19 07:38:42
【Question description】:

I have a dataset that looks like this:

structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25"), class = "factor"), T = c(0.04, 0.08, 0.12, 0.16, 0.2, 
0.24), X = c(464.4, 464.4, 464.4, 464.4, 464.4, 464.4), Y = c(418.5, 
418.5, 418.5, 418.5, 418.5, 418.5), V = c(0, 0, 0, 0, 0, 0), 
    GD = c(0, 0, 0, 0, 0, 0), ND = c(NA, 0, 0, 0, 0, 0), ND2 = c(NA, 
    0, 0, 0, 0, 0), TID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("t1", 
    "t10", "t11", "t12", "t13", "t14", "t15", "t16", "t17", "t18", 
    "t19", "t2", "t20", "t21", "t22", "t23", "t24", "t25", "t3", 
    "t4", "t5", "t6", "t7", "t8", "t9"), class = "factor")), .Names = c("A", 
"T", "X", "Y", "V", "GD", "ND", "ND2", "TID"), row.names = c(NA, 
6L), class = "data.frame")

I want to select the first 80 observations, with all variables, for each TID. So far I can do this only for the first TID, using this code:

sub.data1<-NM[1:80, ]

How can I do this for all my other TIDs?

Thanks!

【Comments】:

    Tags: r data-management


    【Solution 1】:

    I would do this (here `dat` stands for your data frame, which is `NM` in the question):

    lapply(split(dat, dat$TID), head, 80)
    

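    It returns a list of data.frames, one per TID, each with 80 rows (or fewer if the TID has fewer observations). If you would rather have everything in a single data.frame: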

    do.call(rbind, lapply(split(dat, dat$TID), head, 80))
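
As a minimal sketch of the split / head / rbind pattern above, the following runs it on a small made-up data frame. The toy `dat`, the `n <- 2` cutoff (standing in for 80), and the names `pieces` and `first_n` are illustrative assumptions, not the asker's data.

```r
# Toy data: three TIDs with 5, 3 and 1 rows (hypothetical, not the asker's data)
dat <- data.frame(
  TID = rep(c("t1", "t2", "t3"), times = c(5, 3, 1)),
  T   = seq_len(9) * 0.04,
  X   = 464.4
)

n <- 2  # stands in for 80

# One data frame per TID, each cut down to at most n rows
pieces <- lapply(split(dat, dat$TID), head, n)

# Stack the pieces back into a single data frame; all columns are kept
first_n <- do.call(rbind, pieces)

print(first_n)
print(table(first_n$TID))
```

Groups with fewer than `n` rows are returned whole, since `head()` never pads. The row names of the combined result are prefixed with the group name; `rownames(first_n) <- NULL` resets them if that is unwanted.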
    

    【Comments】:

    • Sorry, I forgot to say that I also want to keep all the other variables.
    【Solution 2】:

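    With the ddply() function from the plyr package, you can split the data by TID, use head() to select the first 80 rows of each piece, and then combine everything back into a single data frame: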

    library(plyr)
    ddply(NM, .(TID), head, n = 80)
    

    【Comments】:

    • +1! The lambda function is probably not needed; ddply(NM, .(TID), head, n = 80) should work.
    【Solution 3】:

    Using data.table, I made a short example that contains only the TIDs t1 and t2 and returns the first 2 rows for each of them. It can be adapted to your data.

    library(data.table)
    data<-structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
                    "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
                    "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
                    "25"), class = "factor"), T = c(0.04, 0.08, 0.12, 0.16, 0.2, 
                    0.24), X = c(464.4, 464.4, 464.4, 464.4, 464.4, 464.4), Y = c(418.5, 
                            418.5, 418.5, 418.5, 418.5, 418.5), V = c(0, 0, 0, 0, 0, 0), 
                    GD = c(0, 0, 0, 0, 0, 0), ND = c(NA, 0, 0, 0, 0, 0), ND2 = c(NA, 
                            0, 0, 0, 0, 0), TID = c("t1","t1","t1","t2","t2","t2")), .Names = c("A", 
                    "T", "X", "Y", "V", "GD", "ND", "ND2", "TID"), row.names = c(NA, 
                    6L), class = "data.frame")
    dt<-data.table(data)
    dt[,head(.SD,2),by=TID]
    

    This results in:

       TID A    T     X     Y V GD ND ND2
    1:  t1 1 0.04 464.4 418.5 0  0 NA  NA
    2:  t1 1 0.08 464.4 418.5 0  0  0   0
    3:  t2 1 0.16 464.4 418.5 0  0  0   0
    4:  t2 1 0.20 464.4 418.5 0  0  0   0
    

    If needed, you can convert the result back to a data frame by changing the last line to:

    as.data.frame(dt[,head(.SD,2),by=TID])
    

    【Comments】:

      【Solution 4】:

      Here is another solution in base R:

      do.call(rbind, by(NM, NM$TID, head, 80))
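
A further base R variant, offered only as a sketch and not part of the answer above, avoids splitting and recombining: it numbers the rows within each TID using `ave()` and keeps those numbered at most `n`. The toy `NM`, the `n <- 2` cutoff (standing in for 80), and the names `pos` and `first_n` are assumptions for illustration.

```r
# Toy data with interleaved TIDs (hypothetical, not the asker's data)
NM <- data.frame(
  TID = c("t2", "t1", "t2", "t1", "t2", "t1"),
  T   = seq_len(6) * 0.04
)

n <- 2  # stands in for 80

# Position of each row within its own TID group: 1, 2, 3, ...
pos <- ave(seq_len(nrow(NM)), NM$TID, FUN = seq_along)

# Keep the first n rows of every TID, in their original order
first_n <- NM[pos <= n, ]

print(first_n)
```

Unlike the `do.call(rbind, ...)` approaches, this keeps the rows in their original order and leaves the original row names untouched, which can matter when the TIDs are not stored in contiguous blocks.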
      

      【Comments】:
