【Question title】: Aggregate and obtain the last count of a group
【Posted】: 2014-01-26 04:29:58
【Question】:

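This question is partly related to my previous question here. I want to aggregate counts based on three columns and get the last event count for each group defined by the three variables date, id and rdate. What I would like to have is something like this: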

         date     rdate event
1   01-jan-90 08-jan-90     3
2   01-jan-90 15-jan-90     3
3   01-jan-90 01-jan-90     3
4   01-jan-90 22-jan-90     3
5   01-jan-90 29-jan-90     3
1.1 01-jan-90 08-jan-90     2
2.1 01-jan-90 15-jan-90     2
3.1 01-jan-90 01-jan-90     2
4.1 01-jan-90 22-jan-90     2
5.1 01-jan-90 29-jan-90     2

I have tried the following code, but it only works for getting the mean of each group:

aa<-aggregate(event ~ id+rdate+date,data = mydf,FUN=mean)

The sample data is as follows:

mydf <- structure(list(date = c("01-jan-90", "01-jan-90", "01-jan-90", 
"01-jan-90", "01-jan-90", "01-jan-90", "01-jan-90", "01-jan-90", 
"01-jan-90", "01-jan-90", "01-jan-90", "01-jan-90", "01-jan-90", 
"01-jan-90", "01-jan-90", "02-jan-90", "02-jan-90", "02-jan-90", 
"02-jan-90", "02-jan-90", "02-jan-90", "02-jan-90", "02-jan-90", 
"02-jan-90", "02-jan-90"), id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L), rdate = c("08-jan-90", "15-jan-90", "01-jan-90", "22-jan-90", 
"29-jan-90", "08-jan-90", "15-jan-90", "01-jan-90", "22-jan-90", 
"29-jan-90", "08-jan-90", "15-jan-90", "01-jan-90", "22-jan-90", 
"29-jan-90", "09-jan-90", "16-jan-90", "02-jan-90", "23-jan-90", 
"30-jan-90", "09-jan-90", "16-jan-90", "02-jan-90", "23-jan-90", 
"30-jan-90"), event = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L)), .Names = c("date", 
"id", "rdate", "event"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "1.1", "2.1", "3.1", "4.1", "5.1", "1.2", 
"2.2", "3.2", "4.2", "5.2", "6", "7", "8", "9", "10", "6.1", 
"7.1", "8.1", "9.1", "10.1"))

【Comments】:

  • Just change your function to length, i.e. aggregate(event ~ id+rdate+date,data = mydf,FUN=length). By the way, +1 for a good reproducible example.
  • @SimonO101, thanks for answering my question.
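A minimal base-R sketch of the suggestion in the first comment. It rebuilds the sample data compactly with `rep()` (same values as the `dput()` output above, without the original row names) and counts the rows in each `(id, rdate, date)` group with `FUN = length`. For this particular data the row count equals the last `event` value, because `event` runs 1, 2, 3, ... within every group; that equivalence is an assumption about the data, not something `length` guarantees in general.

```r
# Compact reconstruction of the question's sample data (same values as the dput() output)
mydf <- data.frame(
  date  = rep(c("01-jan-90", "02-jan-90"), times = c(15, 10)),
  id    = rep(c(1L, 2L), times = c(15, 10)),
  rdate = c(rep(c("08-jan-90", "15-jan-90", "01-jan-90", "22-jan-90", "29-jan-90"), 3),
            rep(c("09-jan-90", "16-jan-90", "02-jan-90", "23-jan-90", "30-jan-90"), 2)),
  event = c(rep(1:3, each = 5), rep(1:2, each = 5)),
  stringsAsFactors = FALSE
)

# FUN = length returns the number of rows in each (id, rdate, date) group
counts <- aggregate(event ~ id + rdate + date, data = mydf, FUN = length)
print(counts)
```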

Tags: r aggregate


【Solution 1】:

I think this is what you're after:

> library(plyr)
> ddply(mydf, .(id, date, rdate), summarise, event = tail(event, 1))
   id      date     rdate event
1   1 01-jan-90 01-jan-90     3
2   1 01-jan-90 08-jan-90     3
3   1 01-jan-90 15-jan-90     3
4   1 01-jan-90 22-jan-90     3
5   1 01-jan-90 29-jan-90     3
6   2 02-jan-90 02-jan-90     2
7   2 02-jan-90 09-jan-90     2
8   2 02-jan-90 16-jan-90     2
9   2 02-jan-90 23-jan-90     2
10  2 02-jan-90 30-jan-90     2

If order matters, you can sort the result by date and rdate afterwards.
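A hedged base-R sketch of the same idea without plyr: `aggregate()` with an anonymous function that keeps the last `event` value of each group (`tail(x, 1)`, as in the `ddply` call above; "last" means last in the original row order), followed by sorting on parsed dates. Because `date` and `rdate` are character strings, sorting them directly is alphabetical, which only happens to match chronological order here because every date falls in January 1990. The sketch assumes the `%d-%b-%y` format and English month abbreviations so that `as.Date()` can parse `jan`.

```r
# Compact reconstruction of the question's sample data (same values as the dput() output)
mydf <- data.frame(
  date  = rep(c("01-jan-90", "02-jan-90"), times = c(15, 10)),
  id    = rep(c(1L, 2L), times = c(15, 10)),
  rdate = c(rep(c("08-jan-90", "15-jan-90", "01-jan-90", "22-jan-90", "29-jan-90"), 3),
            rep(c("09-jan-90", "16-jan-90", "02-jan-90", "23-jan-90", "30-jan-90"), 2)),
  event = c(rep(1:3, each = 5), rep(1:2, each = 5)),
  stringsAsFactors = FALSE
)

# Keep the last event value of each (id, date, rdate) group, in original row order
last_event <- aggregate(event ~ id + date + rdate, data = mydf,
                        FUN = function(x) tail(x, 1))

# Parse the "dd-mon-yy" strings so ordering is chronological rather than alphabetical.
# Assumes English month abbreviations (e.g. the C locale or an en_* locale).
last_event <- last_event[order(last_event$id,
                               as.Date(last_event$date,  format = "%d-%b-%y"),
                               as.Date(last_event$rdate, format = "%d-%b-%y")), ]
rownames(last_event) <- NULL
print(last_event)
```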

【Comments】:

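  • Do these suggestions answer your question? If so, please mark it as answered. Thanks!
【Solution 2】:

Not entirely sure what you're trying to do, but something like this?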

library(plyr)
ddply(mydf, .(id, date, rdate), summarise,
      date = tail(date, 1),
      id = tail(id, 1),
      rdate = tail(rdate, 1),
      mean = mean(event))

Output:

> library(plyr)
> ddply(mydf, .(id, date, rdate), summarise,
+       date = tail(date, 1),
+       id = tail(id, 1),
+       rdate = tail(rdate, 1),
+       mean = mean(event))
        date id     rdate mean
1  01-jan-90  1 01-jan-90  2.0
2  01-jan-90  1 08-jan-90  2.0
3  01-jan-90  1 15-jan-90  2.0
4  01-jan-90  1 22-jan-90  2.0
5  01-jan-90  1 29-jan-90  2.0
6  02-jan-90  2 02-jan-90  1.5
7  02-jan-90  2 09-jan-90  1.5
8  02-jan-90  2 16-jan-90  1.5
9  02-jan-90  2 23-jan-90  1.5
10 02-jan-90  2 30-jan-90  1.5
> 

【Comments】:
