[Posted]: 2017-11-07 20:32:08
[Question]:
I have a data.table in R that tracks the movement of items within a system. I would like to group this data based on two fields, ID and Location.
library(data.table)
example <- data.table(ID = rep(LETTERS[1:3], each = 6),
                      Location = c(1,2,3,1,2,1,2,2,2,3,3,1,2,3,3,3,1,3))
example
# ID Location
# 1: A 1
# 2: A 2
# 3: A 3
# 4: A 1
# 5: A 2
# 6: A 1
# 7: B 2
# 8: B 2
# 9: B 2
# 10: B 3
# 11: B 3
# 12: B 1
# 13: C 2
# 14: C 3
# 15: C 3
# 16: C 3
# 17: C 1
# 18: C 3
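What I want as output is a new column whose number increments every time the location changes, regardless of what the new location is (that is, whether or not that location already appears elsewhere in the item's history). Unlike in this question, the counter only increments within each group.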
expected_output <- data.table(ID = rep(LETTERS[1:3], each = 6),
                              Location = c(1,2,3,1,2,1,2,2,2,3,3,1,2,3,3,3,1,3),
                              Group = c(1,2,3,4,5,6,1,1,1,2,2,3,1,2,2,2,3,4))
expected_output
# ID Location Group
# 1: A 1 1
# 2: A 2 2
# 3: A 3 3
# 4: A 1 4
# 5: A 2 5
# 6: A 1 6
# 7: B 2 1
# 8: B 2 1
# 9: B 2 1
# 10: B 3 2
# 11: B 3 2
# 12: B 1 3
# 13: C 2 1
# 14: C 3 2
# 15: C 3 2
# 16: C 3 2
# 17: C 1 3
# 18: C 3 4
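So far I have tried several different combinations in the by argument, without much luck. The closest I seem to be able to get is with diff, which is partially correct at showing where changes occur, but the counter increments inside a run of identical locations instead of between runs.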
output <- copy(example)  # copy() so that := does not also modify example by reference
output[, Group := 1:.N, by = paste0(ID, Location, diff(Location))]
output
# ID Location Group
# 1: A 1 1
# 2: A 2 1 # not incrementing/new group
# 3: A 3 1 # not incrementing/new group
# 4: A 1 2
# 5: A 2 1
# 6: A 1 3
# 7: B 2 1
# 8: B 2 2 # incrementing when shouldn't
# 9: B 2 1
# 10: B 3 1
# 11: B 3 1
# 12: B 1 1
# 13: C 2 1
# 14: C 3 1
# 15: C 3 2
# 16: C 3 1
# 17: C 1 1
# 18: C 3 1
At this point I'm pretty lost, though I'm sure the solution is staring me in the face.
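For reference, the rule described above (start at 1 for each ID, then add 1 whenever Location differs from the previous row) can be sketched as follows. This is only a minimal sketch, not necessarily the intended solution. It assumes a data.table version that provides rleid(), and the names sketch and Group2 are made up for illustration. Note that diff(Location) returns one value fewer than the number of rows and, when placed in by, is evaluated over the whole column rather than per ID, which may explain the misaligned groups in the attempt above.

```r
library(data.table)

example <- data.table(ID = rep(LETTERS[1:3], each = 6),
                      Location = c(1,2,3,1,2,1,2,2,2,3,3,1,2,3,3,3,1,3))

# Work on a copy so that := does not modify example by reference
# (sketch is a hypothetical name used only in this illustration)
sketch <- copy(example)

# rleid() numbers consecutive runs of equal values;
# by = ID restarts the numbering for every ID
sketch[, Group := rleid(Location), by = ID]

# The same idea spelled out: the first row of each ID always starts a run,
# and every later row starts a new run when its Location differs from the
# previous one; cumsum() turns those flags into a running counter
# (Group2 is a hypothetical column name used only for comparison)
sketch[, Group2 := cumsum(c(TRUE, diff(Location) != 0)), by = ID]

sketch
```

Both columns are computed per ID, so a Location that reappears later in the same ID's history still gets a new number instead of reusing an old one.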
[Discussion]:
Tags: r data.table grouping