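【Posted】: 2020-09-21 22:56:56
【Question】:
I am trying to replicate clusters of observations (ID) and generate a new variable that identifies each unique cluster (new_ID). For example, consider the data frame df1: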
df1 <- data.frame(ID=c("1", "1", "1", "2", "2", "3"), sex=c("M", "M", "M", "F", "F", "M"),count=c(4,4,4,3,3,2))
df1
#> ID sex count
#> 1 1 M 4
#> 2 1 M 4
#> 3 1 M 4
#> 4 2 F 3
#> 5 2 F 3
#> 6 3 M 2
The desired result is df2:
df2 <- data.frame(
ID=c("1","1","1","1","1","1","1","1","1","1","1","1","2","2","2","2","2","2","3","3"),
new_ID = c("1","1","1","2","2","2","3","3","3","4","4","4","5","5","6","6","7","7", "8","9"),
sex=c("M","M","M","M","M","M","M","M","M","M","M","M", "F", "F", "F", "F","F", "F","M","M"),
count=c(4,4,4,4,4,4,4,4,4,4,4,4,3,3,3,3,3,3,2,2))
df2
#> ID new_ID sex count
#> 1 1 1 M 4
#> 2 1 1 M 4
#> 3 1 1 M 4
#> 4 1 2 M 4
#> 5 1 2 M 4
#> 6 1 2 M 4
#> 7 1 3 M 4
#> 8 1 3 M 4
#> 9 1 3 M 4
#> 10 1 4 M 4
#> 11 1 4 M 4
#> 12 1 4 M 4
#> 13 2 5 F 3
#> 14 2 5 F 3
#> 15 2 6 F 3
#> 16 2 6 F 3
#> 17 2 7 F 3
#> 18 2 7 F 3
#> 19 3 8 M 2
#> 20 3 9 M 2
Thanks in advance for your help.
【Comments】:
-
How are the clusters defined?
-
They are defined by ID.
-
How do you get the first three 1s, then three 2s, and so on?
-
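The first three 1s are there because ID 1 is repeated 3 times in df1. So the new_ID values 1, 2, 3 and 4 in df2 are each repeated 3 times, because they are all associated with ID = 1 in df1.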
-
Maybe it's just me, but I still don't understand how you compute df2$new_id from the values in df1...
Tags: r dplyr count tidyr unnest
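
The question does not state the rule explicitly. Comparing df1 and df2, one reading that reproduces df2 is: each ID block is copied `count` times as a whole, and new_ID numbers the copies consecutively across all IDs. The following is a minimal base R sketch under that assumption (it also assumes `count` is constant within an ID). The helper column `copy` and the result name `res` are made up for illustration, and base R is used here instead of the tagged dplyr/tidyr packages only to keep the example dependency-free.

```r
df1 <- data.frame(ID    = c("1", "1", "1", "2", "2", "3"),
                  sex   = c("M", "M", "M", "F", "F", "M"),
                  count = c(4, 4, 4, 3, 3, 2))

# Split into one block per ID, keeping the order in which IDs first appear.
groups <- split(df1, factor(df1$ID, levels = unique(df1$ID)))

# Assumption: `count` is constant within an ID and gives the number of copies.
copies <- lapply(groups, function(g) {
  n   <- g$count[1]
  out <- g[rep(seq_len(nrow(g)), times = n), ]   # repeat the whole block n times
  out$copy <- rep(seq_len(n), each = nrow(g))    # hypothetical helper: copy index
  out
})
res <- do.call(rbind, copies)

# new_ID is a running index over the distinct (ID, copy) pairs.
key <- paste(res$ID, res$copy, sep = "_")
res$new_ID <- as.character(match(key, unique(key)))

# Same column order as df2; drop the helper column and reset row names.
res <- res[, c("ID", "new_ID", "sex", "count")]
rownames(res) <- NULL
res
```

If `count` can differ between rows of the same ID, the intended rule would need to be clarified first, since this sketch only looks at the first `count` value in each block.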