【发布时间】:2015-03-17 15:47:52
【问题描述】:
我有一个这样的数据框:
ID PA WA PC
1 2 -6 8
2 2 -2 7
3 3 7 2
4 -3 3 -6
5 3 20 12
6 15 -17 18
7 3 6 10
我尝试根据他们在 PA、WA 和 PC 中的分数对 ID 进行分组。
这个我已经用过了,但是太麻烦了:
NEW1 <- subset(WA.PC.PA, PA< -5 & WA < -5 & PC> 5, select=c(id, PA, WA, PC))
NEW2 <- subset(WA.PC.PA, PA >5 & WA < -5 & PC< -5, select=c(id, PA, WA, PC))
NEW3 <- subset(WA.PC.PA, PA < -5 & WA >5 & PC< -5, select=c(id, PA, WA, PC))
NEW4 <- subset(WA.PC.PA, PA < -5 & WA < -5 & PC< -5, select=c(id, PA, WA, PC))
NEW5 <- subset(WA.PC.PA, PA > 5 & WA >5 & PC< -5, select=c(id, PA, WA, PC))
NEW6 <- subset(WA.PC.PA, PA >5 & WA < -5 & PC>5, select=c(id, PA, WA, PC))
NEW7 <- subset(WA.PC.PA, PA < -5 & WA >5 & PC>5, select=c(id, PA, WA, PC))
NEW8 <- subset(WA.PC.PA, PA >5 & WA >5 & PC>5, select=c(id, PA, WA, PC))
NEW9 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA<5 & WA>-5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW10 <- subset(WA.PC.PA, PA < -5 & WA < -5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW11 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA < -5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW12 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA<5 & WA>-5 & PC< -5, select=c(id, PA, WA, PC))
NEW13 <- subset(WA.PC.PA, PA< -5 & WA<5 & WA>-5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW14 <- subset(WA.PC.PA, PA < -5 & WA<5 & WA>-5 & PC< -5, select=c(id, PA, WA, PC))
NEW15 <- subset(WA.PC.PA, PA< -5 & WA<5 & WA>-5 & PC>5, select=c(id, PA, WA, PC))
NEW16 <- subset(WA.PC.PA, PA < -5 & WA >5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW17 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA < -5 & PC>5, select=c(id, PA, WA, PC))
NEW18 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA<5 & WA>-5 & PC< -5, select=c(id, PA, WA, PC))
NEW19 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA<5 & WA>-5 & PC>5, select=c(id, PA, WA, PC))
NEW20 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA >5 & PC< -5, select=c(id, PA, WA, PC))
NEW21 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA >5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW22 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA >5 & PC>5, select=c(id, PA, WA, PC))
NEW23 <- subset(WA.PC.PA, PA >5 & WA < -5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW24 <- subset(WA.PC.PA, PA >5 & WA<5 & WA>-5 & PC< -5, select=c(id, PA, WA, PC))
NEW25 <- subset(WA.PC.PA, PA >5 & WA<5 & WA>-5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW26 <- subset(WA.PC.PA, PA >5 & WA<5 & WA>-5 & PC>5, select=c(id, PA, WA, PC))
NEW27 <- subset(WA.PC.PA, PA >5 & WA >5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
如您所见,我将每个分数分为三个等级,5 之间。但我想 1) 简化代码,因为当我想为每个测试的分数分配不同的数字时,我需要重写整个代码。
我该怎么做?
【问题讨论】:
-
有一个
findInterval函数。在使用它为每个变量定义分类变量之后,您可以使用它们的组标签标记观察结果。这对于确保您的子集详尽且独特(覆盖原始集中的所有内容而没有重叠)是最好的。在 data.table 中,您可以使用DT[,groupname:=.GRP,by=list(PAcode,WAcode,PCcode)]标记组