【Question title】: Simplify code to group in R
【Posted】: 2015-03-17 15:47:52
【Question】:

I have a data frame like this:

ID PA   WA  PC
1   2   -6   8 
2   2   -2   7
3   3    7   2
4  -3    3  -6
5   3   20  12
6  15  -17  18
7   3    6  10

I am trying to group the IDs according to their scores on PA, WA, and PC.

This is what I have used so far, but it is too cumbersome:

NEW1 <- subset(WA.PC.PA, PA< -5 & WA < -5 & PC> 5, select=c(id, PA, WA, PC))
NEW2 <- subset(WA.PC.PA, PA >5 & WA < -5 & PC< -5, select=c(id, PA, WA, PC))
NEW3 <- subset(WA.PC.PA, PA < -5 & WA >5 & PC< -5, select=c(id, PA, WA, PC))
NEW4 <- subset(WA.PC.PA, PA < -5 & WA < -5 & PC< -5, select=c(id, PA, WA, PC))
NEW5 <- subset(WA.PC.PA, PA > 5 & WA >5 & PC< -5, select=c(id, PA, WA, PC))
NEW6 <- subset(WA.PC.PA, PA >5 & WA < -5 & PC>5, select=c(id, PA, WA, PC))
NEW7 <- subset(WA.PC.PA, PA < -5 & WA >5 & PC>5, select=c(id, PA, WA, PC))
NEW8 <- subset(WA.PC.PA, PA >5 & WA >5 & PC>5, select=c(id, PA, WA, PC))
NEW9 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA<5 & WA>-5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW10 <- subset(WA.PC.PA, PA < -5 & WA < -5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW11 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA < -5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW12 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA<5 & WA>-5 & PC< -5, select=c(id, PA, WA, PC))
NEW13 <- subset(WA.PC.PA, PA< -5 & WA<5 & WA>-5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW14 <- subset(WA.PC.PA, PA < -5 & WA<5 & WA>-5 & PC< -5, select=c(id, PA, WA, PC))
NEW15 <- subset(WA.PC.PA, PA< -5 & WA<5 & WA>-5 & PC>5, select=c(id, PA, WA, PC))
NEW16 <- subset(WA.PC.PA, PA < -5 & WA >5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW17 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA < -5 & PC>5, select=c(id, PA, WA, PC))
NEW18 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA < -5 & PC< -5, select=c(id, PA, WA, PC))
NEW19 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA<5 & WA>-5 & PC>5, select=c(id, PA, WA, PC))
NEW20 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA >5 & PC< -5, select=c(id, PA, WA, PC))
NEW21 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA >5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW22 <- subset(WA.PC.PA, PA<5 & PA>-5 & WA >5 & PC>5, select=c(id, PA, WA, PC))
NEW23 <- subset(WA.PC.PA, PA >5 & WA < -5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW24 <- subset(WA.PC.PA, PA >5 & WA<5 & WA>-5 & PC< -5, select=c(id, PA, WA, PC))
NEW25 <- subset(WA.PC.PA, PA >5 & WA<5 & WA>-5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))
NEW26 <- subset(WA.PC.PA, PA >5 & WA<5 & WA>-5 & PC>5, select=c(id, PA, WA, PC))
NEW27 <- subset(WA.PC.PA, PA >5 & WA >5 & PC<5 & PC>-5, select=c(id, PA, WA, PC))

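As you can see, I split each score into three levels, with cut points at -5 and 5. I would like to simplify the code, because whenever I want to use different cut points for the scores of each test, I have to rewrite the whole thing.

How can I do this?

【Comments】: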

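  • There is a findInterval function. After using it to define a categorical variable for each score, you can label the observations with their group. This is the best way to make sure your subsets are exhaustive and mutually exclusive (they cover everything in the original set, with no overlap). In data.table, you can label the groups with DT[, groupname := .GRP, by = list(PAcode, WAcode, PCcode)].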
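
A minimal base-R sketch of the findInterval idea from the comment above, assuming the seven rows shown in the question. The names `scores`, `breaks`, `codes`, and `group` are made up for illustration, and the data.table `.GRP` step is replaced here by a pasted label. Note that findInterval uses left-closed intervals by default, so a score of exactly -5 or 5 is assigned to a bin instead of being dropped as it is by the strict inequalities in the question.

```r
# Example data copied from the table in the question.
scores <- data.frame(
  ID = 1:7,
  PA = c(2, 2, 3, -3, 3, 15, 3),
  WA = c(-6, -2, 7, 3, 20, -17, 6),
  PC = c(8, 7, 2, -6, 12, 18, 10)
)

breaks <- c(-5, 5)  # cut points shared by all three scores

# findInterval() returns 0 for x < -5, 1 for -5 <= x < 5, and 2 for x >= 5.
codes <- lapply(scores[c("PA", "WA", "PC")], findInterval, vec = breaks)

# Combine the three codes into one label per row (PA.WA.PC order).
scores$group <- do.call(paste, c(unname(codes), sep = "."))
scores
```

Every row gets exactly one label, so the groups cannot overlap or miss an observation.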

Tags: r simplify


【Solution 1】:

The best way to do this is to use cut, then interaction, then split. Basically, cut defines the partition for each variable, for example:

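# cut() bins each score at -5 and 5; intervals are right-closed by default, e.g. (-5,5]
paCuts = with(WA.PC.PA, cut(PA, c(-Inf, -5, 5, Inf)))
waCuts = with(WA.PC.PA, cut(WA, c(-Inf, -5, 5, Inf)))
pcCuts = with(WA.PC.PA, cut(PC, c(-Inf, -5, 5, Inf)))
# one factor level per combination of the three bins
levels = interaction(paCuts, waCuts, pcCuts)
split(WA.PC.PA, levels)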

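The benefit here is that you can treat the partition as data (a vector of cut points) rather than as code (conditional statements). That makes changing the various cut points a breeze.

【Discussion】: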
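
To illustrate the "partition as data" point, here is a small sketch that keeps a separate vector of cut points per test, assuming the example rows from the question. The `cutpoints` values (including the -10 and 10 used for WA), the "low"/"mid"/"high" labels, and the variable names are hypothetical, chosen only to show that changing a grouping means editing one vector.

```r
# Example data copied from the table in the question.
scores <- data.frame(
  ID = 1:7,
  PA = c(2, 2, 3, -3, 3, 15, 3),
  WA = c(-6, -2, 7, 3, 20, -17, 6),
  PC = c(8, 7, 2, -6, 12, 18, 10)
)

# Hypothetical per-test cut points; edit these vectors to change the grouping.
cutpoints <- list(
  PA = c(-Inf, -5, 5, Inf),
  WA = c(-Inf, -10, 10, Inf),
  PC = c(-Inf, -5, 5, Inf)
)

# Bin every score column with its own breaks.
bins <- Map(function(x, b) cut(x, b, labels = c("low", "mid", "high")),
            scores[names(cutpoints)], cutpoints)

# One factor level per combination; drop = TRUE discards empty combinations.
groups <- split(scores, interaction(bins, drop = TRUE))
names(groups)
```

Each element of `groups` is a data frame holding the rows of one PA.WA.PC combination, so the 27 hand-written subset calls are replaced by three vectors of breaks.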

  • Thanks, @jimmyb, this is great! But how do I export the IDs, given that split(WA.PC.PA, levels) produces a list?
  • You can split anything, including the IDs, e.g. split(WA.PC.PA$ID, levels)
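
One possible way to turn the split IDs into something exportable, sketched under the assumption that the ID column is named `ID` as in the question's table. The names `bins`, `grp`, `ids_by_group`, and `id_table` and the "low"/"mid"/"high" labels are illustrative, not from the original answer.

```r
# Example data copied from the table in the question.
scores <- data.frame(
  ID = 1:7,
  PA = c(2, 2, 3, -3, 3, 15, 3),
  WA = c(-6, -2, 7, 3, 20, -17, 6),
  PC = c(8, 7, 2, -6, 12, 18, 10)
)

breaks <- c(-Inf, -5, 5, Inf)
bins <- lapply(scores[c("PA", "WA", "PC")], cut,
               breaks = breaks, labels = c("low", "mid", "high"))
grp <- interaction(bins, drop = TRUE)  # keep only combinations that occur

# A list of ID vectors, one element per non-empty group.
ids_by_group <- split(scores$ID, grp)

# stack() flattens the list into a two-column data frame.
id_table <- stack(ids_by_group)
names(id_table) <- c("ID", "group")
id_table
```

The flat `id_table` can then be written out with write.csv if a file is needed.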