How to number subgroups in a data frame in R [duplicate]: answers

【Question title】: How to number subgroups in a data frame in R [duplicate]
【Posted】: 2015-05-03 14:00:00
【Question】:

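I'm struggling to number the subgroups in a data frame.

Take the iris dataset as an example. Suppose iris$Species identifies my subgroups (so I have three subgroups: setosa, versicolor, virginica). Now I want to add another column to iris, say an observation number: iris$Obs. For each subgroup I want numbers running from 1 to the size of the subgroup, resetting to 1 when the subgroup changes.

In other words, I want "Obs" to start at 1 whenever "Species" changes and to increase by one for as long as "Species" stays the same.

I prepared a picture, but as a complete noob I don't have the reputation points to paste it here...

Thanks for your help, everyone!

EDIT: > dput(iris)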

structure(list(Species = structure(c(1L, 1L, 1L, 2L, 2L, 3L), .Label = c("setosa", 
"versicolor", "virginica"), class = "factor")), .Names = "Species", row.names = c(NA, 
-6L), class = "data.frame")

【Comments】:

Tags: r

【Solution 1】:

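1) ave  Try ave:

transform(iris, Obs = ave(seq_along(Species), Species, FUN = seq_along))

The first argument of ave can be any numeric vector with one element per row; only its length and type matter here, not its values. For example, 1:nrow(iris), numeric(nrow(iris)) or Sepal.Length would all work. Species itself will not do because it is a "factor". c(Species) used to return its integer codes, but since R 4.1.0 c() applied to a factor returns a factor, so seq_along(Species) (or as.integer(Species)) is the safer choice. The rows of each group need not be contiguous.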
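
To illustrate the point that the rows of a group need not be contiguous for ave, here is a minimal sketch on a small made-up data frame (df and its values are illustrative assumptions, not the asker's data) in which the species are deliberately interleaved:

```r
# Made-up data: groups are interleaved on purpose.
df <- data.frame(
  Species = c("setosa", "virginica", "setosa",
              "versicolor", "virginica", "setosa")
)

# The first argument only supplies a numeric vector of the right length;
# FUN = seq_along then numbers the rows within each Species group.
df$Obs <- ave(seq_along(df$Species), df$Species, FUN = seq_along)

print(df)
```

Each species is counted independently, so the third setosa row still gets 3 even though other species appear between the setosa rows.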

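2) match  Another possibility is to take each row's sequence number, subtract the position of the first occurrence of its group, and add 1: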

transform(iris, Obs = seq_along(Species) - match(Species, Species) + 1)

If the +1 is omitted, the numbering starts at 0 instead of 1. This solution requires the rows of each group to be contiguous.
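
The arithmetic behind the match approach can be traced step by step. The sketch below uses two made-up character vectors (sp and mixed are illustrative names, not from the question) to show both the contiguous case and what goes wrong when a group is split up:

```r
# Contiguous groups, mirroring the six-row example in the question.
sp <- c("setosa", "setosa", "setosa", "versicolor", "versicolor", "virginica")

first_pos <- match(sp, sp)               # first occurrence of each value: 1 1 1 4 4 6
obs <- seq_along(sp) - first_pos + 1     # 1 2 3 1 2 1
print(obs)

# Non-contiguous groups: "a" appears at positions 1 and 3.
mixed <- c("a", "b", "a")
bad <- seq_along(mixed) - match(mixed, mixed) + 1
print(bad)                               # 1 1 3, although the second "a" should be 2
```

The second result shows why the contiguity requirement matters: the offset from the first occurrence also counts the rows of other groups that sit in between.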

【Comments】:

【Solution 2】:

You could try getanID from my "splitstackshape" package.

Based on your description, the code would be:

getanID(iris, "Species")

Here is what it looks like where the group changes:

getanID(iris, "Species")[45:55]
#     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species .id
#  1:          5.1         3.8          1.9         0.4     setosa  45
#  2:          4.8         3.0          1.4         0.3     setosa  46
#  3:          5.1         3.8          1.6         0.2     setosa  47
#  4:          4.6         3.2          1.4         0.2     setosa  48
#  5:          5.3         3.7          1.5         0.2     setosa  49
#  6:          5.0         3.3          1.4         0.2     setosa  50
#  7:          7.0         3.2          4.7         1.4 versicolor   1
#  8:          6.4         3.2          4.5         1.5 versicolor   2
#  9:          6.9         3.1          4.9         1.5 versicolor   3
# 10:          5.5         2.3          4.0         1.3 versicolor   4
# 11:          6.5         2.8          4.6         1.5 versicolor   5

Under the hood, it basically does this:

library(data.table)
as.data.table(iris)[, ID := sequence(.N), by = Species]

Or if you prefer "dplyr":

library(dplyr)
iris %>%
  group_by(Species) %>%
  mutate(ID = sequence(n()))
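
Both snippets above lean on base R's sequence(), which for a single count n returns 1..n and for a vector of counts concatenates one such run per count. As a rough base-R sketch of the same idea, assuming the rows of each group are contiguous (sp and run_sizes are made-up names for illustration):

```r
# Six-row example with contiguous groups, as in the question's dput output.
sp <- factor(c("setosa", "setosa", "setosa",
               "versicolor", "versicolor", "virginica"))

# Length of each run of identical consecutive values: 3 2 1
run_sizes <- rle(as.character(sp))$lengths

# One 1..n run per group, concatenated: 1 2 3 1 2 1
ids <- sequence(run_sizes)
print(ids)
```

Unlike the grouped data.table and dplyr versions, this sketch restarts the count at every run, so it only matches them when each group occupies one unbroken block of rows.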

【Comments】:

Haven't tried it, but row_number() might work for dplyr. Also, this feels like a huge dupe.
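
For what it is worth, the row_number() suggestion can be sketched as follows. This assumes the dplyr package is installed (the guard skips the example otherwise), and df is a made-up six-row stand-in for the asker's data:

```r
# Made-up stand-in for the question's reduced iris data.
df <- data.frame(
  Species = c("setosa", "setosa", "setosa",
              "versicolor", "versicolor", "virginica")
)

# Only run when dplyr is available; this is an assumption about the environment.
if (requireNamespace("dplyr", quietly = TRUE)) {
  grouped  <- dplyr::group_by(df, Species)
  # With no arguments, row_number() numbers the rows within the current group.
  numbered <- dplyr::ungroup(dplyr::mutate(grouped, ID = dplyr::row_number()))
  print(numbered)
}
```

Called without arguments inside a grouped mutate(), row_number() plays the same role as sequence(n()) in the answer above.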