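Here is another solution using base R. I've tried to comment it thoroughly, but it may still be hard to follow. It sounds like you're looking for guidance and want to learn rather than just get a direct answer, so if anything is unclear (or doesn't fit your actual application), please ask follow-up questions.
Also, I added a 12 to the end of your data to make sure the code returns the correct positions for increasing runs of at least n numbers (3 in this case):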
# Data (I added 12 on the end)
sequence <- c(1,2,3,2,5,3,2,6,7,9, 12)
# Flag whether each number is larger than the one before it (1 = increase);
# the leading 1 is a placeholder for the first number, which has no predecessor
indices <- c(1, diff(sequence) >= 1)
indices
[1] 1 1 1 0 1 0 0 1 1 1 1
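One caveat about this step: `diff(sequence) >= 1` only counts a step as an increase if it is at least 1. That works for integer data like yours but misses smaller increases. If your real data can contain non-integer values, `diff(sequence) > 0` is probably what you want. Here is a quick check using some made-up non-integer values (the vector `x` is purely illustrative):

```r
# Hypothetical non-integer data: every step increases, but by less than 1
x <- c(0.1, 0.4, 0.9, 1.5)

c(1, diff(x) >= 1)  # 1 0 0 0: only the placeholder is 1, the increases are missed
c(1, diff(x) > 0)   # 1 1 1 1: every step is flagged as an increase
```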
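Now that we have the indices, we need the start and end positions of the increasing stretches that are at least n = 3 numbers long.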
# Finding increasing sequences of n length using rle
n <- 3
n <- n - 1  # n increasing numbers in a row correspond to n - 1 consecutive increases
# Examples
rle(indices)$lengths
[1] 3 1 1 2 4
rle(indices)$values
[1] 1 0 1 0 1
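`rle()` returns an object of class `"rle"` that holds the two vectors shown above: `lengths` (how long each run is) and `values` (what is being repeated). Because `rle(indices)` is called several times below, you could also compute it once and reuse it. `inverse.rle()` expands the encoding back into the original vector, which is a handy sanity check. A small sketch:

```r
indices <- c(1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1)

# Compute the run-length encoding once and reuse it
r <- rle(indices)
r$lengths  # 3 1 1 2 4
r$values   # 1 0 1 0 1

# Expanding the runs again gives back the original vector
identical(inverse.rle(r), indices)  # TRUE
```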
# Finding repeated TRUE (1) in our indices vector
reps <- rle(indices)$lengths >= n & rle(indices)$values == 1
reps
[1] TRUE FALSE FALSE FALSE TRUE
# Creating a vector of positions for the end of a sequence
# The cumulative sum of the run lengths gives the position
# where each run ends in the indices vector
rle_positions <- cumsum(rle(indices)$lengths)
rle_positions
[1] 3 4 5 7 11
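Why does `start` in the next step use the end of the *previous* run rather than the first position of the current run? A 1 at position i means `sequence[i] > sequence[i - 1]`, so an increasing stretch actually begins one number before its first 1. Comparing the two conventions on the run lengths from above:

```r
lengths <- c(3, 1, 1, 2, 4)

ends <- cumsum(lengths)                 # 3 4 5 7 11: last position of each run
run_starts <- c(1, head(ends, -1) + 1)  # 1 4 5 6 8: first position of each run
seq_starts <- c(1, head(ends, -1))      # 1 3 4 5 7: one number earlier, as in the answer
```

For the last run (positions 8 to 11 in `indices`), `seq_starts` gives 7, which matches the start of 7 in the final data frame.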
# Start positions: each stretch begins where the previous run ended, i.e. at
# the number just before its first increase. Then keep only the runs flagged in reps
start <- c(1, head(rle_positions, -1))[reps]
end <- rle_positions[reps]
data.frame(start, end)
  start end
1     1   3
2     7  11
Or, more compactly:
n <- 3
n <- n-1
indices <- c(1, diff(sequence) >= 1)
reps <- rle(indices)$lengths >= n & rle(indices)$values == 1
rle_positions <- cumsum(rle(indices)$lengths)
data.frame(start = c(1, head(rle_positions, -1))[reps],
end = rle_positions[reps])
  start end
1     1   3
2     7  11
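If you need this in several places, the steps can be wrapped in a function. The sketch below is my own variant, not a drop-in copy of the code above (the name `find_increasing_runs` is just for illustration). It makes two deliberate changes: it pads with 0 instead of 1, and it uses `> 0` instead of `>= 1`. With the 1 padding, a vector that *starts* with only two increasing numbers, such as `c(1, 2, 1, 0)`, is reported as a run even though it is shorter than n = 3. Padding with 0 means every 1 marks a real increase, so a run of n - 1 ones always spans n numbers.

```r
# Hypothetical helper: start/end positions of stretches of at least
# n strictly increasing numbers in x
find_increasing_runs <- function(x, n = 3) {
  # 1 where x[i] > x[i - 1]; the leading 0 stands in for x[1], which has no predecessor
  indices <- c(0, diff(x) > 0)
  r <- rle(indices)
  # n numbers in a row means a run of at least n - 1 increases
  keep <- r$values == 1 & r$lengths >= n - 1
  ends <- cumsum(r$lengths)
  # a stretch starts at the end of the preceding run; the leading 1 is never
  # selected because the first run is always the 0 placeholder
  starts <- c(1, head(ends, -1))
  data.frame(start = starts[keep], end = ends[keep])
}

find_increasing_runs(c(1, 2, 3, 2, 5, 3, 2, 6, 7, 9, 12))                 # starts 1, 7; ends 3, 11
find_increasing_runs(c(1, 2, 3, 2, 5, 3, 2, 6, 7, 9, 12, 11, 11, 20, 100)) # starts 1, 7, 13; ends 3, 11, 15
find_increasing_runs(c(1, 2, 1, 0))                                       # no rows
```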
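EDIT: @Ronak's update made me realize I should be using diff() in the first step rather than sapply() with an anonymous function. I updated the answer because it did not pick up an increasing run at the end of the vector (e.g., sequence <- c(1,2,3,2,5,3,2,6,7,9,12, 11, 11, 20, 100)), and I also needed to add one more line under n <- 3. It should now work as intended.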