Find the first and last index of a group/zone of identical character strings in a vector

【Question title】: Find the first and last index of a group/zone of identical character strings in a vector
【Posted】: 2016-09-12 09:00:49
【Question description】:

b <- c("true", "true", "true", "true", "true", "false", "false", "true","true", "true", "false", "false", "false","true", "true", "false", "true", "false", "true", "false")

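I am trying to write a function that takes the above vector as input and outputs the first and last index of each "zone" of "true" values (a zone is defined as a subvector with two or more consecutive identical elements). The desired output for the vector above would be a data frame such as: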

x   |  y
----|----
1   |  5
8   |  10
14  |  15

I have managed to write a function that does this (below), but it takes too long in my Shiny app. A cleaner and faster approach would be great.

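zone_identifier <- function(dataframe, zone_source_col_index, match_string) {
  zones_df <- data.frame()
  zone_source_vector <- dataframe[, zone_source_col_index]

  for (i in 1:(length(zone_source_vector) - 1)) {
    zone_component_recorder <- vector()
    for (j in 1:(length(zone_source_vector) - i)) {
      if (zone_source_vector[i] == match_string && zone_source_vector[i + j] == match_string) {
        # position i is inside a zone that started earlier, so skip it
        if (i > 1 && zone_source_vector[i - 1] == match_string) {
          break
        }
        # extend the zone that starts at i up to position i + j
        zone_component_recorder <- c(i, i + j)
      } else if (zone_source_vector[i] == match_string && zone_source_vector[i + j] != match_string) {
        # the run of matches starting at i has ended
        break
      }
    }
    # only record a zone if a start/end pair was found for this i
    if (length(zone_component_recorder) == 2) {
      zones_df <- rbind.data.frame(zones_df, zone_component_recorder)
    }
  }
  return(zones_df)
}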

【Question comments】:

A similar question was asked here; does it meet your needs? stackoverflow.com/questions/35227312/…

Tags: r

【Solution 1】:

You can build a solution on rle, which finds runs of identical values:

#use rle to find runs of same value in b
rle_res=rle(b)
#find starting position of each true and false
start_vec=c(1,1+cumsum(rle_res$lengths))
start_vec=start_vec[-length(start_vec)]
#end position of each run is the cumulative run length
#(this avoids an NA end for the final run)
end_vec=cumsum(rle_res$lengths)

#filter on true values
data.frame(x=start_vec[rle_res$values=="true"],
           y=end_vec[rle_res$values=="true"])
#   x  y
#1  1  5
#2  8 10
#3 14 15
#4 17 17
#5 19 19
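
The output above also lists single-element runs (17-17 and 19-19), whereas the question defines a zone as two or more consecutive identical elements. A minimal sketch of the same rle idea wrapped in a function with a run-length filter follows. The function name `find_zones` and the `min_len` argument are assumptions made for illustration and do not come from the original answer.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer):
# returns the first and last index of every run of `match_string` in `v`
# that is at least `min_len` elements long.
find_zones <- function(v, match_string, min_len = 2L) {
  runs <- rle(v)
  end <- cumsum(runs$lengths)        # last index of each run
  start <- end - runs$lengths + 1L   # first index of each run
  keep <- runs$values == match_string & runs$lengths >= min_len
  data.frame(x = start[keep], y = end[keep])
}

b <- c("true", "true", "true", "true", "true", "false", "false", "true",
       "true", "true", "false", "false", "false", "true", "true", "false",
       "true", "false", "true", "false")

zones <- find_zones(b, "true")
print(zones)
#    x  y
# 1  1  5
# 2  8 10
# 3 14 15
```

Setting `min_len = 1L` would give back the single-element runs as well, so the same function covers both readings of the question.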

【Comments】:

【Solution 2】:

Here is an option using data.table:

library(data.table)
v1 <- data.table(b)[, {
      i1 <- .I[b=="true" & seq_len(.N) %in% c(1,.N)]
      if(.N==1) rep(i1, 2) else i1} , by =  rleid(b)]$V1
data.table(x= v1[c(TRUE, FALSE)], y = v1[c(FALSE, TRUE)])
#    x  y
#1:  1  5
#2:  8 10
#3: 14 15
#4: 17 17
#5: 19 19
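
As with the rle approach, this result includes runs of length one. A possible variant, sketched here under the assumption that only "true" runs of two or more elements are wanted, summarises each run with `.I` and `.N` and then filters on the run length. The column names `value`, `len`, and `run` are chosen for illustration only.

```r
library(data.table)

b <- c("true", "true", "true", "true", "true", "false", "false", "true",
       "true", "true", "false", "false", "false", "true", "true", "false",
       "true", "false", "true", "false")

# One row per run: its value, first row index, last row index, and length.
runs <- data.table(b)[, .(value = b[1], x = .I[1], y = .I[.N], len = .N),
                      by = .(run = rleid(b))]

# Keep only "true" runs that span at least two elements.
zones <- runs[value == "true" & len >= 2, .(x, y)]
print(zones)
```

Summarising per run first keeps the intermediate `runs` table available, which may be convenient if zones for other values or other minimum lengths are needed later.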

【Comments】:

【Solution 3】:

A solution using dplyr:

library(dplyr)

run <- rle(b)$lengths
data.frame( type= rle(b)$values , 
            x =c(1,cumsum(run)[-length(run)]+1 ) ,
            y =cumsum(run) ) %>% 
  filter(type=="true") %>%
  select(-type)

【Comments】: