【Question title】: Start and end of a period in a data frame
【Posted】: 2021-06-16 16:48:36
【Question】:

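For each row of a data frame, I want to return the names of the first and last columns that contain a 1. If a row contains only zeros, I want the start and end columns to be NA.

Data structure:

Output:

Sample data: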

structure(list(ID = c(1, 2, 3), A1 = c(1, 1,0), A2 = c(1, 1,0), A3 = c(0, 
1,0), A4 = c(0, 1,0), A5 = c(0, 1,0)), class = c("spec_tbl_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -3L), spec = structure(list(
    cols = list(ID = structure(list(), class = c("collector_double", 
    "collector")), A1 = structure(list(), class = c("collector_double", 
    "collector")), A2 = structure(list(), class = c("collector_double", 
    "collector")), A3 = structure(list(), class = c("collector_double", 
    "collector")), A4 = structure(list(), class = c("collector_double", 
    "collector")), A5 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

Example code (does not work for rows that contain only zeros):

start <- names(df1)[-1][max.col(df1[-1], "first")]
end <- names(df1)[-1][max.col(df1[-1], "last")]
data.frame(ID = df1$ID, start, end)

【Question comments】:

    Tags: r dataframe


    【Solution 1】:

    Does this work:

    library(dplyr)
    library(tidyr)
    library(stringr)
    df %>% pivot_longer(-ID) %>% group_by(ID) %>% 
      mutate(s = cumsum(value)) %>% mutate(s = na_if(s,0)) %>% 
        transmute(start = str_c('A',min(s)), end = str_c('A',max(s))) %>% distinct()
    # A tibble: 3 x 3
    # Groups:   ID [3]
         ID start end  
      <dbl> <chr> <chr>
    1     1 A1    A2   
    2     2 A1    A5   
    3     3 NA    NA   
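
    The pipeline above turns a running count into a column name, so it appears to rely on the 1s starting in `A1`. As a minimal sketch of a variant that reads the column names directly, assuming the data is stored as `df1` as in the question: row 4 below is hypothetical and only there to show a period that starts later than `A1`. It assumes `dplyr::first()` and `dplyr::last()` return `NA` for an empty vector.

```r
library(dplyr)
library(tidyr)

# Sample data from the question plus one hypothetical row (ID 4)
# whose period starts in A2 rather than A1
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  A1 = c(1, 1, 0, 0), A2 = c(1, 1, 0, 1), A3 = c(0, 1, 0, 1),
                  A4 = c(0, 1, 0, 0), A5 = c(0, 1, 0, 0))

res <- df1 %>%
  pivot_longer(-ID) %>%                    # long format: columns ID, name, value
  group_by(ID) %>%
  summarise(
    start = first(name[value == 1]),       # first column holding a 1, NA if none
    end   = last(name[value == 1]),        # last column holding a 1, NA if none
    .groups = "drop"
  )

print(res)
```

    Like the other answers, this only reports the first and last 1 in each row and does not detect zeros in between.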
    

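    【Comments】:

      【Solution 2】:

      Using base functions and a for loop, you can iterate over all rows and record the lowest and highest column that contains a 1. However, this will not notice any breaks in the run: if your sequence of 1s is interrupted by 0s, that will not show up in the result.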

      id = c()
      start = c()
      end = c()
      for(i in 1:dim(df)[1]){
        id = c(id,df$ID[i])
        row = df[i,-1]
        start = c(start,names(row)[min((1:length(row))[row==1])])
        end = c(end,names(row)[max((1:length(row))[row==1])])
      }
      
      out = data.frame(ID=id,
                       start=start,
                       end=end)
      
      

      The output is:

      > out
        ID start  end
      1  1    A1   A2
      2  2    A1   A5
      3  3  <NA> <NA>
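
      The same result can be sketched without a loop by keeping the `max.col()` idea from the question and blanking out the all-zero rows afterwards. This is a minimal sketch, assuming the data contains only 0s and 1s; the names `m`, `has_one` and `single_run` are illustrative. The last step is an optional check for the limitation mentioned above (interrupted runs).

```r
# Sample data from the question, rebuilt as a plain data.frame
df1 <- data.frame(ID = c(1, 2, 3),
                  A1 = c(1, 1, 0), A2 = c(1, 1, 0), A3 = c(0, 1, 0),
                  A4 = c(0, 1, 0), A5 = c(0, 1, 0))

m <- as.matrix(df1[-1])               # numeric 0/1 matrix without the ID column
has_one <- rowSums(m == 1) > 0        # FALSE for rows that contain only zeros

# max.col() gives the column index of each row maximum; with 0/1 data the
# "first" and "last" tie rules pick the first and last column holding a 1
start <- colnames(m)[max.col(m, ties.method = "first")]
end   <- colnames(m)[max.col(m, ties.method = "last")]

# All-zero rows have no period, so set both values to NA
start[!has_one] <- NA
end[!has_one]   <- NA

out <- data.frame(ID = df1$ID, start, end)
print(out)

# Optional check: TRUE when the 1s in a row form at most one uninterrupted run
single_run <- apply(m == 1, 1, function(r) sum(rle(r)$values) <= 1)
```

      Rows where `single_run` is `FALSE` would need separate handling, since `start` and `end` alone cannot describe more than one period.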
      

      【Comments】:

        【Solution 3】:
        
        library(tidyverse)
        df1 %>% group_by(ID) %>% #rowwise() %>%
          summarise(start = list(names(cur_data())[as.logical(cur_data())]),
                 end = unlist(map(start, ~last(.x))),
                 start = unlist(map(start, ~first(.x))),
                 .groups = 'drop')
        
        #> # A tibble: 3 x 3
        #>      ID start end  
        #>   <dbl> <chr> <chr>
        #> 1     1 A1    A2   
        #> 2     2 A1    A5   
        #> 3     3 <NA>  <NA>
        

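        Created on 2021-06-16 by the reprex package (v2.0.0)

        【Comments】:

          【Solution 4】:

          Below is a small program that might help. However, it only works if you are sure there are no zeros inside a row's run of 1s and the run starts in the first data column, because the end column is derived from the row sum. Your sample data and sample code suggest this is the case.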

          #your data
          df1 <- structure(list(ID = c(1, 2, 3), A1 = c(1, 1,0), A2 = c(1, 1,0), A3 = c(0, 1,0), A4 = c(0, 1,0), A5 = c(0, 1,0)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L), spec = structure(list(cols = list(ID = structure(list(), class = c("collector_double", "collector")), A1 = structure(list(), class = c("collector_double",  "collector")), A2 = structure(list(), class = c("collector_double", "collector")), A3 = structure(list(), class = c("collector_double", "collector")), A4 = structure(list(), class = c("collector_double", "collector")), A5 = structure(list(), class = c("collector_double", "collector"))), default = structure(list(), class = c("collector_guess", "collector")), skip = 1L), class = "col_spec"))
          
          #use the library data.table
          library(data.table)
          df1 <- data.table(df1)
          
          #make a sum of by ID (by row)
          df1[,sumUSE:=sum(.SD), by=ID]
          
          #last
          df1[,end:=names(df1)[(df1[,sumUSE]+1)]]
          df1[end=="ID", end:=NA]
          
          #first
          df1[,start:=names(df1)[2]]
          df1[is.na(end), start:=NA]
          
          print(df1)
          #   ID A1 A2 A3 A4 A5 sumUSE  end start
          #1:  1  1  1  0  0  0      2   A2    A1
          #2:  2  1  1  1  1  1      5   A5    A1
          #3:  3  0  0  0  0  0      0 <NA>  <NA>
          

          【Comments】:
