【Question title】: In R: How can I check that I have consecutive years of data (to later be able to calculate growth)?
【Posted】: 2025-12-02 20:25:01
【Question description】:

I have the following data frame (example):

companyID   year   yearID
    1       2010     1
    1       2011     2
    1       2012     3
    1       2013     4
    2       2010     1
    2       2011     2
    2       2016     3
    2       2017     4
    2       2018     5
    3       2010     1
    3       2011     2
    3       2014     3
    3       2017     4
    3       2018     5

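I used a for loop to try to create a sequence column that starts a new number for each new run of consecutive years. I am new to R, so my definitions may be a bit off. My for loop looks like this: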

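sample$sequence <- NA_integer_   # column that will hold the run number
s <- 0L
for (val1 in unique(sample$companyID)) {
  # largest yearID for this company (yearID is column 3, so refer to it by name)
  m <- max(sample$yearID[sample$companyID == val1])
  for (val2 in seq_len(m)) {
    # row for this company and yearID
    row <- which(sample$companyID == val1 & sample$yearID == val2)
    m1 <- sample$year[row]                                                      # later year
    m2 <- sample$year[sample$companyID == val1 & sample$yearID == (val2 - 1)]  # earlier year
    # start a new sequence on a company's first year or after a gap of more than one year
    if (val2 == 1 || m1 - m2 > 1) {
      s <- s + 1L
    }
    sample$sequence[row] <- s
  }
}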

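Here m is the maximum yearID for each companyID, row identifies the row (companyID = val1 and yearID = val2) where the value should be entered, m1 comes from the year variable and is the later year, and m2 is the earlier year. What I am trying to do is change the sequence every time m1 - m2 > 1 (when val2 > 1).

Desired outcome: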

companyID   year   yearID   sequence
    1       2010     1          1
    1       2011     2          1
    1       2012     3          1
    1       2013     4          1
    2       2010     1          2
    2       2011     2          2
    2       2016     3          3
    2       2017     4          3
    2       2018     5          3
    3       2010     1          4
    3       2011     2          4
    3       2014     3          5
    3       2017     4          6
    3       2018     5          6

If anyone can help, many thanks!

【Comments】:

    Tags: r for-loop if-statement


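    【Solution 1】:

    This is a good question!

    1. First, group_by companyID.
    2. Use lag to compute the difference between each pair of consecutive rows in the year column, to determine whether the years are consecutive.
    3. group_by(companyID, yearID).
    4. mutate a helper column sequence1 that assigns 1 to the first year of each consecutive run in the group.
    5. ungroup, then increase the sequence number each time a 1 occurs in sequence1.
    6. Remove the columns sequence1 and deltaLag1.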
    library(tidyverse)
    
    df1 <- df %>% 
      group_by(companyID) %>% 
      mutate(deltaLag1 = year - lag(year, 1)) %>% 
      group_by(companyID, yearID) %>% 
      mutate(sequence1 = case_when(is.na(deltaLag1) | deltaLag1 > 1 ~ 1,
                                   TRUE ~ 2)) %>% 
      ungroup() %>% 
      mutate(sequence = cumsum(sequence1==1)) %>% 
      select(-deltaLag1, -sequence1)
    

    Data

    df <- tribble(
    ~companyID,   ~year,   ~yearID,
    1, 2010, 1, 
    1, 2011, 2, 
    1, 2012, 3, 
    1, 2013, 4, 
    2, 2010, 1, 
    2, 2011, 2, 
    2, 2016, 3, 
    2, 2017, 4, 
    2, 2018, 5, 
    3, 2010, 1, 
    3, 2011, 2, 
    3, 2014, 3, 
    3, 2017, 4, 
    3, 2018, 5)
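
For comparison, the same run numbering can be sketched in base R without any packages. This is only a minimal sketch, assuming the rows are already sorted by companyID and then by year; the helper vector `gap` is a name introduced here for illustration and is not part of the answer above.

```r
# Same sample data as above, rebuilt with base R only
df <- data.frame(
  companyID = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3),
  year      = c(2010, 2011, 2012, 2013, 2010, 2011, 2016, 2017, 2018,
                2010, 2011, 2014, 2017, 2018)
)

# Gap to the previous year within each company; NA marks a company's first row
# (assumes rows are sorted by companyID, then year)
gap <- ave(df$year, df$companyID, FUN = function(y) c(NA, diff(y)))

# A new run starts on a company's first row or after a gap of more than one year
df$sequence <- cumsum(is.na(gap) | gap > 1)

df
```

Because `TRUE | NA` evaluates to `TRUE` in R, the first row of each company always opens a new run, which is what the desired outcome in the question requires.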
    

    【Comments】:

      【Solution 2】:

      It is not clear whether you want the exact desired outcome or to check that you have consecutive years by companyID.

      Based on the information in your title:

      sample <- read.table(header = TRUE, text = "
      companyID   year   yearID
          1       2010     1
          1       2011     2
          1       2012     3
          1       2013     4
          2       2010     1
          2       2011     2
          2       2016     3
          2       2017     4
          2       2018     5
          3       2010     1
          3       2011     2
          3       2014     3
          3       2017     4
          3       2018     5
      ")
      
      library(data.table)
      sample <- setDT(sample)
      sample[ , diff_year := year - shift(year), by = companyID]    
      sample <- setDF(sample)
      sample
      #>    companyID year yearID diff_year
      #> 1          1 2010      1        NA
      #> 2          1 2011      2         1
      #> 3          1 2012      3         1
      #> 4          1 2013      4         1
      #> 5          2 2010      1        NA
      #> 6          2 2011      2         1
      #> 7          2 2016      3         5
      #> 8          2 2017      4         1
      #> 9          2 2018      5         1
      #> 10         3 2010      1        NA
      #> 11         3 2011      2         1
      #> 12         3 2014      3         3
      #> 13         3 2017      4         3
      #> 14         3 2018      5         1
      
      # Created on 2021-03-13 by the reprex package (v1.0.0.9002)
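
Building on diff_year, one possible way to answer both readings of the question with data.table is sketched below: a per-company check that all years are consecutive, and a run number matching the desired sequence column. The names `sample_dt`, `consecutive`, and `all_consecutive` are introduced here for illustration only, and the sketch assumes the rows are sorted by companyID and year.

```r
library(data.table)

# Same sample data as above (hypothetical object name sample_dt)
sample_dt <- data.table(
  companyID = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3),
  year      = c(2010, 2011, 2012, 2013, 2010, 2011, 2016, 2017, 2018,
                2010, 2011, 2014, 2017, 2018)
)

# Gap to the previous year within each company (NA on each company's first row)
sample_dt[, diff_year := year - shift(year), by = companyID]

# One row per company: TRUE only if every year directly follows the previous one
consecutive <- sample_dt[, .(all_consecutive = all(diff_year == 1, na.rm = TRUE)),
                         by = companyID]

# Run number: increases on a company's first row or after a gap of more than one year
sample_dt[, sequence := cumsum(is.na(diff_year) | diff_year > 1)]

consecutive
sample_dt
```

With the sample data, only company 1 has an unbroken series, so growth rates could be computed within each `sequence` value rather than within each companyID.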
      

      Related: Calculate difference between values in consecutive rows by group

      Regards,

      【Comments】: