【Question title】: Lag in dataframe
【Posted】: 2014-05-12 08:11:57
【Question】:

I have a data frame like this:

  ID_CASE   Month   
CS00000026A 201301  
CS00000026A 201302  
CS00000026A 201303  
CS00000026A 201304  
CS00000026A 201305  
CS00000026A 201306  
CS00000026A 201307  
CS00000026A 201308  
CS00000026A 201309  
CS00000026A 201310  
CS00000191C 201302  
CS00000191C 201303  
CS00000191C 201304  
CS00000191C 201305  
CS00000191C 201306  
CS00000191C 201307  
CS00000191C 201308  
CS00000191C 201309  
CS00000191C 201310  

I want the final data frame to have three additional columns, like this:

  ID_CASE   Month   Lag_1   Lag_2   Lag_3
CS00000026A 201301  NA      NA      NA
CS00000026A 201302  201301  NA      NA
CS00000026A 201303  201302  201301  NA
CS00000026A 201304  201303  201302  201301
CS00000026A 201305  201304  201303  201302
CS00000026A 201306  201305  201304  201303
CS00000026A 201307  201306  201305  201304
CS00000026A 201308  201307  201306  201305
CS00000026A 201309  201308  201307  201306
CS00000026A 201310  201309  201308  201307
CS00000191C 201302  NA      NA      NA
CS00000191C 201303  201302  NA      NA
CS00000191C 201304  201303  201302  NA
CS00000191C 201305  201304  201303  201302
CS00000191C 201306  201305  201304  201303
CS00000191C 201307  201306  201305  201304
CS00000191C 201308  201307  201306  201305
CS00000191C 201309  201308  201307  201306
CS00000191C 201310  201309  201308  201307

where

  • Lag_1 is Month lagged by 1 month
  • Lag_2 is Month lagged by 2 months
  • Lag_3 is Month lagged by 3 months.

I used the following code to try to get at least Lag_1:

df <- ddply(df,.(ID_CASE),transform,
                  Lag_1 <- c(NA,Month[-nrow(df)])) 

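But this does not give me the desired output for Lag_1.

I also tried looking at the solution in "Lag in R dataframe".

What if I had a date object instead of the int column 'Month' used in the current example?

Any help would be appreciated.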

【Comments】:

    Tags: r dataframe plyr lag


    【Solution 1】:

    Try data.table:

    library(data.table)
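    # Pad with k leading NAs and drop the last k months of each group
    # (assumes every ID_CASE has at least 3 rows)
    setDT(df)[, `:=` (Lag_1 = c(NA, head(Month, -1)),
                      Lag_2 = c(rep(NA, 2), head(Month, -2)),
                      Lag_3 = c(rep(NA, 3), head(Month, -3))), by = ID_CASE]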
    df
    #         ID_CASE  Month  Lag_1  Lag_2  Lag_3
    #  1: CS00000026A 201301     NA     NA     NA
    #  2: CS00000026A 201302 201301     NA     NA
    #  3: CS00000026A 201303 201302 201301     NA
    #  4: CS00000026A 201304 201303 201302 201301
    #  5: CS00000026A 201305 201304 201303 201302
    #  6: CS00000026A 201306 201305 201304 201303
    #  7: CS00000026A 201307 201306 201305 201304
    #  8: CS00000026A 201308 201307 201306 201305
    #  9: CS00000026A 201309 201308 201307 201306
    # 10: CS00000026A 201310 201309 201308 201307
    # 11: CS00000191C 201302     NA     NA     NA
    # 12: CS00000191C 201303 201302     NA     NA
    # 13: CS00000191C 201304 201303 201302     NA
    # 14: CS00000191C 201305 201304 201303 201302
    # 15: CS00000191C 201306 201305 201304 201303
    # 16: CS00000191C 201307 201306 201305 201304
    # 17: CS00000191C 201308 201307 201306 201305
    # 18: CS00000191C 201309 201308 201307 201306
    # 19: CS00000191C 201310 201309 201308 201307
    

    【Comments】:

    • Is there a way to convert a data.table to a data.frame? I ask because I ran into some problems when trying to merge a data.table (df in this case) with a data.frame
    • @darkage df <- as.data.frame(df)
    【Solution 2】:

    As of data.table v1.9.6 you can use shift():

    require(data.table)
    setDT(df)[, paste("lag", 1:3, sep="_") := shift(Month, 1:3), by=ID_CASE]
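
For readers without data.table, the same per-group shift can be sketched in base R. This is only an illustration: `lag_vec` is a hypothetical helper (not part of any package) and the toy data frame is made up. Indexing with NA keeps the column's class and copes with groups shorter than the lag.

```r
# Hypothetical helper: shift x down by k positions, padding the top with NA.
# Indexing (rather than c(NA, ...)) preserves the class of x.
lag_vec <- function(x, k) {
  n <- length(x)
  x[c(rep(NA_integer_, min(k, n)), seq_len(max(n - k, 0)))]
}

# Toy data: group "B" has a single row, so all of its lags are NA.
df <- data.frame(
  ID_CASE = c("A", "A", "A", "A", "B"),
  Month   = c(201301L, 201302L, 201303L, 201304L, 201302L)
)

# ave() applies the helper within each ID_CASE and keeps the row order.
for (k in 1:3) {
  df[[paste0("Lag_", k)]] <- ave(df$Month, df$ID_CASE,
                                 FUN = function(x) lag_vec(x, k))
}
df
```

The loop mirrors the `1:3` vector passed to `shift()`, producing one column per lag.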
    

    【Comments】:

      【Solution 3】:

      You can use lag.zoo, where k can be a vector of lags.

      library(plyr)
      library(zoo)
      
      ddply(df, .(ID_CASE), function(x){
        z <- zoo(x$Month)
        lag(z, k = 0:-3)
      })
      
      #        ID_CASE   lag0  lag-1  lag-2  lag-3
      # 1  CS00000026A 201301     NA     NA     NA
      # 2  CS00000026A 201302 201301     NA     NA
      # 3  CS00000026A 201303 201302 201301     NA
      # 4  CS00000026A 201304 201303 201302 201301
      # 5  CS00000026A 201305 201304 201303 201302
      # 6  CS00000026A 201306 201305 201304 201303
      # 7  CS00000026A 201307 201306 201305 201304
      # 8  CS00000026A 201308 201307 201306 201305
      # 9  CS00000026A 201309 201308 201307 201306
      # 10 CS00000026A 201310 201309 201308 201307
      # 11 CS00000191C 201302     NA     NA     NA
      # 12 CS00000191C 201303 201302     NA     NA
      # 13 CS00000191C 201304 201303 201302     NA
      # 14 CS00000191C 201305 201304 201303 201302
      # 15 CS00000191C 201306 201305 201304 201303
      # 16 CS00000191C 201307 201306 201305 201304
      # 17 CS00000191C 201308 201307 201306 201305
      # 18 CS00000191C 201309 201308 201307 201306
      # 19 CS00000191C 201310 201309 201308 201307
      

      Edit following the comments.

      If there is a group with only one date, the code above produces an error. A small example:

      df <- data.frame(ID_CASE = c(1, 1, 1, 2), Month = 1:4)
      df
      #   ID_CASE Month
      # 1       1     1
      # 2       1     2
      # 3       1     3
      # 4       2     4
      
      ddply(df, .(ID_CASE), function(x){
        z <- zoo(x$Month)
        lag(z, k = 0:-3)
      })
      
      # Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) : 
      #   Results do not have equal lengths
      

      This is because the group with only one record is coerced to a univariate time series. To avoid this coercion, subset with [ and drop = FALSE:

      ddply(df, .(ID_CASE), function(x){
        z <- zoo(x[ , "Month", drop = FALSE])
        lag(z, k = 0:-3)
      })
      
      #   ID_CASE Month.lag0 Month.lag-1 Month.lag-2 Month.lag-3
      # 1       1          1          NA          NA          NA
      # 2       1          2           1          NA          NA
      # 3       1          3           2           1          NA
      # 4       2          4          NA          NA          NA
      

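      【Comments】:

      • This does not seem to give exactly the same result as the desired output in the question. Look at the difference in lag1 in row 11
      • @beginneR, thanks for your comment! Well, then it seems I misunderstood the basic logic of the lag results.
      • @beginneR The output is fine, because for ID_CASE="CS00000191C" there is no other month before 201302
      • @darkage Well, I think it depends on what the OP actually wants to achieve. I guess he might want the previous month even if it is not in the data. At least that is what he presented in the question
      • @Henrik Please make the necessary edits to the question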
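
If the aim is the previous calendar month whether or not it appears in the data (the reading raised in the comments above), it can be computed arithmetically from the YYYYMM integer instead of by shifting rows. A minimal sketch; `month_minus` is a hypothetical helper name, and it assumes Month is always a valid YYYYMM integer:

```r
# Hypothetical helper: subtract k calendar months from a YYYYMM integer.
month_minus <- function(yyyymm, k) {
  # Count months since year 0, step back k months, then rebuild YYYYMM.
  total <- (yyyymm %/% 100L) * 12L + (yyyymm %% 100L - 1L) - k
  (total %/% 12L) * 100L + total %% 12L + 1L
}

month_minus(201302L, 1:3)
# [1] 201301 201212 201211
```

Unlike a row shift, this never returns NA for the first row of a group, so which version is right depends on the intended meaning of "lag".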
      【Solution 4】:

      Using dplyr:

      library(dplyr)
      
       # lag() here is dplyr::lag, applied within each ID_CASE group
       df %>%
        group_by(ID_CASE) %>%
        mutate(lag_1 = lag(Month, 1),
               lag_2 = lag(Month, 2),
               lag_3 = lag(Month, 3))
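
On the question's follow-up about a date object instead of the integer Month: one possible approach is to lag row indices within each group and then index the Date column, which keeps the Date class. A base R sketch under assumptions; the toy data and the `Month_date` column are made up, and each month is represented by its first day:

```r
# Toy data for illustration only.
df <- data.frame(
  ID_CASE = c("A", "A", "A", "B", "B"),
  Month   = c(201301L, 201302L, 201303L, 201302L, 201303L)
)

# Represent each YYYYMM value as the first day of that month.
df$Month_date <- as.Date(paste0(df$Month, "01"), format = "%Y%m%d")

# Lag the row indices within each ID_CASE, then index the Date column;
# an NA index yields an NA Date, so the class is preserved.
idx <- ave(seq_len(nrow(df)), df$ID_CASE,
           FUN = function(i) c(NA, head(i, -1)))
df$Lag_1 <- df$Month_date[idx]
df
```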
      

      【Comments】:
