【Question title】: Add missing months for a range of dates in R
【Posted】: 2020-04-10 03:57:15
【Question description】:

Suppose I have a data.frame like the one below, with one data entry per month:

 df <- read.table(text="date,gmsl
2009-01-17,58.4         
2009-02-17,59.1         
2009-04-16,60.9         
2009-06-16,62.3         
2009-09-16,64.6         
2009-12-16,68.3",sep=",",header=TRUE)

##  > df
##         date gmsl
## 1 2009-01-17 58.4
## 2 2009-02-17 59.1
## 3 2009-04-16 60.9
## 4 2009-06-16 62.3
## 5 2009-09-16 64.6
## 6 2009-12-16 68.3

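How can I fill in the missing months over the date range 2009-01 to 2009-12, with gmsl set to NaN for those months?

I have already extracted the year and month of the date column with df$Month_Yr <- format(as.Date(df$date), "%Y-%m").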

【Comments】:

    Tags: r dataframe date


    【Solution 1】:

    Here is one approach using tidyr::complete:

    library(dplyr)
    
    df %>%
      mutate(date = as.Date(date), 
             first_date = as.Date(format(date, "%Y-%m-01"))) %>%
      tidyr::complete(first_date = seq(min(first_date), max(first_date), "1 month"))
    
    
    # A tibble: 12 x 3
    #   first_date date        gmsl
    #   <date>     <date>     <dbl>
    # 1 2009-01-01 2009-01-17  58.4
    # 2 2009-02-01 2009-02-17  59.1
    # 3 2009-03-01 NA          NA  
    # 4 2009-04-01 2009-04-16  60.9
    # 5 2009-05-01 NA          NA  
    # 6 2009-06-01 2009-06-16  62.3
    # 7 2009-07-01 NA          NA  
    # 8 2009-08-01 NA          NA  
    # 9 2009-09-01 2009-09-16  64.6
    #10 2009-10-01 NA          NA  
    #11 2009-11-01 NA          NA  
    #12 2009-12-01 2009-12-16  68.3
    

    You can then decide which column to keep, first_date or date, or combine the two.
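
If a single date column is preferred, one way to combine the two is sketched below. This is an illustration, not part of the original answer: it assumes dplyr::coalesce() is an acceptable way to keep the observed date where one exists and fall back to the first day of the month otherwise. The replace() step is only needed if NaN, as asked in the question, is wanted instead of NA. The name `filled` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  date = c("2009-01-17", "2009-02-17", "2009-04-16",
           "2009-06-16", "2009-09-16", "2009-12-16"),
  gmsl = c(58.4, 59.1, 60.9, 62.3, 64.6, 68.3)
)

filled <- df %>%
  mutate(date = as.Date(date),
         first_date = as.Date(format(date, "%Y-%m-01"))) %>%
  complete(first_date = seq(min(first_date), max(first_date), by = "1 month")) %>%
  # Keep the observed date where there is one, else the first day of the month
  mutate(date = coalesce(date, first_date),
         # Optional: turn the NA values created by complete() into NaN
         gmsl = replace(gmsl, is.na(gmsl), NaN)) %>%
  select(date, gmsl)

filled
```

The result has one row per month from 2009-01 to 2009-12; the six months without an observation carry the first day of the month as their date.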

    Data

    df <- structure(list(date = structure(1:6, .Label = c("2009-01-17", 
    "2009-02-17", "2009-04-16", "2009-06-16", "2009-09-16", "2009-12-16"
    ), class = "factor"), gmsl = c(58.4, 59.1, 60.9, 62.3, 64.6, 
    68.3)), class = "data.frame", row.names = c(NA, -6L))
    

    【Comments】:

      【Solution 2】:

      In base R you could match (using %in%) the month substrings of a monthly date sequence against those of the date column:

      # One date per month in the target range (seq() on Date objects)
      dt.match <- seq(as.Date("2009-01-01"), as.Date("2009-12-01"), by = "month")
      # Keep only the months ("YYYY-MM") that do not occur in dat$date
      missing.rows <- data.frame(
        date = format(dt.match)[!format(dt.match, "%Y-%m") %in% substr(dat$date, 1, 7)],
        gmsl = NA_real_,  # NA_real_ keeps gmsl numeric after the merge
        stringsAsFactors = FALSE)
      merge(dat, missing.rows, all = TRUE)
      #          date gmsl
      # 1  2009-01-17 58.4
      # 2  2009-02-17 59.1
      # 3  2009-03-01   NA
      # 4  2009-04-16 60.9
      # 5  2009-05-01   NA
      # 6  2009-06-16 62.3
      # 7  2009-07-01   NA
      # 8  2009-08-01   NA
      # 9  2009-09-16 64.6
      # 10 2009-10-01   NA
      # 11 2009-11-01   NA
      # 12 2009-12-16 68.3
      

      Data

      dat <- structure(list(date = c("2009-01-17", "2009-02-17", "2009-04-16", 
      "2009-06-16", "2009-09-16", "2009-12-16"), gmsl = c(58.4, 59.1, 
      60.9, 62.3, 64.6, 68.3)), row.names = c(NA, -6L), class = "data.frame")
      

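      【Comments】:

      • Thanks. How can I check whether the date column is missing any months?
      • @ahbon %in% already checks that: the dates are cut off after the 7th character, so it is the year and month that get compared.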
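
The check described in the reply above can also be run on its own to list which months are absent. The following is a minimal sketch, assuming the date column holds "YYYY-MM-DD" strings as in the data for Solution 2; the names `all_months`, `present`, and `missing_months` are introduced here for illustration only.

```r
dat <- data.frame(
  date = c("2009-01-17", "2009-02-17", "2009-04-16",
           "2009-06-16", "2009-09-16", "2009-12-16"),
  gmsl = c(58.4, 59.1, 60.9, 62.3, 64.6, 68.3),
  stringsAsFactors = FALSE
)

# Every month in the target range, as "YYYY-MM" strings
all_months <- format(
  seq(as.Date("2009-01-01"), as.Date("2009-12-01"), by = "month"),
  "%Y-%m"
)

# Months that actually occur in the data (first 7 characters of each date)
present <- substr(dat$date, 1, 7)

# Months in the range that have no observation
missing_months <- setdiff(all_months, present)
missing_months
# [1] "2009-03" "2009-05" "2009-07" "2009-08" "2009-10" "2009-11"

# TRUE if at least one month is missing
length(missing_months) > 0
```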