【Question title】: Add missing values to the data frame
【Posted】: 2013-02-15 19:45:24
【Question】:

In R, I have a data frame `closeValues` like this:

>closeValues
            date        value
    1  1980-12-10       5
    2  1980-12-15       8
    3  1980-12-18       7
    4  1980-12-20       1

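But wherever a date is missing, I need to add a row for it and fill its `value` with the previous value. In other words, I need this output: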

>closeValues
   date        value
1  1980-12-10       5
2  1980-12-11       5
3  1980-12-12       5
4  1980-12-13       5
5  1980-12-14       5
6  1980-12-15       8
7  1980-12-16       8
8  1980-12-17       8
9  1980-12-18       7
10 1980-12-19       7
11 1980-12-20       1

Is this possible in R?

【Question discussion】:

Tags: r date dataframe


【Answer 1】:

Using `na.locf` from the `zoo` package, you can do this:

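library(zoo)
## one row per calendar day; `dat` is the observed data (see the PS below)
dat1 <- data.frame(date = seq(as.Date('1980-12-10'), as.Date('1980-12-20'), by = 1))
## merge() leaves NA on the days missing from dat; na.locf() carries the last value forward
na.locf(zoo(merge(dat1, dat, all.x = TRUE)))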
   date       value
1  1980-12-10  5   
2  1980-12-11  5   
3  1980-12-12  5   
4  1980-12-13  5   
5  1980-12-14  5   
6  1980-12-15  8   
7  1980-12-16  8   
8  1980-12-17  8   
9  1980-12-18  7   
10 1980-12-19  7   
11 1980-12-20  1   
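If you would rather get a plain data frame back instead of a zoo object, and avoid the package dependency, the same merge-then-carry-forward idea can be sketched in base R. The `cummax()` index trick below is a stand-in for `zoo::na.locf()` on the `value` column, not what this answer uses. It assumes the first row after the merge is non-missing, which holds here because the day sequence starts at the first observed date. The names `all_days`, `res` and `last_obs` are illustrative.

```r
# Example data from the question (same as the reproducible `dat` in the PS)
dat <- data.frame(date  = as.Date(c("1980-12-10", "1980-12-15",
                                    "1980-12-18", "1980-12-20")),
                  value = c(5, 8, 7, 1))

# One row per calendar day between the first and last observed date
all_days <- data.frame(date = seq(min(dat$date), max(dat$date), by = "day"))

# Left join on `date`: days without an observation get value = NA
res <- merge(all_days, dat, all.x = TRUE)

# Last observation carried forward in base R: for each row, take the index
# of the most recent non-NA row seen so far (assumes row 1 is non-NA)
last_obs <- cummax(ifelse(is.na(res$value), 0L, seq_along(res$value)))
res$value <- res$value[last_obs]
res
```

With zoo loaded, `res$value <- na.locf(res$value)` should produce the same column.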

PS: Please provide a reproducible example next time. You could write it like this:

  dat <- data.frame(date = as.Date(c('1980-12-10','1980-12-15',
                                   '1980-12-18','1980-12-20')), 
                    value=c(5,8,7,1))

Or paste the output of `dput()`:

dput(dat)
structure(list(date = structure(c(3996, 4001, 4004, 4006), class = "Date"), 
    value = c(5, 8, 7, 1)), .Names = c("date", "value"), row.names = c(NA, 
-4L), class = "data.frame")

【Discussion】:

【Answer 2】:

This may do what you want in base R:

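    df.1 <- read.table(text='
                DATE   VALUE
          1980-12-10       5
          1980-12-15       8
          1980-12-18       7
          1980-12-20       1', header=TRUE, colClasses=c('character', 'numeric'))
    
    # convert the date strings to Date
    df.1$DATE2 <- as.Date(df.1$DATE)
    
    # days each row must cover until the next observation (1 for the last row)
    df.1$diffs <- c(as.numeric(diff(df.1$DATE2)), 1)
    
    # repeat each row once per day it covers
    df.2 <- df.1[rep(1:nrow(df.1), df.1$diffs), ]
    
    # 1, 2, ..., diffs within each repeated block (unlike rle() on VALUE,
    # this stays correct when two consecutive observations share a value)
    df.2$running.count <- sequence(df.1$diffs)
    
    # shift each repeated row forward by its offset to get the filled-in date
    df.2$DATE3 <- df.2$DATE2 + (df.2$running.count - 1)
    df.2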
    
    #           DATE VALUE      DATE2 diffs running.count      DATE3
    # 1   1980-12-10     5 1980-12-10     5             1 1980-12-10
    # 1.1 1980-12-10     5 1980-12-10     5             2 1980-12-11
    # 1.2 1980-12-10     5 1980-12-10     5             3 1980-12-12
    # 1.3 1980-12-10     5 1980-12-10     5             4 1980-12-13
    # 1.4 1980-12-10     5 1980-12-10     5             5 1980-12-14
    # 2   1980-12-15     8 1980-12-15     3             1 1980-12-15
    # 2.1 1980-12-15     8 1980-12-15     3             2 1980-12-16
    # 2.2 1980-12-15     8 1980-12-15     3             3 1980-12-17
    # 3   1980-12-18     7 1980-12-18     2             1 1980-12-18
    # 3.1 1980-12-18     7 1980-12-18     2             2 1980-12-19
    # 4   1980-12-20     1 1980-12-20     1             1 1980-12-20
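A different base-R technique, not the one used in this answer, avoids repeating rows altogether: `findInterval()` maps each calendar day to the last observed date on or before it. This is a sketch assuming `dat` is sorted by date with no duplicate dates; `days`, `idx` and `filled` are illustrative names.

```r
# Example data from the question
dat <- data.frame(date  = as.Date(c("1980-12-10", "1980-12-15",
                                    "1980-12-18", "1980-12-20")),
                  value = c(5, 8, 7, 1))

# Every calendar day in the observed range
days <- seq(min(dat$date), max(dat$date), by = "day")

# For each day, the index of the last observed date <= that day
# (findInterval() requires dat$date to be sorted ascending)
idx <- findInterval(as.numeric(days), as.numeric(dat$date))

# Look up the carried-forward value for every day
filled <- data.frame(date = days, value = dat$value[idx])
filled
```

Because the lookup works on positions rather than on runs of equal values, repeated values in consecutive observations do not need special handling.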
    

【Discussion】:
