【Question title】: Add sequence of dates to data.table (R)
【Posted】: 2018-07-20 20:34:05
【Question】:

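I have a data table that contains locations where events recur at different frequencies. The date of the last event is provided, as well as the frequency at which it occurs.

Example: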

dt
#    Location Last_Occurrence Frequency
# 1: Home     7-19-2018       30
# 2: School   6-6-2018        60
# 3: Moon     1-5-1993        90

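What I would like to do is add a new column containing all future event dates for each location, up to the end of 2018.

So, I want a table that looks like this: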

dt
#    Location Last_Occurrence Frequency Next_Dates
# 1: Home     7-19-2018       30        7-19-2018
# 2: Home     7-19-2018       30        8-18-2018
# 3: Home     7-19-2018       30        9-17-2018
# 4: Home     7-19-2018       30        10-17-2018
# 5: Home     7-19-2018       30        11-16-2018
# 6: Home     7-19-2018       30        12-16-2018
# 7: School   6-6-2018        60        6-6-2018
# 8: School   6-6-2018        60        8-5-2018
# 9: School   6-6-2018        60        10-4-2018
etc.

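How would I do this? I suspect the lapply function will be useful, since I'm doing this for each location...

I've figured out how to use a "while" loop to generate a vector of future dates: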

library(data.table)  # provides year()

Last_Sample_Date <- Sys.Date() # For testing
increase <- 5 # For testing
NextDate <- Last_Sample_Date + increase
multiplier <- 1

# Create vector of next sampling dates - updated with each iteration of the while loop
NextDates <- c(Last_Sample_Date, NextDate)

while (year(NextDate) == 2018) {
  multiplier <- multiplier + 1
  # Step from the original date so the interval between dates stays constant
  NextDate <- Last_Sample_Date + multiplier * increase

  # Add to vector of next sampling dates
  NextDates <- append(NextDates, NextDate)
}

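(I realize this actually generates a vector whose final date falls in 2019, but I can live with that.)

Can I use this while loop somehow, or should I take a different approach?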
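As a point of comparison, the loop above can probably be replaced by a single call to base R's seq(), which accepts a start date, an end date, and a step. The sketch below is only an illustration: the values are taken from the "Home" row of the example table, and the variable names (last_date, freq_days, end_date) are made up for this example.

```r
# Assumed example values, taken from the "Home" row of the question's table
last_date <- as.Date("2018-07-19")
freq_days <- 30
end_date  <- as.Date("2018-12-31")  # assumed cutoff: end of 2018

# seq() starts at last_date, steps freq_days days at a time,
# and stops at the last date that does not exceed end_date
next_dates <- seq(from = last_date, to = end_date, by = paste(freq_days, "days"))
print(next_dates)
```

Because the upper bound is passed as `to`, the result does not overshoot into 2019 the way the loop does.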

【Comments】:

    Tags: r datatable


    【Solution 1】:

    My data.table version:

    library(data.table)
    
    # create example dataset
    dt <- data.table(
            location = c("home", "school", "moon"),
            orig_date = as.Date(c("2018-07-19", "2018-06-06", "2015-01-05")),
            freq_days = c(30, 60, 90)
    )
    
    # figure out how many new rows are needed
    dt[ , rows_needed := length(seq(from=orig_date, to=as.Date("2018-12-31"), by=paste(freq_days,"days"))), by=location]
    
    # expand the data.table to include the new rows
    dt <- dt[rep(1:nrow(dt), times=rows_needed)]
    
    # add the dates of occurrence
    dt[ , date_of_occurrence := seq(from=orig_date[1], to=as.Date("2018-12-31"), by=paste(freq_days[1],"days")), by=location]
    
    # shift dates of occurrence to get next date
    dt[ , next_date := shift(date_of_occurrence, type="lead"), by=location]
    
    # drop rows where next occurrence is after 2018 (should you want this)
    dt <- dt[!is.na(next_date)]
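
The first three steps above (count rows, expand, fill in dates) can likely be collapsed into one grouped call. This is a sketch under the same assumptions as the answer: it reuses the answer's column names, and `end_date` and `expanded` are names introduced here for illustration. It does not compute the shifted `next_date` column.

```r
library(data.table)

# Example data mirroring the answer above
dt <- data.table(
  location  = c("home", "school", "moon"),
  orig_date = as.Date(c("2018-07-19", "2018-06-06", "2015-01-05")),
  freq_days = c(30, 60, 90)
)
end_date <- as.Date("2018-12-31")  # assumed cutoff: end of 2018

# Grouping by all three columns makes each of them length 1 inside j,
# so seq() runs once per original row and the result comes back in long form
expanded <- dt[,
  .(date_of_occurrence = seq(orig_date, end_date, by = paste(freq_days, "days"))),
  by = .(location, orig_date, freq_days)
]
print(expanded[location == "home"])
```

Whether this reads better than the step-by-step version is a matter of taste; the explicit version is easier to debug one line at a time.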
    

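    【Comments】:

    • Hi Dan - thanks for your help! I can't get the "figure out how many new rows are needed" line to work. I'm pretty sure I typed it correctly, but I get the following error: Error in seq.Date(from = orig_date[1], to = as.Date("2018-12-31"), : 'by' is NA. Any idea what's going on? I tested it by running: seq(from=orig_date[1], to=as.Date("2018-12-31"), by=paste(freq_days,"days")) ...and that works fine.
    • Check that your freq_days column has no NAs, e.g. with sum(is.na(dt$freq_days)), and check that you have attached the data.table package. If the problem persists, email me your code and I'll take a look.
    • Got it! Thanks so much for your help! It works great :)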
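
For anyone hitting the same "'by' is NA" error, the check suggested in the comments can be sketched as follows. The data here is hypothetical (one frequency deliberately set to NA), and dropping the affected rows is just one possible way to handle them, not necessarily what the commenter did.

```r
library(data.table)

# Hypothetical data with one missing frequency, to mimic the situation
dt <- data.table(
  location  = c("home", "school", "moon"),
  orig_date = as.Date(c("2018-07-19", "2018-06-06", "2015-01-05")),
  freq_days = c(30, NA, 90)
)

# Count missing frequencies; paste(NA, "days") yields "NA days",
# which seq() cannot interpret as a step
n_missing <- sum(is.na(dt$freq_days))
print(n_missing)

# One option (an assumption about what is wanted): keep only usable rows
dt_clean <- dt[!is.na(freq_days) & freq_days > 0]
```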
    【Solution 2】:

    IIUC, you can use complete from tidyr:

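    library(dplyr)
    library(tidyr)
    # df: the question's table as a data frame, with Last_Occurrence stored as a Date
    df %>% group_by(Location,Frequency,Last_Occurrence) %>%
          mutate(next_date=Last_Occurrence)%>%
          complete(next_date=seq(from = next_date, to = as.Date("2018-12-31"),by = Frequency))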
    
    # A tibble: 10 x 4
    # Groups:   Location, Frequency, Last_Occurrence [2]
       Location Frequency Last_Occurrence  next_date
          <chr>     <int>          <date>     <date>
     1     Home        30      2018-07-19 2018-07-19
     2     Home        30      2018-07-19 2018-08-18
     3     Home        30      2018-07-19 2018-09-17
     4     Home        30      2018-07-19 2018-10-17
     5     Home        30      2018-07-19 2018-11-16
     6     Home        30      2018-07-19 2018-12-16
     7   School        60      2018-06-06 2018-06-06
     8   School        60      2018-06-06 2018-08-05
     9   School        60      2018-06-06 2018-10-04
    10   School        60      2018-06-06 2018-12-03
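
The output above shows Last_Occurrence as a date column, whereas the question's table lists values such as 7-19-2018. If those are stored as character strings (an assumption, since the question does not say), they would need converting before seq() can be used. A minimal sketch, with `last_occurrence` and `parsed` as illustrative names:

```r
# Strings in the month-day-year layout shown in the question's table
last_occurrence <- c("7-19-2018", "6-6-2018", "1-5-1993")

# "%m-%d-%Y" tells as.Date() to read month, day, then four-digit year
parsed <- as.Date(last_occurrence, format = "%m-%d-%Y")
print(parsed)
```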
    

    【Comments】: