Count the number of days in each month of a date range: answers

【Question title】: Count the number of days in each month of a date range
【Posted】: 2018-06-19 16:35:17
【Question description】:

I have a data frame with start dates and end dates, like this:

id <- c(1, 1, 2)
start <- c("2014-01-05", "2014-02-04", "2014-02-06")
end <- c("2014-02-03", "2014-04-29", "2014-03-07")
df <- data.frame(id, start, end)

 id        start          end
  1    2014-01-05   2014-02-03
  1    2014-02-04   2014-04-29
  2    2014-02-06   2014-03-07

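I am trying to work out how to count, for each month, the number of days that fall between the start date and the end date (both inclusive). For example: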

id    month_yyyy_mm count
 1          2014-01    27
 1          2014-02     3
 1          2014-02    25
 1          2014-03    31
 1          2014-04    29
 2          2014-02    23
 2          2014-03     7

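I can convert the strings to dates and then use difftime to compute the total difference between start and end, but I don't know how to break the count down by month. Is there anything in the lubridate package that could help?

【Question comments】:

Tags: r date lubridate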

【Solution 1】:

Consider the functions f1, f2 and f3 below:

f1 <- function(d_first, d_last) {
  d_first <- as.Date(d_first)
  d_last <- as.Date(d_last)

  D <- seq(d_first, d_last, 1) # generate all days in [d_first, d_last]
  # all months in [d_first, d_last]; keyed on the month number only, so this
  # assumes the range does not contain the same month of two different years
  M <- unique(format(D, "%m"))

  f2 <- function(x) length(which(format(D, "%m") == x)) # returns number of days in month x
  res <- vapply(M, f2, numeric(1))
  return(cbind(unique(format(D, "%Y-%m")), res))
}
f3 <- function(k) f1(df$start[k], df$end[k])

# lapply always returns a list, whereas sapply may simplify to a matrix
output <- lapply(1:nrow(df), f3)

which yields

> output 
[[1]]
             res 
01 "2014-01" "27"
02 "2014-02" "3" 

[[2]]
             res 
02 "2014-02" "25"
03 "2014-03" "31"
04 "2014-04" "29"

[[3]]
             res 
02 "2014-02" "23"
03 "2014-03" "7"

From here on, all that remains is formatting. In fact, a simple do.call(rbind, output) does the job:

> do.call(rbind, output)
             res 
01 "2014-01" "27"
02 "2014-02" "3" 
02 "2014-02" "25"
03 "2014-03" "31"
04 "2014-04" "29"
02 "2014-02" "23"
03 "2014-03" "7"

To keep the ID, you can define f4 <- function(k) cbind(df$id[k], f3(k)), so that

> do.call(rbind, lapply(1:nrow(df), f4))
                 res 
01 "1" "2014-01" "27"
02 "1" "2014-02" "3" 
02 "1" "2014-02" "25"
03 "1" "2014-03" "31"
04 "1" "2014-04" "29"
02 "2" "2014-02" "23"
03 "2" "2014-03" "7"
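
Note that cbind() on mixed inputs produces a character matrix, so the IDs and counts above are strings. As a rough sketch of the remaining formatting step, the matrix can be turned into a typed data frame. The matrix m below is rebuilt by hand from the first rows of the output, and the names m and res_df are assumptions made for illustration only:

```r
# A few rows of the character matrix shown above, rebuilt by hand
m <- rbind(
  c("1", "2014-01", "27"),
  c("1", "2014-02", "3"),
  c("2", "2014-02", "23")
)

# Convert each column to a sensible type and give the columns
# the names used in the question
res_df <- data.frame(
  id            = as.integer(m[, 1]),
  month_yyyy_mm = m[, 2],
  count         = as.integer(m[, 3]),
  stringsAsFactors = FALSE
)
str(res_df)
```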

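But there may be smarter solutions.

【Comments】:

This looks really great; do you have any suggestion for how to keep the id column from the original data frame in the rows produced by these three functions?
Set f4 <- function(k) cbind(df$id[k], f3(k)); see the edit.
Works like a charm. Thank you very much.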
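
One possible variation, not part of the original answer, is to count days per year-month with table(), which keeps the id and also distinguishes the same month in different years. This is only a sketch: count_days_by_month is a hypothetical helper name, and the data frame is the sample from the question.

```r
# Sample data from the question
df <- data.frame(
  id    = c(1, 1, 2),
  start = c("2014-01-05", "2014-02-04", "2014-02-06"),
  end   = c("2014-02-03", "2014-04-29", "2014-03-07"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per year-month touched by [start, end]
count_days_by_month <- function(id, start, end) {
  days <- seq(as.Date(start), as.Date(end), by = "day") # every day, both ends included
  tab  <- table(format(days, "%Y-%m"))                  # number of days per year-month
  data.frame(
    id            = id,
    month_yyyy_mm = names(tab),
    count         = as.integer(tab),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to each row of df, then stack the pieces
pieces <- Map(count_days_by_month, df$id, df$start, df$end)
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

Because it enumerates every day, this approach is simple but may be slow for very long ranges or very many rows.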

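【Solution 2】:

Here is a different approach, which uses the foverlaps() function from the data.table package.

foverlaps() finds the overlaps between a generated sequence of month intervals (first day to last day of each month) and the given periods.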

library(data.table)
library(lubridate)

# coerce dates from character to IDate
cols <- c("start", "end")
DT <- as.data.table(df)[, (cols) := lapply(.SD, as.IDate), .SDcols = cols]

# create sequence of months which cover all periods
mon_seq <- DT[, as.IDate(seq(floor_date(min(start), unit = "months"), 
                             ceiling_date(max(end), unit = "months"),
                             by = "month"))]
# create helper data.table with first and last day of months
mDT <- data.table(start = head(mon_seq, -1L), end = tail(mon_seq, -1L) - 1L)
setkeyv(DT, cols)
# find overlapping pieces for each month
foverlaps(mDT, DT, nomatch = 0L)[
  # compute count of days in each month
  , {tmp <- pmax(start, i.start)
  .(id = id, month = format(tmp, "%Y-%m"), 
    count = as.integer(difftime(pmin(end, i.end), tmp, units = "days")) + 1L)
  }][
    # reorder conveniently
    order(id, month)]

   id   month count
1:  1 2014-01    27
2:  1 2014-02     3
3:  1 2014-02    25
4:  1 2014-03    31
5:  1 2014-04    29
6:  2 2014-02    23
7:  2 2014-03     7
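
To make the counting step more concrete, here is a minimal base R sketch of the same arithmetic for a single period and a single month. It takes the second period from the question and February 2014; the variable names are made up for illustration and do not appear in the answer.

```r
# One period from the question and one calendar month (February 2014)
period_start <- as.Date("2014-02-04")
period_end   <- as.Date("2014-04-29")
month_start  <- as.Date("2014-02-01")
month_end    <- as.Date("2014-02-28")

# The overlap runs from the later of the two starts to the earlier of the two ends
overlap_start <- max(period_start, month_start)
overlap_end   <- min(period_end, month_end)

# Both endpoints are counted, hence the + 1
count <- as.integer(difftime(overlap_end, overlap_start, units = "days")) + 1L
count # 25, matching the 2014-02 row for this period
```

In the data.table code, pmax() and pmin() do the same thing element-wise across all matched pairs of month and period.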

【Comments】: