【Question title】:Sum values in df1 based on date ranges in df2
【Posted】:2018-07-11 16:43:40
【Question description】:

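I am trying to return the sum of the values in one data frame that fall between two dates taken from another data frame. The answers available on Stack don't seem to work for my application. I have tried using data.table, to no avail, so here goes.

Create the date ranges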

MeanRemaining <- seq(as.Date("2017-01-01"),as.Date("2017-02-28"),2)
MeanRemaining<-as.data.frame(cbind(MeanRemaining,lag(MeanRemaining)))
colnames(MeanRemaining)<-c("InspDate", "PrevInspDate")
MeanRemaining$InspDate<-as.Date(MeanRemaining$InspDate, origin = "1970/01/01")
MeanRemaining$PrevInspDate<-as.Date(MeanRemaining$PrevInspDate, origin = "1970/01/01")

Note that the real date ranges are not fixed like the ones above; they can be any range, at most one week apart.

Create the values to be summed

DailyTonnes <- as.data.frame(cbind(as.data.frame(seq(as.Date("2016-12-01"),as.Date("2017-03-28"),1)),
  (replicate(1,sample(abs(rnorm(118))*1000,rep=TRUE)))))
colnames(DailyTonnes)<-c("date","Vol")

Goal

I want to sum "Vol" in "DailyTonnes" over each date range in "MeanRemaining" and return the total "Vol" to the corresponding row of "MeanRemaining".

With the help of similar questions, I tried

library(data.table)
setDT(MeanRemaining)
setDT(DailyTonnes)

MeanRemaining[DailyTonnes[MeanRemaining, sum(Vol), on = .(date >= InspDate, date <= PrevInspDate),
            by = .EACHI], TotalVol := V1, on = .(InspDate=date)]

But this returns NA values.

Any suggestions would be greatly appreciated.

【Question discussion】:

    Tags: r


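    【Solution 1】:

    I believe your question already contains everything you need.

    I polished your code a little and changed the last line (the only one that was wrong). The join on that last line was overly complicated, and I don't think it brings any memory or performance gain.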
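
A minimal base R sketch of why the original bounds match nothing. It assumes that lag() in the question resolved to dplyr::lag, so that PrevInspDate is the earlier of the two dates; the toy dates are illustrative only.

```r
# Toy data: five daily dates and one inspection interval.
# Assumption: PrevInspDate is the EARLIER date, as dplyr::lag would produce.
dates        <- seq(as.Date("2017-01-01"), as.Date("2017-01-05"), by = 1)
InspDate     <- as.Date("2017-01-03")
PrevInspDate <- as.Date("2017-01-01")

# Bounds as written in the question: no date can be on or after the later
# date and on or before the earlier one at the same time.
as_written <- dates >= InspDate & dates <= PrevInspDate

# Bounds swapped so that the earlier date is the lower limit.
swapped <- dates >= PrevInspDate & dates <= InspDate

sum(as_written)  # 0 matching days
sum(swapped)     # 3 matching days
```

With no matching rows, the sum for that interval has nothing to add up, which would be consistent with the NA values reported in the question.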

    library(data.table)
    # Create MeanRemaining
    MeanRemaining <-
      data.table(InspDate = seq(as.Date("2017-01-01"), as.Date("2017-02-28"), 2))
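    # I replaced lag() with shift(), I think it is clearer this way.
    # type = "lead" stores the NEXT inspection date, so despite its name
    # PrevInspDate is the upper bound of each interval; fill = 1000000L is a
    # far-future sentinel (days since 1970-01-01) for the last row.
    MeanRemaining[, PrevInspDate := shift(InspDate, type = "lead", fill = 1000000L)]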
    
    # set seed for reproducibility
    set.seed(13)
    # Create DailyTonnes; I moved the end date earlier so that some intervals are empty
    DailyTonnes <- data.table(date = seq(as.Date("2016-12-01"),
                                         as.Date("2017-01-28"), 1),
                              # 59 days in this range, so draw 59 values
                              # (118 would no longer match the number of dates)
                              Vol = sample(abs(rnorm(59)) * 1000, rep = TRUE))
    
    # I changed the <= condition to <, I think it fits PrevInspDate better
    # This should be your final result if I'm not wrong
    SingleCase <-
      DailyTonnes[MeanRemaining, sum(Vol), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]
    
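    # SingleCase has two columns called date: as far as I know, in a non-equi join
    # data.table names the join columns after the x table (DailyTonnes)
    # while filling them with the values from MeanRemaining
    print(names(SingleCase))
    
    # rename the columns of the data.table to suit your needs
    setnames(SingleCase, c("InspDate", "PrevInspDate", "TotalVol"))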
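
To sanity-check the join result, the same half-open interval sum can be written in plain base R. This is only a sketch on hypothetical toy data (the tables daily and ranges are made up here, small enough to verify by hand); it returns NA for intervals that contain no dates, which is an assumption chosen to mirror what the join reports for empty intervals.

```r
# Hypothetical toy data, small enough to check by hand
daily <- data.frame(
  date = seq(as.Date("2017-01-01"), as.Date("2017-01-06"), by = 1),
  Vol  = c(10, 20, 30, 40, 50, 60)
)
ranges <- data.frame(
  InspDate     = as.Date(c("2017-01-01", "2017-01-03", "2017-01-10")),
  PrevInspDate = as.Date(c("2017-01-03", "2017-01-05", "2017-01-12"))
)

# For each row of ranges, sum Vol over the interval [InspDate, PrevInspDate)
ranges$TotalVol <- vapply(
  seq_len(nrow(ranges)),
  function(k) {
    in_range <- daily$date >= ranges$InspDate[k] &
                daily$date <  ranges$PrevInspDate[k]
    # NA for an interval with no dates, otherwise the sum of the matches
    if (any(in_range)) sum(daily$Vol[in_range]) else NA_real_
  },
  numeric(1)
)

print(ranges)  # TotalVol is 30, 70, NA
```

This loops over the intervals, so it is meant as a cross-check on small inputs rather than a replacement for the data.table join.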
    

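    Edit: retrieving multiple variables from the table MeanRemaining

    Retrieving several variables from MeanRemaining is quite tricky. With a small number of variables it is easy to solve: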

    # Add variables to MeanRemaining
    for (i in 1:100) {
      MeanRemaining[, paste0("extracol", i) := sample(.N)]
    }
    
    # Two variable case
    smallmultiple <-
      DailyTonnes[MeanRemaining, list(TotalVol = sum(Vol),
                                      extracol1 = i.extracol1 ,
                                      extracol2 = i.extracol2), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]
    
    # Correct date names
    names(smallmultiple)[1:2] <- c("InspDate", "PrevInspDate")
    

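    With many variables it gets harder. There is this feature request in github that would solve your problem, but it is not available at the moment. This question faces a similar problem, but its answer cannot be used in your case.

    A workaround for a large number of variables is: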

    # obtain names of variables to be kept in the later join
    joinkeepcols <-
      setdiff(names(MeanRemaining),  c("InspDate", "PrevInspDate"))
    
    # the "i" indicates the table to take the variables from
    joinkeepcols2 <- paste0("i.", joinkeepcols)
    
    # Prepare an expression for the data.table environment
    keepcols <-
      paste(paste(joinkeepcols, joinkeepcols2, sep = " = "), collapse = ", ")
    
    # Complete expression to be evaluated in data.table
    evalexpression <- paste0("list(
                             TotalVol = sum(Vol),",
                             keepcols, ")")
    
    # The magic comes here (you can assign it to MeanRemaining)
    bigmultiple <-
      DailyTonnes[MeanRemaining, eval(parse(text = evalexpression)), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]
    
    # Correct date names
    names(bigmultiple)[1:2] <- c("InspDate", "PrevInspDate")
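
An alternative sketch that avoids eval(parse()), assuming the only thing needed from the join is the sum: by = .EACHI returns one result row per row of MeanRemaining, in the same order, so the V1 column can be assigned straight back and every extra column stays in place. The toy tables and the extracol values below are illustrative, not the data from the question.

```r
library(data.table)

# Hypothetical toy tables; extracol1 / extracol2 stand in for the extra columns
DailyTonnes <- data.table(
  date = seq(as.Date("2017-01-01"), as.Date("2017-01-06"), by = 1),
  Vol  = c(10, 20, 30, 40, 50, 60)
)
MeanRemaining <- data.table(
  InspDate     = as.Date(c("2017-01-01", "2017-01-03", "2017-01-10")),
  PrevInspDate = as.Date(c("2017-01-03", "2017-01-05", "2017-01-12")),
  extracol1    = c("a", "b", "c"),
  extracol2    = 1:3
)

# One summed value per row of MeanRemaining (NA where no date falls in range)
totals <- DailyTonnes[MeanRemaining, sum(Vol),
                      on = .(date >= InspDate, date < PrevInspDate),
                      by = .EACHI]

# Assign by reference; all existing columns of MeanRemaining are kept
MeanRemaining[, TotalVol := totals$V1]

print(MeanRemaining)
```

The trade-off is that this relies on the row order of the join result matching MeanRemaining, which holds for by = .EACHI but would not hold after any reordering or filtering of totals.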
    

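    【Discussion】:

    • Thanks @JonNagra, that works great. The actual "MeanRemaining" data frame has more columns that I want to keep in the final output, which is where the more complicated code I used came in handy. I ended up binding it back together, but I'm sure there is a cleaner way.
    • Hi @Nick, I know how to include more columns in the join. I have to work right now, but I'll try to edit the answer when I get back. In the meantime, if you want to explore on your own, you can try DailyTonnes[MeanRemaining, list(V = sum(Vol), i.extracol1), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]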