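I think your question already contains everything you need for the answer.
I tidied up your code a little and changed the last line, which was the only incorrect one. The join in that last line was more complicated than it needed to be, and I don't think it brings any memory or performance gain.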
library(data.table)
# Create MeanRemaining
MeanRemaining <-
data.table(InspDate = seq(as.Date("2017-01-01"), as.Date("2017-02-28"), 2))
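# I replaced lag with shift(); I think it is clearer this way.
# Note that type = "lead" takes the NEXT inspection date, despite the column name;
# fill = 1000000L is meant to give the last row a far-future end date
MeanRemaining[, PrevInspDate := shift(InspDate, type = "lead", fill = 1000000L)]
# set seed for reproducibility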
set.seed(13)
# Create DailyTonnes; I changed the end date to generate empty intervals
dates <- seq(as.Date("2016-12-01"), as.Date("2017-01-28"), by = 1)
# draw one volume per date so that every date appears exactly once
DailyTonnes <- data.table(date = dates,
                          Vol = sample(abs(rnorm(length(dates))) * 1000, replace = TRUE))
# I changed the <= condition to <, I think it fits PrevInspDate better
# This should be your final result if I'm not wrong
SingleCase <-
DailyTonnes[MeanRemaining, sum(Vol), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]
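# SingleCase has two columns called date: in a non-equi join the join columns
# keep the name from DailyTonnes but hold the values from MeanRemaining
# (InspDate first, then PrevInspDate)
print(names(SingleCase))
# rename the columns to suit your needs; setnames() renames by reference
setnames(SingleCase, c("InspDate", "PrevInspDate", "TotalVol"))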
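To make the interval semantics of the join easier to check, here is a minimal sketch on a toy data set. The tables `insp` and `daily` and their values are hypothetical, made up only for illustration; the join itself has the same shape as the one above. It compares the result of the non-equi join with a plain loop over the intervals, and shows that an interval with no matching dates yields `NA`.

```r
library(data.table)

# Hypothetical toy data: three inspection intervals [InspDate, PrevInspDate)
insp <- data.table(
  InspDate     = as.Date(c("2017-01-01", "2017-01-03", "2017-01-05")),
  PrevInspDate = as.Date(c("2017-01-03", "2017-01-05", "9999-12-31"))
)

# Hypothetical daily volumes for 2017-01-01 .. 2017-01-04 only,
# so the third interval has no matching rows
daily <- data.table(
  date = as.Date("2017-01-01") + 0:3,
  Vol  = c(10, 20, 30, 40)
)

# Same non-equi join as above: one output row per row of insp (by = .EACHI)
res <- daily[insp, .(TotalVol = sum(Vol)),
             on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]

# Cross-check with a plain loop over the intervals
manual <- sapply(seq_len(nrow(insp)), function(k) {
  hit <- daily$date >= insp$InspDate[k] & daily$date < insp$PrevInspDate[k]
  if (any(hit)) sum(daily$Vol[hit]) else NA_real_
})

stopifnot(isTRUE(all.equal(res$TotalVol, manual)))
print(res$TotalVol)  # 30 70 NA
```

If you would rather get 0 than `NA` for empty intervals, `sum(Vol, na.rm = TRUE)` in `j` should do it.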
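Edit: retrieving multiple variables from the table MeanRemaining
Retrieving several variables from MeanRemaining is trickier. With a small number of variables it is easy to solve: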
# Add variables to MeanRemaining
for (i in 1:100) {
  # the parentheses make it explicit that the LHS is a computed column name
  MeanRemaining[, (paste0("extracol", i)) := sample(.N)]
}
# Two variable case
smallmultiple <-
DailyTonnes[MeanRemaining, list(TotalVol = sum(Vol),
extracol1 = i.extracol1 ,
extracol2 = i.extracol2), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]
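# Fix the names of the two join columns (both come back as "date"),
# addressing them by position and renaming by reference
setnames(smallmultiple, 1:2, c("InspDate", "PrevInspDate"))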
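When many variables are involved, it gets harder. There is this feature request on GitHub that would solve your problem, but it is not available yet. This question faces a similar problem, but its solution cannot be used in your case.
A workaround for a large number of variables is: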
# obtain names of variables to be kept in the later join
joinkeepcols <-
setdiff(names(MeanRemaining), c("InspDate", "PrevInspDate"))
# the "i" indicates the table to take the variables from
joinkeepcols2 <- paste0("i.", joinkeepcols)
# Prepare an expression for the data.table environment
keepcols <-
  paste(paste(joinkeepcols, joinkeepcols2, sep = " = "), collapse = ", ")
# Complete expression to be evaluated in data.table
evalexpression <- paste0("list(TotalVol = sum(Vol), ", keepcols, ")")
# The magic comes here (you can assign it to MeanRemaining)
bigmultiple <-
DailyTonnes[MeanRemaining, eval(parse(text = evalexpression)), on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]
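# Fix the names of the two join columns (both come back as "date"),
# addressing them by position and renaming by reference
setnames(bigmultiple, 1:2, c("InspDate", "PrevInspDate"))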
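If building a string for `eval(parse())` feels fragile, a possible alternative (not the approach used above, just a sketch) relies on the fact that `by = .EACHI` returns exactly one row per row of the `i` table, in the same order. Under that assumption you can compute only the aggregate in the join and attach it as a new column, so all the extra columns come along without being listed. The tables `insp` and `daily` below are hypothetical toy data.

```r
library(data.table)

# Hypothetical toy data: intervals plus two extra columns to carry along
insp <- data.table(
  InspDate     = as.Date(c("2017-01-01", "2017-01-03", "2017-01-05")),
  PrevInspDate = as.Date(c("2017-01-03", "2017-01-05", "9999-12-31")),
  extracol1    = c("a", "b", "c"),
  extracol2    = c(1.5, 2.5, 3.5)
)
daily <- data.table(
  date = as.Date("2017-01-01") + 0:3,
  Vol  = c(10, 20, 30, 40)
)

# Compute only the aggregate in the non-equi join
agg <- daily[insp, .(TotalVol = sum(Vol)),
             on = .(date >= InspDate, date < PrevInspDate), by = .EACHI]

# by = .EACHI gives one row per row of insp, in insp's order
stopifnot(nrow(agg) == nrow(insp))

# Attach the aggregate to a copy, leaving insp itself untouched
result <- copy(insp)[, TotalVol := agg$TotalVol]
print(result)
```

Dropping the `copy()` would add `TotalVol` to the interval table itself by reference, which is what "you can assign it to MeanRemaining" amounts to.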