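【Posted】:2021-12-24 03:32:46
【Question】:
For each row, I want to compute the sum of the squared occurrence counts (that is, row counts) of the unique values of group A (industry) within group B (country) over the preceding year.
Worked example for row 5: 2x A + 1x B + 1x C = 2^2 + 1^2 + 1^2 = 6 (the A in row 1 is excluded because it is more than one year old, and the A in row 6 is excluded because it belongs to a different country).
I managed to compute the counts row by row, but failed to move this to an aggregated date level: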
dt[, count_by_industry:= sapply(date, function(x) length(industry[between(date, x - lubridate::years(1), x)])),
by = c("country", "industry")]
Ideally, the solution should scale to real data with about 2 million rows and roughly 10k distinct dates and group elements (hence the data.table tag).
Example data
ID <- c("1","2","3","4","5","6")
Date <- c("2016-01-02","2017-01-01", "2017-01-03", "2017-01-03", "2017-01-04","2017-01-03")
Industry <- c("A","A","B","C","A","A")
Country <- c("UK","UK","UK","UK","UK","US")
Desired <- c(1,4,3,3,6,1)
library(data.table)
dt <- data.frame(id=ID, date=Date, industry=Industry, country=Country, desired_output=Desired)
setDT(dt)[, date := as.Date(date)]
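One way the rule described above could be written is a non-equi self-join. This is only a minimal sketch, not a tuned solution: it assumes the one-year window is inclusive on both ends (as in the `between()` call in the attempt), it rebuilds the example data so that it runs on its own, and the names `start`, `pairs`, `res` and `sum_sq` are illustrative.

```r
library(data.table)

# Example data from the question, rebuilt so the sketch is self-contained
dt <- data.table(
  id       = as.character(1:6),
  date     = as.Date(c("2016-01-02", "2017-01-01", "2017-01-03",
                       "2017-01-03", "2017-01-04", "2017-01-03")),
  industry = c("A", "A", "B", "C", "A", "A"),
  country  = c("UK", "UK", "UK", "UK", "UK", "US")
)

# Window start: one calendar year before each row's date
dt[, start := date - lubridate::years(1)]

# Non-equi self-join: pair every row (identified by i.id) with all rows of
# the same country whose date lies in [start, date]
pairs <- dt[dt,
            on = .(country, date >= start, date <= date),
            .(id = i.id, industry = x.industry),
            allow.cartesian = TRUE]

# Count rows per (id, industry), then sum the squared counts per id
res <- pairs[, .N, by = .(id, industry)][, .(sum_sq = sum(N^2)), by = id]

# Attach the result to the original table and drop the helper column
dt[res, on = "id", sum_sq := i.sum_sq]
dt[, start := NULL]
```

On the example data, `sum_sq` should match the `desired_output` column. The join materialises every row pair inside the window, so for about 2 million rows it may be worth collapsing to one row per country, date and industry first; that variant is not shown or tested here.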
【Discussion】:
Tags: r data.table