[Posted]: 2016-02-07 04:43:19
[Question]:
I want to compute the mean of a variable between two dates. A reproducible data frame is below.
# Monthly concentrations: 2 years x 2 stations x 12 months = 48 rows
year <- c(1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,
1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,
1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,
1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997)
# month (length 12) and station (length 24) are recycled to 48 rows by data.frame()
month <- c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC")
station <- c("A","A","A","A","A","A","A","A","A","A","A","A",
"B","B","B","B","B","B","B","B","B","B","B","B")
# Random values, so the numbers differ on every run
concentration <- as.numeric(round(runif(48,20,40),1))
df <- data.frame(year,month,station,concentration)
# One row per participant: station in each year, plus start and end dates
id <- c(1,2,3,4)
station1996 <- c("A","A","B","B")
station1997 <- c("B","A","A","B")
start <- c("06/01/1996","07/01/1996","07/01/1996","08/01/1996")
end <- c("04/01/1997","04/01/1997","04/01/1997","05/01/1997")
participant <- data.frame(id,station1996,station1997,start,end)
# Parse the month/day/year strings into Date objects
participant$start <- as.Date(participant$start, format = "%m/%d/%Y")
participant$end <- as.Date(participant$end, format = "%m/%d/%Y")
So I have two datasets, as follows:
df
   year month station concentration
1  1996   JAN       A          24.4
2  1996   FEB       A          37.0
3  1996   MAR       A          39.5
4  1996   APR       A          28.0
...
45 1997   SEP       B          37.7
46 1997   OCT       B          35.2
47 1997   NOV       B          26.8
48 1997   DEC       B          40.0
participant
  id station1996 station1997      start        end
1  1           A           B 1996-06-01 1997-04-01
2  2           A           A 1996-07-01 1997-04-01
3  3           B           A 1996-07-01 1997-04-01
4  4           B           B 1996-08-01 1997-05-01
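For each id, I want to compute the mean concentration between the start date and the end date (month and year). Note that the station may change from one year to the next.
For example, for id=1 I want the mean concentration between June 1996 and April 1997. This should be based on the concentrations at station A from June 1996 to December 1996, and at station B from January 1997 to April 1997.
Can anyone help?
Thank you very much.
[Comments]: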
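- Step one: convert start and end to Date or POSIXct format, and combine year and month into a new column in the same format.
- You could also convert them to strings such as "1997-10". Then you can do something like mean(concentration[date >= start & date <= end])
- library(zoo); as.yearmon(participant$start) etc... may also come in handy here if you don't want to deal with the slightly clumsy POSIXct format.
- Thanks Toomet, but I need to account for the change of station
- I have edited the original question. Does the day have to be specified? I only have the month and year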
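
The suggestions in the comments can be combined into a minimal base R sketch that also handles the change of station. This is one possible approach, not an accepted answer from the thread. It assumes each monthly reading can be dated to the first of its month (as.Date needs a day, and the participant dates already fall on the 1st), and that a reading counts only if its station matches the participant's station for that year. The names date, mean_for_participant and mean_conc are hypothetical, and set.seed() is added only so the sketch is repeatable.

```r
# Rebuild the question's data; set.seed() is an addition for repeatability.
set.seed(1)
df <- data.frame(
  year = rep(c(1996, 1997), each = 24),
  month = rep(toupper(month.abb), times = 4),
  station = rep(rep(c("A", "B"), each = 12), times = 2),
  concentration = round(runif(48, 20, 40), 1),
  stringsAsFactors = FALSE
)
participant <- data.frame(
  id = 1:4,
  station1996 = c("A", "A", "B", "B"),
  station1997 = c("B", "A", "A", "B"),
  start = as.Date(c("06/01/1996", "07/01/1996", "07/01/1996", "08/01/1996"),
                  format = "%m/%d/%Y"),
  end = as.Date(c("04/01/1997", "04/01/1997", "04/01/1997", "05/01/1997"),
                format = "%m/%d/%Y"),
  stringsAsFactors = FALSE
)

# Assumption: date each monthly reading on the first day of its month.
# match() against month.abb avoids locale-dependent month-name parsing.
df$date <- as.Date(sprintf("%d-%02d-01",
                           as.integer(df$year),
                           match(df$month, toupper(month.abb))))

# Hypothetical helper: mean concentration for one participant row `p`.
mean_for_participant <- function(p, readings) {
  # Readings inside the participant's window (both ends inclusive).
  in_window <- readings$date >= p$start & readings$date <= p$end
  # Readings taken at the participant's station for the matching year.
  at_station <- (readings$year == 1996 & readings$station == p$station1996) |
    (readings$year == 1997 & readings$station == p$station1997)
  mean(readings$concentration[in_window & at_station])
}

participant$mean_conc <- sapply(
  seq_len(nrow(participant)),
  function(i) mean_for_participant(participant[i, ], df)
)
```

For id=1 this averages the seven station A readings from June to December 1996 together with the four station B readings from January to April 1997. With more than two years of data, the two hard-coded year conditions would need to be generalized, for example by reshaping participant to one row per id and year and merging on year and station.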