【Question title】: Average values between two dates by group
【Posted】: 2021-06-04 18:06:31
【Question description】:

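I have two data frames: one with a value for each day and camera (df1), and one with a date range for each camera (df2). I need to average the values in df1 over the date range given in df2, separately for each camera.

Note: this is a very simplified version of my data. I have about 300 cameras, and each camera has multiple date ranges that I need to average over.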

df1 <- data.frame(camera=c("Cam 1","Cam 1","Cam 1","Cam 2","Cam 2","Cam 2","Cam 3","Cam 3","Cam 3"),
date=c("2011-03-15","2011-03-16","2011-03-17","2011-03-15","2011-03-16","2011-03-17","2011-03-17","2011-03-18","2011-03-19"),
value=c(1,0,2,3,1,2,2,1,0))

df1$date <- as.Date(df1$date,format='%Y-%m-%d')

df2 <- data.frame(camera=c("Cam 1","Cam 2","Cam 3"),
start_date=c("2011-03-15", "2011-03-15","2011-03-17"),
end_date=c("2011-03-17","2011-03-17","2011-03-19"))

df2$start_date <- as.Date(df2$start_date,format='%Y-%m-%d')
df2$end_date <- as.Date(df2$end_date,format='%Y-%m-%d')

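The problem I'm running into is that I get the same value for multiple cameras (the values are not averaged per camera). I'm using the dplyr and tidyr packages, and I thought grouping by camera would solve this, but it doesn't. I get the same result with either mutate or summarise. I've found plenty of help on averaging within a date range or averaging by group, but not on doing both together. Any help would be greatly appreciated! Here is my code: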

 average<-df2 %>%
  group_by(camera, start_date, end_date) %>%
  mutate(avg= mean(df1$value[between(df1$date, start_date, end_date)])) %>%
  ungroup

I get a table like this:

camera start_date end_date avg
Cam 1 2011-03-15 2011-03-17 1.57
Cam 2 2011-03-15 2011-03-17 1.57
Cam 3 2011-03-17 2011-03-19 1.40

when I want this:

camera start_date end_date avg
Cam 1 2011-03-15 2011-03-17 1
Cam 2 2011-03-15 2011-03-17 2
Cam 3 2011-03-17 2011-03-19 1

【Question comments】:

    Tags: r


    【Solution 1】:

    We can use a non-equi join

    library(data.table)
    setDT(df1)[, .(start_date = date, end_date = date, camera,
         value)][df2, .(avg = mean(value)),
        on = .(camera, start_date >= start_date,
            end_date <= end_date), by = .EACHI]
    

    -output

       camera start_date   end_date avg
    1:  Cam 1 2011-03-15 2011-03-17   1
    2:  Cam 2 2011-03-15 2011-03-17   2
    3:  Cam 3 2011-03-17 2011-03-19   1
    

    Or using tidyverse with fuzzyjoin

    library(dplyr)
    library(fuzzyjoin)
    fuzzy_left_join(df1, df2, by = c("camera", "date" = "start_date", 
           "date" = "end_date"), match_fun = list(`==`, `>=`, `<=`)) %>%
        group_by(camera = camera.x, start_date, end_date) %>% 
        summarise(avg = mean(value), .groups = 'drop')
    

    -output

    # A tibble: 3 x 4
      camera start_date end_date     avg
      <chr>  <date>     <date>     <dbl>
    1 Cam 1  2011-03-15 2011-03-17     1
    2 Cam 2  2011-03-15 2011-03-17     2
    3 Cam 3  2011-03-17 2011-03-19     1
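
For comparison, here is a minimal dplyr-only sketch that does not need fuzzyjoin. It is not part of the original answer. It rebuilds the toy data from the question, first reproduces the 1.57 from the question's attempt (there, `df1$value` is the whole column, so only the dates are filtered and all cameras are pooled), and then joins on `camera` before filtering by date. The names `pooled` and `average` are just illustrative.

```r
library(dplyr)

# Toy data from the question
df1 <- data.frame(
  camera = rep(c("Cam 1", "Cam 2", "Cam 3"), each = 3),
  date = as.Date(c("2011-03-15", "2011-03-16", "2011-03-17",
                   "2011-03-15", "2011-03-16", "2011-03-17",
                   "2011-03-17", "2011-03-18", "2011-03-19")),
  value = c(1, 0, 2, 3, 1, 2, 2, 1, 0)
)
df2 <- data.frame(
  camera = c("Cam 1", "Cam 2", "Cam 3"),
  start_date = as.Date(c("2011-03-15", "2011-03-15", "2011-03-17")),
  end_date = as.Date(c("2011-03-17", "2011-03-17", "2011-03-19"))
)

# What the question's mutate() effectively computes for the first range:
# every row of df1 in the date window, regardless of camera
# (7 rows whose values sum to 11)
pooled <- mean(df1$value[df1$date >= as.Date("2011-03-15") &
                         df1$date <= as.Date("2011-03-17")])

# Per-camera version: pair each range with the rows of the same camera,
# keep only the dates inside the range (bounds inclusive), then average
average <- df2 %>%
  inner_join(df1, by = "camera") %>%
  filter(date >= start_date, date <= end_date) %>%
  group_by(camera, start_date, end_date) %>%
  summarise(avg = mean(value), .groups = "drop")
```

Two caveats with this sketch: a range with no matching rows in df1 is dropped from the result, and the join materialises every camera/range pairing before filtering, so on large data the non-equi join above is likely to be more memory-friendly. With several ranges per camera, recent dplyr versions (1.1.0 and later, as far as I know) may warn about a many-to-many relationship; passing `relationship = "many-to-many"` to `inner_join()` should acknowledge that it is intended.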
    

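    【Discussion】:

    • I tried this, but it gave me averages of 2, 2, 0 when it should be 1, 2, 1
    • @AlexA I've updated with a fuzzy join, which gives the output you showed
    • @AlexA Could you check the solution? I've updated the data.table version
    • Both solutions work. Thank you!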