【Question title】: R - aggregate 30-min dataframe to hourly dataframe?
【Posted】: 2017-08-12 08:50:48
【Question description】:

I have this dataset, which is recorded in 30-minute steps:

structure(list(Particles = c(0.596667, 0.27, 0.153333, 0, 0.753333, 
    0, 0.35, 0.506667, 1.6, 0.116667), PM = c(35.5158928571429, 16.0714285714286, 
    9.12696428571429, 0, 44.84125, 0, 20.8333333333333, 30.15875, 
    95.2380952380953, 6.94446428571429), timestamp = c(1493310389147, 
    1493310419191, 1493310449254, 1493310479270, 1493310509313, 1493310539387, 
    1493310569416, 1493310599465, 1493310629525, 1502378711339), 
        date = structure(c(1493310389.147, 1493310419.191, 1493310449.254, 
        1493310479.27, 1493310509.313, 1493310539.387, 1493310569.416, 
        1493310599.465, 1493310629.525, 1502378711.339), class = c("POSIXct", 
        "POSIXt"), tzone = "UTC-1"), site = c("ABC", "ABC", 
        "ABC", "ABC", "ABC", "ABC", 
        "ABC", "ABC", "ABC", "ABC"
        ), code = c("ABC", "ABC", "ABC", 
        "ABC", "ABC", "ABC", "ABC", 
        "ABC", "ABC", "ABC"), key_date = c("2017-04-27", 
        "2017-04-27", "2017-04-27", "2017-04-27", "2017-04-27", "2017-04-27", 
        "2017-04-27", "2017-04-27", "2017-04-27", "2017-08-10")), .Names = c("Particles", 
    "PM", "timestamp", "date", "site", "code", "key_date"), row.names = c(NA, 
    10L), class = "data.frame")

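How can I aggregate it to hourly steps? My columns differ from one dataframe to the next, so I need a way to aggregate that can also be applied to other dataframes.

Edit:

I tried: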

res <- aggregate(Df['PM'], list(date = cut(as.POSIXct(Df$date), "1 hour")), sum)

But this leaves me with only two columns; all the others are gone. How can I keep them?

【Comments on the question】:

  • How about using cut: df1 %>% group_by(Hour = cut(date, breaks = "hour")) %>% summarise(PM = sum(PM))
  • @akrun How do I do that? I get this error: Warning: Error in %>%: could not find function "%>%"
  • I assumed you had loaded the package: library(dplyr); df1 %>% group_by(..
  • @akrun It works now. Thanks.

Tags: r dataframe shiny


【Solution 1】:

We can use cut to create an hourly grouping variable and then use summarise:

library(dplyr)
df1 %>%
    group_by(Hour = cut(date, breaks = "hour")) %>% 
    summarise(PM = sum(PM))
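
To make the grouping step concrete, here is a minimal sketch on a made-up three-row data frame (the values and the names `toy` and `hourly` are illustrative assumptions, not the question's data). It shows that `cut(date, breaks = "hour")` maps each timestamp to the start of its hour, so rows within the same hour collapse into one:

```r
library(dplyr)

# Toy data: two readings in the 16:00 hour, one in the 17:00 hour (made-up values)
toy <- data.frame(
  date = as.POSIXct(c("2017-04-27 16:05:00", "2017-04-27 16:35:00",
                      "2017-04-27 17:05:00"), tz = "UTC"),
  PM = c(10, 20, 5)
)

# cut() returns a factor whose levels are the hour starts;
# summarise() then produces one row per level
hourly <- toy %>%
  group_by(Hour = cut(date, breaks = "hour")) %>%
  summarise(PM = sum(PM))

hourly
```

Note that `Hour` comes back as a factor rather than a date-time, so it may need converting before further time-based work.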

We can also create a function that takes the grouping columns and the columns to summarise as arguments:

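# Sum the columns named in colstoSumm per hour and per grouping column.
# dateVar, groupVars and colstoSumm are all passed as character strings.
fSumm <- function(dat, dateVar, groupVars, colstoSumm){
    dat %>%
        group_by(Hour = cut(!! rlang::sym(dateVar), breaks = "hour")) %>%
        # note: dplyr >= 1.0.0 renamed this argument to .add
        group_by(!!! rlang::syms(groupVars), add = TRUE) %>%
        summarise_at(vars(colstoSumm), sum)
}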

groups <- c("site", "code")
cols <- c("Particles", "PM")
dateV <- "date"
fSumm(df1, dateV, groups, cols)
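
For readers on a newer dplyr, the same idea can be sketched with `across()` and `all_of()`. This is not the answer's code: the helper name `f_summ_across`, its argument names and the toy data are assumptions made for illustration, and it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper (not from the original answer); assumes dplyr >= 1.0.0.
# All column names are passed as character vectors.
f_summ_across <- function(dat, date_var, group_vars, cols_to_sum) {
  dat %>%
    # hourly bucket built from the date column named in date_var
    group_by(Hour = cut(.data[[date_var]], breaks = "hour")) %>%
    # add the extra grouping columns on top of Hour
    group_by(across(all_of(group_vars)), .add = TRUE) %>%
    # sum every requested column within each group
    summarise(across(all_of(cols_to_sum), sum), .groups = "drop")
}

# Toy data with made-up values
toy <- data.frame(
  date = as.POSIXct(c("2017-04-27 16:05:00", "2017-04-27 16:35:00",
                      "2017-04-27 17:05:00"), tz = "UTC"),
  site = "ABC",
  code = "ABC",
  Particles = c(0.5, 0.25, 1),
  PM = c(10, 20, 5)
)

res <- f_summ_across(toy, "date", c("site", "code"), c("Particles", "PM"))
res
```

Because every argument is a plain character vector, the same call can be reused on data frames whose columns differ, which is what the question asks for.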

Alternatively, we can go the quo route and pass the columns as quosures:

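# Same idea, but the columns are passed as quosures instead of strings
fSumm <- function(dat, dateVar, groupVars, colstoSumm){
    # convert the quosures to column names for summarise_at()
    cols <- sapply(colstoSumm, quo_name)

    dat %>%
        group_by(Hour = cut(!! dateVar, breaks = "hour")) %>%
        group_by(!!! groupVars, add = TRUE) %>%
        summarise_at(vars(cols), sum)
}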

fSumm(df1, quo(date), quos(site, code), quos(Particles, PM))

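【Comments】:

  • @teelou You need to assign the output to an object, i.e. res <- df1 %>% group_by... If the result needs to be a column in the original dataset, then df1 <- df1 %>% group_by(....) %>% mutate(PMSum = sum(PM))
  • @teelou I would use mutate instead of summarise, which creates a new column in the dataset
  • @teelou Just add them (the columns you want to keep) to the group_by call: group_by(Hour = cut(date, breaks = "hour"), site, code, key_date)
  • @teelou It depends on how dynamic you want it to be. Create a function that takes the grouping columns. I have updated the post
  • @teelou Please check how the date is converted to "hour" in openair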
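
The two suggestions in these comments can be compared side by side. This is a minimal sketch on made-up data (the names `toy`, `hourly` and `with_total` are illustrative assumptions): `summarise()` keeps only the grouping columns plus the summaries, so any column you want to keep has to be listed in `group_by()`, while `mutate()` keeps every original row and column and appends the hourly total.

```r
library(dplyr)

# Toy data with made-up values
toy <- data.frame(
  date = as.POSIXct(c("2017-04-27 16:05:00", "2017-04-27 16:35:00",
                      "2017-04-27 17:05:00"), tz = "UTC"),
  site = "ABC",
  PM = c(10, 20, 5)
)

# summarise(): one row per group; site survives only because it is a grouping column
hourly <- toy %>%
  group_by(Hour = cut(date, breaks = "hour"), site) %>%
  summarise(PM = sum(PM)) %>%
  ungroup()

# mutate(): all three rows and all original columns stay, plus the per-hour sum
with_total <- toy %>%
  group_by(Hour = cut(date, breaks = "hour")) %>%
  mutate(PMSum = sum(PM)) %>%
  ungroup()
```

Adding a column to `group_by()` only makes sense when it is constant within each hour (as `site` is here); otherwise it splits the hour into several rows.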
【Solution 2】:

We can try data.table:

library(data.table)
setDT(df)
varsToSum <- c("PM", "Particles")
df[, lapply(.SD[, ..varsToSum], sum), by = format(date, "%Y-%m-%d-%H")]

          format         PM Particles
1: 2017-04-27-17 251.785714  4.230000
2: 2017-08-10-16   6.944464  0.116667
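
As a side note, data.table's `.SDcols` argument is another way to restrict `.SD` to the columns being summed. The sketch below uses made-up data; the names `dt` and `res` and the `hour` column label are assumptions for illustration, not part of the answer above.

```r
library(data.table)

# Toy data with made-up values
dt <- data.table(
  date = as.POSIXct(c("2017-04-27 16:05:00", "2017-04-27 16:35:00",
                      "2017-04-27 17:05:00"), tz = "UTC"),
  PM = c(10, 20, 5),
  Particles = c(0.5, 0.25, 1)
)
varsToSum <- c("PM", "Particles")

# .SDcols limits .SD to the listed columns, so lapply() only sums those
res <- dt[, lapply(.SD, sum),
          by = .(hour = format(date, "%Y-%m-%d-%H")),
          .SDcols = varsToSum]
res
```

Naming the `by` expression (`hour = ...`) also gives the grouping column a clearer header than the default `format`.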

We can easily extend this to also include the first value of each remaining variable:

cbind(
  df[, lapply(.SD[, ..varsToSum], sum), by = format(date, "%Y-%m-%d-%H")]
  , df[, lapply(.SD[, !(names(df) %in% varsToSum), with = FALSE], head, 1), 
   by = format(date, "%Y-%m-%d-%H")][, -"format"]
)

          format         PM Particles    timestamp site code   key_date
1: 2017-04-27-17 251.785714  4.230000 1.493310e+12  ABC  ABC 2017-04-27
2: 2017-08-10-16   6.944464  0.116667 1.502379e+12  ABC  ABC 2017-08-10

【Comments】:
