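【Question title】: Aggregate df based on columns and group result
【Posted】: 2015-09-09 19:00:25
【Question】:

I am trying to do the following. My dataset looks like this: it contains dates in POSIXct format, hourly wind speed, and hourly wind direction (the df is called wind_DNSeason). My goal is to get frequency counts of wind speed according to the Beaufort scale, broken down by season and daylight.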

  date                     wspd_havg10m_kn avg_wdir
1 2013-12-06 00:25:00        9.835853       50
2 2013-12-06 01:25:00       10.506479       56
3 2013-12-06 02:25:00       11.847732       55
4 2013-12-06 03:25:00        8.494600       53
5 2013-12-06 04:25:00       13.188985       47
6 2013-12-06 05:25:00       13.188985       60

Adding the season based on the date:

wind_DNSeason$season<-time2season(wind_DNSeason$date, out.fmt="seasons", type="default")

Then I use the openair package to cut the data into daytime and nighttime:

wind_DNSeason$daylight <- cutData(wind, type = "daylight", local.hour.offset = -8, latitude = 54.312519, longitude = -130.305405, local.tz= "Canada/Pacific")

I know about the aggregate function, but I doubt I am using it correctly:

aggregate(wspd_havg10m_kn ~ season + daylight, wind_DNSeason, length)

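This gives me the number of occurrences, but it is not what I want. Am I trying to do too much in one step?

I need to know how often each wind speed group occurs per season during day and night (see the breaks below), because I want to create bar plots of the different frequencies. breaks = c(0, 1, 3, 6, 10, 16, 21, 27, 33, 40, 47)

Could I get something that looks like this, from which I can easily calculate percentages to plot in a bar chart: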

  season  daylight            total_count  wspd<=1 wspd>1,<=3 wspd>3,<=6 etc

1 autumm  daylight             854            151      34         56   
2 spring  daylight            2580            456      56         98
3 summer  daylight            1722            34       344        09
4 winter  daylight             852            545      55         55
5 autumm nighttime            1030            55        6         777
6 spring nighttime            1825            89       89         344
7 summer nighttime             827            344      55         66
8 winter nighttime            1533            34       66         777

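Any ideas? Thanks for your help!

I tried using dplyr and I think I am really close, but somehow it does not seem to add up the frequencies correctly. This is how I applied the suggested code: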

a<-wind_DNSeason %>% group_by(season,daylight) %>% 
  mutate(count=n(),"wspd<=1" = sum(wspd_havg10m_kn<=1),
     "wspd>1,<=3" = sum(wspd_havg10m_kn > 1 & wspd_havg10m_kn <= 3, na.rm=TRUE), 
     "wspd>3,<=6" = sum(wspd_havg10m_kn > 3 & wspd_havg10m_kn <= 6,na.rm=TRUE),
     "wspd>6,<=10" = sum(wspd_havg10m_kn > 6 & wspd_havg10m_kn <= 10,na.rm=TRUE),
     "wspd>10,<=16" = sum(wspd_havg10m_kn > 10 & wspd_havg10m_kn <= 16,na.rm=TRUE),
     "wspd>16,<=21" = sum(wspd_havg10m_kn > 16 & wspd_havg10m_kn <= 21,na.rm=TRUE),
     "wspd>21,<=27" = sum(wspd_havg10m_kn > 21 & wspd_havg10m_kn <= 27,na.rm=TRUE),
     "wspd>27,<=33" = sum(wspd_havg10m_kn > 27 & wspd_havg10m_kn <= 33,na.rm=TRUE),
     "wspd>33,<=40" = sum(wspd_havg10m_kn > 33 & wspd_havg10m_kn <= 40,na.rm=TRUE),
     "wspd>40,<=47" = sum(wspd_havg10m_kn > 33 & wspd_havg10m_kn <= 47,na.rm=TRUE))

The output looks like this. I picked a few unique rows, because it replicates the values across the whole df (e.g. winter daytime and nighttime):

date    wspd_havg10m_kn avg_wdir    daylight    season  count   wspd<=1 wspd>1,<=3  wspd>3,<=6  wspd>6,<=10 wspd>10,<=16    wspd>16,<=21    wspd>21,<=27    wspd>27,<=33    wspd>33,<=40    wspd>40,<=47
1   2013-12-06 00:25:00 9.8358531   50  nighttime   winter  2751    NA  59  185 315 551 260 106 47  6   6
2   2013-12-06 12:25:00 7.3768898   57  daylight    winter  1449    NA  13  73  251 322 133 46  13  0   0

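Shouldn't the frequencies of the different groups add up to the total? The whole df contains 13368 time steps, but if I add up the frequencies of each group I only get 11165. No wind speed is larger than the largest group. What am I missing?

【Comments】:

  • You probably want to use ?cut to cut the data into the intervals you specified
  • I use the plyr package a lot. Others may suggest dplyr, and when I am willing to dig into new syntax I go for the data.table package.
  • I have updated my answer - I should have used summarise instead of mutate: the output will now be more concise. The missing values are probably NAs - they are not included in the final counts because you removed them.
  • You're a hero. I did want to remove the NAs, but I had not deducted them from the total I was expecting. Thanks, Jeremycg!

Tags: r dataframe aggregate openair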


【Solution 1】:

Here is a dplyr solution:

library(dplyr)
wind_DNSeason %>% group_by(season,daylight) %>% 
    summarise(count=n(),"wspd<=1" = sum(wspd_havg10m_kn<=1),
           "wspd>1,<=3" = sum(wspd_havg10m_kn > 1 & wspd_havg10m_kn <= 3),
           "wspd>3,<=6" = sum(wspd_havg10m_kn > 3 & wspd_havg10m_kn <= 6)
    )

You can add as many wind speed columns as you need by filling in the names and conditions.
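The NA in the wspd<=1 column of the question's output, and the bins not adding up to the total, are both consistent with missing wind speeds. A minimal base R sketch of that behaviour follows; the vector `wspd` is made-up illustration data, not the original measurements:

```r
# Hypothetical wind speeds (knots) with one missing value; not the original data
wspd <- c(0.5, 2.0, NA, 9.8, 0.8)

# Without na.rm, a single NA makes the whole count NA
sum(wspd <= 1)

# With na.rm = TRUE, missing values are skipped
n_le1 <- sum(wspd <= 1, na.rm = TRUE)
n_gt1 <- sum(wspd > 1, na.rm = TRUE)

# Count the NAs explicitly so the bins add up to the total again
n_na <- sum(is.na(wspd))
n_le1 + n_gt1 + n_na == length(wspd)
```

Inside summarise, the same idea would mean adding na.rm = TRUE to each sum() and, if the totals should reconcile, one extra column such as sum(is.na(wspd_havg10m_kn)).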

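【Comments】:

  • Thank you very much for the suggestion. I think I am really close, but somehow I don't think it is quite right.
  • Can you explain what isn't working? I'm not sure what else you want in the output.
  • Sorry, I used mutate instead of summarise, try it now
【Solution 2】:

You mentioned plyr in the comments, so you could do this: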

library("plyr")

ddply(wind_DNSeason, .(season, daylight), summarize, n = length(wspd_havg10m_kn),
     "wspd<=1" = sum(wspd_havg10m_kn <= 1))

Also, if you want to create these computed values automatically, you could do this:

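calc = function(x) {
   # bin edges; each interval is (lower, upper]
   cuts = c(1, 3, 6, 10)
   # start with the group size
   res = data.frame(n = nrow(x))
   for(i in 1:(length(cuts) - 1)) {
       nm = sprintf("wspd>%d, <=%d", cuts[i], cuts[i + 1])
       # use <= on the upper edge so the count matches the column name
       val = sum(x$wspd_havg10m_kn > cuts[i] & x$wspd_havg10m_kn <= cuts[i + 1], na.rm = TRUE)
       res[, nm] = val
   }
   return(res)
}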

ddply(wind_DNSeason, .(season, daylight), "calc")
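As one of the comments on the question suggests, cut() can do the binning without a hand-written loop. The following is only a base R sketch under assumptions: the data frame `wind` is a made-up stand-in for wind_DNSeason, and the breaks are the ones listed in the question.

```r
# Hypothetical stand-in for wind_DNSeason; values are made up for illustration
wind <- data.frame(
  season   = c("winter", "winter", "winter", "summer", "summer", "summer"),
  daylight = c("daylight", "daylight", "nighttime", "daylight", "nighttime", "nighttime"),
  wspd_havg10m_kn = c(0.5, 2.0, 9.8, 13.2, 3.0, 47)
)

# Breaks from the question
breaks <- c(0, 1, 3, 6, 10, 16, 21, 27, 33, 40, 47)

# Right-closed intervals (a, b]; include.lowest = TRUE also keeps a speed of exactly 0
wind$wspd_bin <- cut(wind$wspd_havg10m_kn, breaks = breaks, include.lowest = TRUE)

# One row per season/daylight combination, one column per wind speed bin
counts <- table(group = paste(wind$season, wind$daylight), bin = wind$wspd_bin)

# Row-wise percentages, e.g. as input for a bar plot
pct <- 100 * prop.table(counts, margin = 1)
```

Note that table() drops rows whose wind speed is NA by default, so the row sums of `counts` can be smaller than the group sizes when values are missing.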

【Comments】:
