【Question title】:How to group per time interval in R
【Posted】:2022-01-14 16:44:50
【Question description】:

I have the dataset below:

Date Status Value
05/12/2021 23:59 Failed 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Failed 1500
05/12/2021 23:59 Failed 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Failed 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Failed 500
05/12/2021 23:59 Failed 1500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Failed 1500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Failed 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Failed 500
05/12/2021 23:59 Failed 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 500

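I want to split the datetime column to get the time,

then group the times into hourly intervals,

then summarize to get the columns below.

I want to know how many transactions were processed in each hour,

and the total value within that hour,

plus one column showing how many succeeded and another showing how many failed within the hour.

See the desired summary table output below: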

Interval Value Count Successful Failed
00:00 am - 00:59 am 32,000 54 40 15
00:59 am - 01:00 am 42,000 55 41 14
01:00 am - 02:59 am 21,400 56 42 14
03:00 am - 03:59 am 4,00 57 43 14
04:00 am - 04:59 am 543,000 58 2 56
05:00 am - 05:59 am 411,000 59 6 53

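【Question comments】:

  • You probably don't need to provide that many rows of data. For example, if the data is stored in df, you can use df %>% head(10) %>% dput to get code that can be pasted into the question. That is easier than asking answerers to prepare the code by hand.
  • Thank you for the correction by editing and trimming the table.

Tags: r datetime dplyr tidyverse lubridate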


【Solution 1】:

How about this:

library(tidyverse)
library(lubridate)
library(glue)

df <- tribble(~Date,    ~Status,    ~Value,
              "05/12/2021 23:59",   "Failed",   500,
              "05/12/2021 23:59",   "Successful",   1500,
              "05/12/2021 23:59",   "Successful",   500,
              "05/12/2021 23:59",   "Successful",   1500,
              "05/12/2021 23:59",   "Successful",   1500,
              "05/12/2021 23:59",   "Failed",   1500)

df2 <- df %>% 
    mutate(Datetime = dmy_hm(Date), # parse "dd/mm/yyyy HH:MM" strings (no seconds, so dmy_hm rather than dmy_hms)
           Date = as.Date(Datetime), # extract date, if you need it later
           Hour = hour(Datetime))    # extract hour

hourly_value <- df2 %>% 
    group_by(Hour) %>% 
    summarize(Value = sum(Value),
              .groups = "drop")

hourly_count <- df2 %>% 
    count(Hour, Status) %>% 
    pivot_wider(names_from = "Status", values_from = "n")

interval_helper <- tibble(Hour = 0:23,
                          display_hour = str_pad(Hour %% 12, 2, pad = '0'),
                          ampm = if_else(Hour < 12, "am", "pm"),
                          Interval = glue("{display_hour}:00 {ampm} - {display_hour}:59 {ampm}"))

full_join(hourly_value, hourly_count, by = "Hour") %>% 
    replace_na(list(Successful = 0L, Failed = 0L, Value = 0)) %>% 
    left_join(interval_helper, by = "Hour") %>% 
    mutate(Count = Successful + Failed) %>% 
    select(Interval, Value, Count, Successful, Failed)

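I'm not sure what format your original Date column is in; here I assume it is a character string. Because the exact format of the Interval column matters to you, it seemed easier to create a separate tibble containing the strings you want to display and join it on.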
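
As a minimal sketch of how those interval labels are put together, the same strings can be built with base R's sprintf and paste0 instead of glue. The vector name hours and the sample hours in it are illustrative and not taken from the original data. Note that, following the Hour %% 12 convention used in the answer, midnight and noon both display as 00.

```r
# Illustrative hours of the day (0-23); not taken from the original data
hours <- c(0L, 5L, 12L, 23L)

# Zero-padded 12-hour clock value (0 and 12 both become "00")
display_hour <- sprintf("%02d", hours %% 12L)

# am for hours 0-11, pm for hours 12-23
ampm <- ifelse(hours < 12L, "am", "pm")

# Assemble labels of the form "HH:00 am - HH:59 am"
interval <- paste0(display_hour, ":00 ", ampm, " - ", display_hour, ":59 ", ampm)
print(interval)
```

If noon and midnight should read 12 rather than 00, the display_hour step is the only line that would need to change.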

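It is important to replace the missing NA values with zeros; otherwise Count = Successful + Failed silently fails (it evaluates to NA) whenever only one of the two statuses is present in an hour.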
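
One way to sidestep the NA issue altogether, sketched here as an alternative and not as part of the original answer, is to compute every column in a single summarise call, counting each status with sum() over a logical comparison. The small df below mirrors the six sample rows used in the answer, and the name hourly is illustrative. The sketch assumes the Date strings are day-first and have no seconds.

```r
library(dplyr)
library(lubridate)

# Same six sample rows as in the answer
df <- tibble(
  Date   = rep("05/12/2021 23:59", 6),
  Status = c("Failed", "Successful", "Successful",
             "Successful", "Successful", "Failed"),
  Value  = c(500, 1500, 500, 1500, 1500, 1500)
)

hourly <- df %>%
  mutate(Hour = hour(dmy_hm(Date))) %>%         # assumes day-first strings without seconds
  group_by(Hour) %>%
  summarise(
    Value      = sum(Value),                    # total value in the hour
    Count      = n(),                           # number of transactions
    Successful = sum(Status == "Successful"),   # 0 when none, never NA
    Failed     = sum(Status == "Failed"),
    .groups    = "drop"
  )

print(hourly)
```

Because sum() over an all-FALSE comparison returns 0, an hour with only one status still gets a numeric count for the other one. The result can be joined to the interval_helper tibble by Hour in the same way as in the answer.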

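【Comments】:

  • Thanks, Micheal. Is there a way to show the time as am/pm? I know glue offers something like an f-string, and I tried several options to format the time. Is there a way to do this?
  • Please see my updated answer.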