【Question title】: Create a 24-hour vector with 60-minute and 1-minute time intervals in R
【Posted】: 2018-05-06 15:30:47
【Question description】:

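I have a firewall log file that includes date, hour, src_address, dest_address and Date.Time. For each period (e.g., from 2018/01/01 to 2018/05/06), I want to create a 24-hour vector with 60-minute and 1-minute time intervals. Then, within these intervals, I want to count the occurrences of each src_address/dest_address pair. Finally, I want the maximum of these occurrences for each src_address/dest_address pair. Here is my file: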

                date     hour    src_address  dest_address           Date.Time

1996  2018-04-14 08:24:01    1.11.201.19 172.16.16.100 2018-04-14 08:24:01
3702  2018-04-15 12:10:27    1.119.43.90 172.16.16.100 2018-04-15 12:10:27
1154  2018-04-14 00:59:27    1.119.43.90 172.16.16.153 2018-04-14 00:59:27
2414  2018-04-14 12:33:29    1.119.43.90 192.168.1.112 2018-04-14 12:33:29
18013 2018-04-28 18:49:05   1.171.43.133   172.16.16.5 2018-04-28 18:49:05
18015 2018-04-28 18:49:05   1.171.43.133   172.16.16.5 2018-04-28 18:49:05
6903  2018-04-25 21:31:52   1.179.191.82   172.16.16.5 2018-04-25 21:31:52
11741 2018-04-27 01:08:43   1.179.191.82 192.168.1.111 2018-04-27 01:08:43
11933 2018-04-27 02:00:10   1.179.191.82 192.168.1.111 2018-04-27 02:00:10
11023 2018-04-26 21:39:39   1.179.191.82 192.168.1.112 2018-04-26 21:39:39
11175 2018-04-26 22:31:01   1.179.191.82 192.168.1.112 2018-04-26 22:31:01
13073 2018-04-27 08:24:58   1.180.72.186 172.16.16.153 2018-04-27 08:24:58
13735 2018-04-27 12:07:34   1.180.72.186 172.16.16.153 2018-04-27 12:07:34
2752  2018-04-14 19:34:53   1.202.165.40 172.16.16.153 2018-04-14 19:34:53
4046  2018-04-15 18:16:40    1.203.84.52   172.16.16.5 2018-04-15 18:16:40
4048  2018-04-15 18:18:43    1.203.84.52 192.168.1.112 2018-04-15 18:18:43
3020  2018-04-15 01:35:40    1.209.171.4 192.168.1.111 2018-04-15 01:35:40
4870  2018-04-16 05:33:42   1.214.34.114 172.16.16.100 2018-04-16 05:33:42
7025  2018-04-25 22:28:06   1.214.34.114 172.16.16.100 2018-04-25 22:28:06
4262  2018-04-15 23:31:56   1.214.34.114 172.16.16.153 2018-04-15 23:31:56
9369  2018-04-26 10:32:50   1.214.34.114 172.16.16.153 2018-04-26 10:32:50
2716  2018-04-14 18:49:30   1.214.34.114   172.16.16.5 2018-04-14 18:49:30
9563  2018-04-26 12:34:58   1.214.34.114   172.16.16.5 2018-04-26 12:34:58
1110  2018-04-14 00:27:02   1.214.34.114 192.168.1.111 2018-04-14 00:27:02
4470  2018-04-16 01:27:32   1.214.34.114 192.168.1.112 2018-04-16 01:27:32
9581  2018-04-26 12:55:39    1.55.249.92 172.16.16.153 2018-04-26 12:55:39
2970  2018-04-15 00:01:18    1.55.249.92   172.16.16.5 2018-04-15 00:01:18
15329 2018-04-27 21:53:16    1.55.249.92   172.16.16.5 2018-04-27 21:53:16
15537 2018-04-28 00:02:30    1.55.249.92   172.16.16.5 2018-04-28 00:02:30
19249 2018-04-29 06:28:04   1.71.188.254 172.16.16.100 2018-04-29 06:28:04
19243 2018-04-29 06:28:04   1.71.188.254 172.16.16.153 2018-04-29 06:28:04
19241 2018-04-29 06:28:04   1.71.188.254 172.16.16.159 2018-04-29 06:28:04
19239 2018-04-29 06:28:04   1.71.188.254   172.16.16.5 2018-04-29 06:28:04
19247 2018-04-29 06:28:04   1.71.188.254 192.168.1.111 2018-04-29 06:28:04
19245 2018-04-29 06:28:04   1.71.188.254 192.168.1.112 2018-04-29 06:28:04
6315  2018-04-25 18:56:08     1.85.18.88 172.16.16.153 2018-04-25 18:56:08
14623 2018-04-27 16:41:00     1.85.18.88 172.16.16.153 2018-04-27 16:41:00

This is my expected output:

   src_address  dest_address max(per hour) max(per minute)
2  1.11.201.19 172.16.16.100           1       1   
3  1.119.43.90 172.16.16.100           1       1   
4  1.119.43.90 172.16.16.153           1       1   
5  1.119.43.90 192.168.1.112           1       1   
6 1.171.43.133   172.16.16.5           2       2   
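A minimal sketch of the 24-hour interval vectors the question describes, assuming the timestamps are parsed as POSIXct in UTC. The time zone, the one-day range and the two sample timestamps are illustrative assumptions. `seq()` builds the hourly and per-minute breakpoints, and `cut()` assigns each timestamp to the interval it falls in.

```r
# Assumed one-day range in UTC (UTC avoids daylight-saving gaps).
start <- as.POSIXct("2018-04-14 00:00:00", tz = "UTC")
end   <- as.POSIXct("2018-04-15 00:00:00", tz = "UTC")

# Breakpoints: 25 hourly points give 24 one-hour bins,
# 1441 per-minute points give 1440 one-minute bins.
hour_breaks   <- seq(start, end, by = "hour")
minute_breaks <- seq(start, end, by = "min")

# Two sample timestamps taken from the log above.
ts <- as.POSIXct(c("2018-04-14 08:24:01", "2018-04-14 12:33:29"), tz = "UTC")

# right = FALSE makes each bin [start, end), so 08:24:01 lands in the 08:00 bin.
hour_bin   <- cut(ts, breaks = hour_breaks, right = FALSE)
minute_bin <- cut(ts, breaks = minute_breaks, right = FALSE)

data.frame(ts, hour_bin, minute_bin)
```

Counting rows per src_address/dest_address pair within each bin, then taking the maximum per pair, gives the two columns of the expected output.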

【Question discussion】:

    Tags: r


    【Solution 1】:

    Getting the summary takes several steps. The data can be transformed with the dplyr, tidyr and lubridate packages.

    Approach:

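    1. Create DateTime by uniting date and hour and converting the result with ymd_hms
    2. Group on src_addres, dest_address and Year-Month-Day Hour to count the hourly occurrences
    3. Group on src_addres, dest_address and Year-Month-Day Hour:Min to count the per-minute occurrences
    4. Group on src_addres and dest_address and summarise to get the maximum hourly and per-minute occurrences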
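    library(dplyr)
    library(tidyr)
    library(lubridate)
    
    df %>%
      # 1. Combine the separate date and hour columns and parse them as a date-time
      unite("DateTime", c("date","hour"), sep=" ") %>% 
      mutate(DateTime = ymd_hms(DateTime)) %>%
      # 2. Count occurrences of each pair within each clock hour
      group_by(src_addres, dest_address, YMD_H = format(DateTime, "%Y-%m-%d %H")) %>%
      mutate(HourlyAppearance = n()) %>%
      # 3. Regroup (replacing the hourly grouping) and count per clock minute
      group_by(src_addres, dest_address, YMD_HM = format(DateTime, "%Y-%m-%d %H:%M")) %>%
      mutate(PerMinAppearance = n()) %>%
      # 4. Keep the largest hourly and per-minute count for each pair
      group_by(src_addres, dest_address) %>%
      summarise('max(per hour)' = max(HourlyAppearance), 
                'max(per min)' = max(PerMinAppearance)) %>%
      as.data.frame()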
    
    #      src_addres  dest_address max(per hour) max(per min)
    # 1   1.11.201.19 172.16.16.100             1            1
    # 2   1.119.43.90 172.16.16.100             1            1
    # 3   1.119.43.90 172.16.16.153             1            1
    # 4   1.119.43.90 192.168.1.112             1            1
    # 5  1.171.43.133   172.16.16.5             2            2
    # 6  1.179.191.82   172.16.16.5             1            1
    # 7  1.179.191.82 192.168.1.111             1            1
    # 8  1.179.191.82 192.168.1.112             1            1
    # 9  1.180.72.186 172.16.16.153             1            1
    # 10 1.202.165.40 172.16.16.153             1            1
    # 11  1.203.84.52   172.16.16.5             1            1
    # 12  1.203.84.52 192.168.1.112             1            1
    # 13  1.209.171.4 192.168.1.111             1            1
    # 14 1.214.34.114 172.16.16.100             1            1
    # 15 1.214.34.114 172.16.16.153             1            1
    # 16 1.214.34.114   172.16.16.5             1            1
    # 17 1.214.34.114 192.168.1.111             1            1
    # 18 1.214.34.114 192.168.1.112             1            1
    # 19  1.55.249.92 172.16.16.153             1            1
    # 20  1.55.249.92   172.16.16.5             1            1
    # 21 1.71.188.254 172.16.16.100             1            1
    # 22 1.71.188.254 172.16.16.153             1            1
    # 23 1.71.188.254 172.16.16.159             1            1
    # 24 1.71.188.254   172.16.16.5             1            1
    # 25 1.71.188.254 192.168.1.111             1            1
    # 26 1.71.188.254 192.168.1.112             1            1
    # 27   1.85.18.88 172.16.16.153             1            1
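For comparison, here is a base-R sketch of the same per-hour and per-minute maxima without any add-on packages. This is an alternative technique (`aggregate()` and `merge()` instead of dplyr), not the answer's method. The data frame name `fw_log`, its `time` column, the helper `max_per_bucket()` and the four toy rows are hypothetical stand-ins for the OP's log.

```r
# Hypothetical toy log: a few rows copied from the question, with a POSIXct time.
fw_log <- data.frame(
  src_address  = c("1.171.43.133", "1.171.43.133", "1.119.43.90", "1.119.43.90"),
  dest_address = c("172.16.16.5", "172.16.16.5", "172.16.16.100", "172.16.16.153"),
  time = as.POSIXct(c("2018-04-28 18:49:05", "2018-04-28 18:49:05",
                      "2018-04-15 12:10:27", "2018-04-14 00:59:27"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Truncate each timestamp to its clock hour / clock minute as a character key.
fw_log$hour_key   <- format(fw_log$time, "%Y-%m-%d %H")
fw_log$minute_key <- format(fw_log$time, "%Y-%m-%d %H:%M")

# Count rows per (pair, bucket), then keep the largest count per pair.
max_per_bucket <- function(dat, key) {
  counts <- aggregate(list(n = rep(1L, nrow(dat))),
                      by = list(src_address  = dat$src_address,
                                dest_address = dat$dest_address,
                                bucket       = dat[[key]]),
                      FUN = sum)
  aggregate(n ~ src_address + dest_address, data = counts, FUN = max)
}

per_hour   <- max_per_bucket(fw_log, "hour_key")
per_minute <- max_per_bucket(fw_log, "minute_key")

# Join the two maxima side by side; the shared column n gets the suffixes.
result <- merge(per_hour, per_minute,
                by = c("src_address", "dest_address"),
                suffixes = c("_per_hour", "_per_minute"))
result
```

On these toy rows, the 1.171.43.133 / 172.16.16.5 pair gets 2 in both columns and the other two pairs get 1, which agrees with the expected output in the question.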
    

    Data:

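    The OP did not provide the data in an easily reproducible format. Because the date and time appear both as separate columns and again in Date.Time, the printout is harder to read in. Perhaps that is why the question received so few responses. I preferred to read the date and time parts as separate columns and then use unite to get the Date/Time.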

    strtext <- "Sl  date hour  src_addres  dest_address  Date_t   Time_t
    1996  2018-04-14 08:24:01    1.11.201.19 172.16.16.100 2018-04-14 08:24:01
    3702  2018-04-15 12:10:27    1.119.43.90 172.16.16.100 2018-04-15 12:10:27
    1154  2018-04-14 00:59:27    1.119.43.90 172.16.16.153 2018-04-14 00:59:27
    2414  2018-04-14 12:33:29    1.119.43.90 192.168.1.112 2018-04-14 12:33:29
    18013 2018-04-28 18:49:05   1.171.43.133   172.16.16.5 2018-04-28 18:49:05
    18015 2018-04-28 18:49:05   1.171.43.133   172.16.16.5 2018-04-28 18:49:05
    6903  2018-04-25 21:31:52   1.179.191.82   172.16.16.5 2018-04-25 21:31:52
    11741 2018-04-27 01:08:43   1.179.191.82 192.168.1.111 2018-04-27 01:08:43
    11933 2018-04-27 02:00:10   1.179.191.82 192.168.1.111 2018-04-27 02:00:10
    11023 2018-04-26 21:39:39   1.179.191.82 192.168.1.112 2018-04-26 21:39:39
    11175 2018-04-26 22:31:01   1.179.191.82 192.168.1.112 2018-04-26 22:31:01
    13073 2018-04-27 08:24:58   1.180.72.186 172.16.16.153 2018-04-27 08:24:58
    13735 2018-04-27 12:07:34   1.180.72.186 172.16.16.153 2018-04-27 12:07:34
    2752  2018-04-14 19:34:53   1.202.165.40 172.16.16.153 2018-04-14 19:34:53
    4046  2018-04-15 18:16:40    1.203.84.52   172.16.16.5 2018-04-15 18:16:40
    4048  2018-04-15 18:18:43    1.203.84.52 192.168.1.112 2018-04-15 18:18:43
    3020  2018-04-15 01:35:40    1.209.171.4 192.168.1.111 2018-04-15 01:35:40
    4870  2018-04-16 05:33:42   1.214.34.114 172.16.16.100 2018-04-16 05:33:42
    7025  2018-04-25 22:28:06   1.214.34.114 172.16.16.100 2018-04-25 22:28:06
    4262  2018-04-15 23:31:56   1.214.34.114 172.16.16.153 2018-04-15 23:31:56
    9369  2018-04-26 10:32:50   1.214.34.114 172.16.16.153 2018-04-26 10:32:50
    2716  2018-04-14 18:49:30   1.214.34.114   172.16.16.5 2018-04-14 18:49:30
    9563  2018-04-26 12:34:58   1.214.34.114   172.16.16.5 2018-04-26 12:34:58
    1110  2018-04-14 00:27:02   1.214.34.114 192.168.1.111 2018-04-14 00:27:02
    4470  2018-04-16 01:27:32   1.214.34.114 192.168.1.112 2018-04-16 01:27:32
    9581  2018-04-26 12:55:39    1.55.249.92 172.16.16.153 2018-04-26 12:55:39
    2970  2018-04-15 00:01:18    1.55.249.92   172.16.16.5 2018-04-15 00:01:18
    15329 2018-04-27 21:53:16    1.55.249.92   172.16.16.5 2018-04-27 21:53:16
    15537 2018-04-28 00:02:30    1.55.249.92   172.16.16.5 2018-04-28 00:02:30
    19249 2018-04-29 06:28:04   1.71.188.254 172.16.16.100 2018-04-29 06:28:04
    19243 2018-04-29 06:28:04   1.71.188.254 172.16.16.153 2018-04-29 06:28:04
    19241 2018-04-29 06:28:04   1.71.188.254 172.16.16.159 2018-04-29 06:28:04
    19239 2018-04-29 06:28:04   1.71.188.254   172.16.16.5 2018-04-29 06:28:04
    19247 2018-04-29 06:28:04   1.71.188.254 192.168.1.111 2018-04-29 06:28:04
    19245 2018-04-29 06:28:04   1.71.188.254 192.168.1.112 2018-04-29 06:28:04
    6315  2018-04-25 18:56:08     1.85.18.88 172.16.16.153 2018-04-25 18:56:08
    14623 2018-04-27 16:41:00     1.85.18.88 172.16.16.153 2018-04-27 16:41:00"
    
    df <- read.table(text = strtext,header = TRUE, stringsAsFactors = FALSE)
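If you would rather read the OP's printout exactly as posted, one option is to skip its header. The header names 5 columns, but every data row has 7 whitespace-separated fields (a leading row id, plus Date.Time split into two fields), so `read.table()` cannot match them up on its own. Here is a sketch with hypothetical column names (`id`, `dt_date`, `dt_time`, `time`) and only three of the posted rows:

```r
# Three rows of the OP's printout, including its header and blank line.
op_text <- "                date     hour    src_address  dest_address           Date.Time

1996  2018-04-14 08:24:01    1.11.201.19 172.16.16.100 2018-04-14 08:24:01
18013 2018-04-28 18:49:05   1.171.43.133   172.16.16.5 2018-04-28 18:49:05
18015 2018-04-28 18:49:05   1.171.43.133   172.16.16.5 2018-04-28 18:49:05"

# Skip the mismatched header line and supply all 7 column names ourselves;
# the blank line is dropped by read.table's default blank.lines.skip = TRUE.
op <- read.table(text = op_text, skip = 1, stringsAsFactors = FALSE,
                 col.names = c("id", "date", "hour", "src_address",
                               "dest_address", "dt_date", "dt_time"))

# Build one date-time column from the separate date and hour fields.
op$time <- as.POSIXct(paste(op$date, op$hour), tz = "UTC")
str(op)
```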
    

    【Discussion】:

    • @Hüseyin Let's try one thing: you only need dplyr, tidyr and lubridate. Try my answer with just these 3 packages.