【Question title】:Aggregated data in compliance with the sequences in R
【Posted】:2019-03-14 11:52:49
【Question description】:

Here is a sample of the data:

dat=structure(list(spent = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 
3L, 3L, 3L, 3L, 3L), .Label = c("29.74", "73.5", "73.71"), class = "factor"), 
    date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 
    2L, 2L, 2L), .Label = c("04.10.2018", "08.10.2018", "26.09.2018"
    ), class = "factor"), utc_time.y = structure(c(5L, 8L, 2L, 
    1L, 4L, 4L, 9L, 10L, 6L, 3L, 7L, 5L, 8L, 2L, 1L, 4L, 4L, 
    9L, 10L, 6L, 3L, 7L, 5L, 8L, 2L, 1L, 4L, 4L), .Label = c("01.10.2018 22:26", 
    "05.10.2018 22:34", "05.10.2018 22:35", "06.10.2018 13:43", 
    "07.10.2018 15:55", "30.09.2018 11:22", "30.09.2018 11:23", 
    "30.09.2018 12:00", "30.09.2018 12:23", "30.09.2018 18:12"
    ), class = "factor"), real = 501:528, id = c(238430441353501, 
    238430441353501, 238430441353501, 238430441353501, 238430441353501, 
    238430441353501, 238430441353501, 238430441353501, 238430441353501, 
    238430441353501, 238430441353501, 238430441353501, 238430441353501, 
    238430441353501, 238430441353501, 238430441353501, 238430441353501, 
    238430441353501, 238430441353501, 238430441353501, 238430441353501, 
    238430441353501, 238430441353501, 238430441353501, 238430441353501, 
    238430441353501, 238430441353501, 238430441353501)), .Names = c("spent", 
"date", "utc_time.y", "real", "id"), class = "data.frame", row.names = c(NA, 
-28L))

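How can I perform the following sequence of operations on it?

  1. Sum the column spent by date, separately for each id (= 1577)
  2. Sum the column real by utc_time.y, separately for each id (= 14406)
  3. If the aggregated real > the aggregated spent, set flag to 1 for that id, otherwise 0

I.e., the desired output (id is character):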

   spent       date       utc_time.y real           id flag
1  73.50 04.10.2018 07.10.2018 15:55  501 2.384304e+14    1
2  73.50 04.10.2018 30.09.2018 12:00  502 2.384304e+14    1
3  73.50 04.10.2018 05.10.2018 22:34  503 2.384304e+14    1
4  73.50 04.10.2018 01.10.2018 22:26  504 2.384304e+14    1
5  73.50 04.10.2018 06.10.2018 13:43  505 2.384304e+14    1
6  73.50 04.10.2018 06.10.2018 13:43  506 2.384304e+14    1
7  73.50 04.10.2018 30.09.2018 12:23  507 2.384304e+14    1
8  73.50 04.10.2018 30.09.2018 18:12  508 2.384304e+14    1
9  73.50 04.10.2018 30.09.2018 11:22  509 2.384304e+14    1
10 73.50 04.10.2018 05.10.2018 22:35  510 2.384304e+14    1
11 73.50 04.10.2018 30.09.2018 11:23  511 2.384304e+14    1
12 29.74 26.09.2018 07.10.2018 15:55  512 2.384304e+14    1
13 29.74 26.09.2018 30.09.2018 12:00  513 2.384304e+14    1
14 29.74 26.09.2018 05.10.2018 22:34  514 2.384304e+14    1
15 29.74 26.09.2018 01.10.2018 22:26  515 2.384304e+14    1
16 29.74 26.09.2018 06.10.2018 13:43  516 2.384304e+14    1
17 29.74 26.09.2018 06.10.2018 13:43  517 2.384304e+14    1
18 29.74 26.09.2018 30.09.2018 12:23  518 2.384304e+14    1
19 29.74 26.09.2018 30.09.2018 18:12  519 2.384304e+14    1
20 29.74 26.09.2018 30.09.2018 11:22  520 2.384304e+14    1
21 29.74 26.09.2018 05.10.2018 22:35  521 2.384304e+14    1
22 29.74 26.09.2018 30.09.2018 11:23  522 2.384304e+14    1
23 73.71 08.10.2018 07.10.2018 15:55  523 2.384304e+14    1
24 73.71 08.10.2018 30.09.2018 12:00  524 2.384304e+14    1
25 73.71 08.10.2018 05.10.2018 22:34  525 2.384304e+14    1
26 73.71 08.10.2018 01.10.2018 22:26  526 2.384304e+14    1
27 73.71 08.10.2018 06.10.2018 13:43  527 2.384304e+14    1
28 73.71 08.10.2018 06.10.2018 13:43  528 2.384304e+14    1
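
The two grand totals quoted in the steps above can be checked in base R. This is only a sketch: it rebuilds the spent and real columns with rep() instead of the full dput (an assumption made for brevity), and it shows why the factor column has to be converted through as.character() first.

```r
# Rebuild the two relevant columns of `dat` in compact form (same values and types)
spent <- factor(rep(c("73.5", "29.74", "73.71"), times = c(11, 11, 6)))
real  <- 501:528

# Correct: factor -> character -> numeric gives the actual amounts
total_spent <- sum(as.numeric(as.character(spent)))

# Pitfall: as.numeric() on a factor returns the internal level codes, not the amounts
total_codes <- sum(as.numeric(spent))

total_real <- sum(real)

print(c(total_spent = total_spent, total_real = total_real, total_codes = total_codes))
```

The first sum is the one the question rounds to 1577; the sum of level codes is unrelated to the amounts and is included only to illustrate the pitfall.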

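【Comments】:

  • Your requirements are not very clear. The first two steps are meant to apply an operation per id, but all of your ids are the same. Also, id is not character as you state, but numeric. I assume that is a mistake?

Tags: r dplyr data.table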


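【Solution 1】:

You could probably do it like this:

library(data.table)
# spent is stored as a factor, so convert it to numeric before summing
setDT(dat)[, spent := as.numeric(as.character(spent))][,
    s1 := sum(spent), by = .(id, date)][,        # total spent per id and date
    s2 := sum(real), by = .(id, utc_time.y)][,   # total real per id and utc_time.y
    flag := +(s2 > s1)]                          # 1 if total real exceeds total spent, else 0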
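
Since the question is also tagged dplyr, the same idea might be sketched with grouped mutate() calls. This is not part of the original answer: toy is a hypothetical four-row miniature of dat with the same column types, and the conversions of spent (factor to numeric) and id (numeric to character via sprintf()) are assumptions added to match the desired output.

```r
library(dplyr)

# Hypothetical miniature of `dat`: same column names and types as in the question
toy <- data.frame(
  spent      = factor(c("73.5", "73.5", "29.74", "29.74")),
  date       = factor(c("04.10.2018", "04.10.2018", "26.09.2018", "26.09.2018")),
  utc_time.y = factor(c("07.10.2018 15:55", "30.09.2018 12:00",
                        "07.10.2018 15:55", "30.09.2018 12:00")),
  real       = 501:504,
  id         = 238430441353501
)

res <- toy %>%
  mutate(spent = as.numeric(as.character(spent)),  # factor labels -> real amounts
         id    = sprintf("%.0f", id)) %>%          # numeric id -> character, no sci notation
  group_by(id, date) %>%
  mutate(s1 = sum(spent)) %>%                      # total spent per id and date
  group_by(id, utc_time.y) %>%
  mutate(s2 = sum(real)) %>%                       # total real per id and utc_time.y
  ungroup() %>%
  mutate(flag = as.integer(s2 > s1))               # 1 if total real exceeds total spent

print(res)
```

Using mutate() rather than summarise() keeps one row per original observation, which is the shape of the desired output in the question.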

【Comments】:
