【Posted】: 2020-08-18 14:38:26
【Question】:
I am working with NoSQL data and need to pivot it in R.
Sample data:
structure(list(timestamp = structure(c(1595709882, 1595709882,
1595709931, 1595709931, 1595710021, 1595710023, 1595710023, 1595710027,
1595710157, 1595710157, 1595710277, 1595710277, 1595710337, 1595710337,
1595710397, 1595710397, 1595710457, 1595710457, 1595710517, 1595710517
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), value = c("3000",
"160", "160", "3000", "6000", "6000", "160", "6000", "6000",
"160", "160", "6000", "6000", "160", "6000", "160", "6000", "160",
"6000", "160"), variable = c("ENGINE_RPM", "VEHICLE_SPEED", "VEHICLE_SPEED",
"ENGINE_RPM", "ENGINE_RPM", "ENGINE_RPM", "VEHICLE_SPEED", "ENGINE_RPM",
"ENGINE_RPM", "VEHICLE_SPEED", "VEHICLE_SPEED", "ENGINE_RPM",
"ENGINE_RPM", "VEHICLE_SPEED", "ENGINE_RPM", "VEHICLE_SPEED",
"ENGINE_RPM", "VEHICLE_SPEED", "ENGINE_RPM", "VEHICLE_SPEED")), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
timestamp value variable
7/25/2020 20:44:42 3000 ENGINE_RPM
7/25/2020 20:44:42 160 VEHICLE_SPEED
7/25/2020 20:45:31 160 VEHICLE_SPEED
7/25/2020 20:45:31 3000 ENGINE_RPM
7/25/2020 20:47:01 6000 ENGINE_RPM
7/25/2020 20:47:03 6000 ENGINE_RPM
7/25/2020 20:47:03 160 VEHICLE_SPEED
7/25/2020 20:47:07 6000 ENGINE_RPM
7/25/2020 20:49:17 6000 ENGINE_RPM
7/25/2020 20:49:17 160 VEHICLE_SPEED
7/25/2020 20:51:17 160 VEHICLE_SPEED
7/25/2020 20:51:17 6000 ENGINE_RPM
7/25/2020 20:52:17 6000 ENGINE_RPM
7/25/2020 20:52:17 160 VEHICLE_SPEED
7/25/2020 20:53:17 6000 ENGINE_RPM
7/25/2020 20:53:17 160 VEHICLE_SPEED
7/25/2020 20:54:17 6000 ENGINE_RPM
7/25/2020 20:54:17 160 VEHICLE_SPEED
7/25/2020 20:55:17 6000 ENGINE_RPM
7/25/2020 20:55:17 160 VEHICLE_SPEED
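Looking at the sample data, some timestamps have both an ENGINE_RPM and a VEHICLE_SPEED reading, while a few have only one of the two.
I need only the timestamps that have 2 rows, because those carry both the vehicle speed and the RPM. I can then pivot them to see the vehicle's speed and the engine's RPM at a given time.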
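The filtering step described above can be sketched on its own. This is a minimal sketch, assuming dplyr is available; the five-row `data` tibble is an abbreviated subset of the sample data, and the name `both` is introduced here only for illustration.

```r
library(dplyr)

# Abbreviated subset of the sample data (assumption: first few rows only)
data <- tibble(
  timestamp = as.POSIXct(
    c(1595709882, 1595709882, 1595710021, 1595710023, 1595710023),
    origin = "1970-01-01", tz = "UTC"
  ),
  value = c("3000", "160", "6000", "6000", "160"),
  variable = c("ENGINE_RPM", "VEHICLE_SPEED", "ENGINE_RPM",
               "ENGINE_RPM", "VEHICLE_SPEED")
)

# Group by timestamp only, then keep the groups that contain
# both variables (two distinct values of `variable`)
both <- data %>%
  group_by(timestamp) %>%
  filter(n_distinct(variable) == 2) %>%
  ungroup()
```

Counting distinct variables rather than rows means a timestamp with two readings of the same variable is not mistaken for a complete pair.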
The output I am looking for is:
timestamp ENGINE_RPM VEHICLE_SPEED
7/25/2020 20:44:42 3000 160
7/25/2020 20:45:31 3000 160
7/25/2020 20:47:03 6000 160
7/25/2020 20:49:17 6000 160
7/25/2020 20:51:17 6000 160
7/25/2020 20:52:17 6000 160
7/25/2020 20:53:17 6000 160
7/25/2020 20:54:17 6000 160
7/25/2020 20:55:17 6000 160
The query I used is:
data %>% group_by(timestamp, variable, value) %>%
mutate(row = row_number()) %>% filter(n() == 2) %>%
pivot_wider(names_from = variable, values_from = value) %>% select(-row)
The output I get is:
# A tibble: 8 x 3
# Groups: timestamp [4]
timestamp VEHICLE_SPEED ENGINE_RPM
<dttm> <chr> <chr>
1 2020-08-05 16:09:02 5 NA
2 2020-08-05 16:09:02 5 NA
3 2020-08-06 18:32:33 15 NA
4 2020-08-06 18:32:33 15 NA
5 2020-08-06 18:32:52 25 NA
6 2020-08-06 18:32:52 25 NA
7 2020-08-07 12:03:53 NA 1500
8 2020-08-07 12:03:53 NA 1500
>
Can someone tell me how to get the desired output?
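One possible reading of the output above: because the query groups by timestamp, variable and value together, each distinct row forms its own group, so `filter(n() == 2)` keeps only rows that are duplicated exactly. The `row` column then keeps the two copies on separate lines, each with NA for the other variable. A minimal sketch of an alternative follows, assuming dplyr and tidyr are available: pivot first, then drop the timestamps where either column is missing. The `data` tibble is an abbreviated subset of the sample data, the name `wide` is hypothetical, and `distinct()` is included on the assumption that any repeated rows are exact duplicates.

```r
library(dplyr)
library(tidyr)

# Abbreviated subset of the sample data (assumption: first few rows only)
data <- tibble(
  timestamp = as.POSIXct(
    c(1595709882, 1595709882, 1595710021, 1595710023, 1595710023),
    origin = "1970-01-01", tz = "UTC"
  ),
  value = c("3000", "160", "6000", "6000", "160"),
  variable = c("ENGINE_RPM", "VEHICLE_SPEED", "ENGINE_RPM",
               "ENGINE_RPM", "VEHICLE_SPEED")
)

wide <- data %>%
  distinct() %>%                          # drop exact duplicate rows, if any
  mutate(value = as.numeric(value)) %>%   # values are stored as character
  pivot_wider(names_from = variable, values_from = value) %>%
  # keep only timestamps that have both readings
  filter(!is.na(ENGINE_RPM), !is.na(VEHICLE_SPEED))
```

If a timestamp can carry several different readings of the same variable, `pivot_wider()` would return list-columns, and a `values_fn` argument (for example `mean`) would be needed to collapse them.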
【Discussion】: