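【Posted】: 2021-02-09 20:57:19
【Question】:
The goal is to convert a data frame (representing a one-to-many relationship: one computer with several monitors) into a wider representation.
The data frame (abbreviated) could be: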
library(tidyverse)
df <- tibble::tribble(
  ~CPU_ID, ~ID, ~CONFIGITEM_NUMBER, ~NAME, ~AllocationDate, ~Model, ~Vendor,
  182434, 195251, 101142000825, "COMP000572", "2014-04-10", "HP ELITE DISPLAY E-231", "Hewlett-Packard",
  182434, 405022, 1142027261, "COMP030500", "2020-12-02", "V173A", "ACER",
  182436, 183607, 101142000008, "COMP000008", "2014-04-18", "HP ELITE DISPLAY E-231", "Hewlett-Packard",
  182437, 228469, 1142006861, "COMP020117", "2018-03-05", "S22C45KBW", "Samsung",
  182437, 341806, 1142019822, "COMP050244", "2019-01-09", "L1940T", "HP",
  182438, 205930, 101142001009, "COMP050002", "2019-05-20", "S22C45KBW", "Samsung",
  182439, 240546, 1142008622, "COMP050131", "2016-09-16", "SAMSUNG SYNCMASTER 943", "SAMSUNG",
  182462, 184114, 101142000515, "COMP000515", "2019-08-27", "HP ELITE DISPLAY E-231", "Hewlett-Packard",
  182463, 184113, 101142000514, "COMP000514", "2019-08-28", "HP ELITE DISPLAY E-231", "Hewlett-Packard",
  182464, 184106, 101142000507, "COMP000507", "2019-08-27", "HP ELITE DISPLAY E-231", "Hewlett-Packard"
)
I can pivot it correctly with:
df %>%
  group_by(CPU_ID) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  rename_with( ~ paste0("monitor1_", .), .cols = !CPU_ID) %>%
  left_join(
    df %>%
      group_by(CPU_ID) %>%
      filter(row_number() == 2) %>%
      ungroup() %>%
      rename_with( ~ paste0("monitor2_", .), .cols = !CPU_ID),
    by = "CPU_ID"
  )
#> # A tibble: 8 x 13
#> CPU_ID monitor1_ID monitor1_CONFIG~ monitor1_NAME monitor1_Alloca~ monitor1_Model monitor1_Vendor
#> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
#> 1 182434 195251 101142000825 COMP000572 2014-04-10 HP ELITE DISP~ Hewlett-Packard
#> 2 182436 183607 101142000008 COMP000008 2014-04-18 HP ELITE DISP~ Hewlett-Packard
#> 3 182437 228469 1142006861 COMP020117 2018-03-05 S22C45KBW Samsung
#> 4 182438 205930 101142001009 COMP050002 2019-05-20 S22C45KBW Samsung
#> 5 182439 240546 1142008622 COMP050131 2016-09-16 SAMSUNG SYNCM~ SAMSUNG
#> 6 182462 184114 101142000515 COMP000515 2019-08-27 HP ELITE DISP~ Hewlett-Packard
#> 7 182463 184113 101142000514 COMP000514 2019-08-28 HP ELITE DISP~ Hewlett-Packard
#> 8 182464 184106 101142000507 COMP000507 2019-08-27 HP ELITE DISP~ Hewlett-Packard
#> # ... with 6 more variables: monitor2_ID <dbl>, monitor2_CONFIGITEM_NUMBER <dbl>,
#> # monitor2_NAME <chr>, monitor2_AllocationDate <chr>, monitor2_Model <chr>, monitor2_Vendor <chr>
But in the real data frame some computers have more than two monitors, so this approach would need many left_join calls.
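Repeating the join by hand is not the only way to scale this approach. A minimal sketch, assuming purrr is available (it ships with the tidyverse) and using a small made-up `toy` table in place of the real data: number the monitors within each computer, split the rows by that number, prefix each piece, and fold the pieces together with `reduce()`. The names `toy`, `numbered` and `wide` are illustrative only.

```r
library(dplyr)
library(purrr)

# Made-up stand-in for the real data: computer 1 has two monitors, computer 2 has one
toy <- tibble::tribble(
  ~CPU_ID, ~ID, ~Model,
  1, 10, "A",
  1, 11, "B",
  2, 12, "C"
)

# Number the monitors within each computer
numbered <- toy %>%
  group_by(CPU_ID) %>%
  mutate(monitor_n = row_number()) %>%
  ungroup()

# One piece per monitor number, each prefixed with "monitor<n>_",
# then joined back together on CPU_ID
wide <- split(numbered, numbered$monitor_n) %>%
  imap(function(piece, n) {
    piece %>%
      select(-monitor_n) %>%
      rename_with(function(nm) paste0("monitor", n, "_", nm), .cols = !CPU_ID)
  }) %>%
  reduce(left_join, by = "CPU_ID")

names(wide)
```

Each piece keeps its columns in their original order, so the result lists every monitor1_ column first, then every monitor2_ column, and so on. The first piece contains every computer, so joining onto it with `left_join` does not drop any rows.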
I tried to write an alternative, such as:
df %>%
  group_by(CPU_ID) %>%
  mutate(monitor_n = row_number()) %>%
  ungroup() %>%
  pivot_wider(
    id_cols = CPU_ID,
    names_from = monitor_n,
    values_from = !CPU_ID
  ) %>%
  select(-starts_with("monitor_n")) %>%
  rename_with(function(colname)
    str_replace(colname, "^(.*)_(\\d)$", "monitor\\2_\\1"),
    .cols = !CPU_ID)
#> # A tibble: 8 x 13
#> CPU_ID monitor1_ID monitor2_ID monitor1_CONFIG~ monitor2_CONFIG~ monitor1_NAME monitor2_NAME
#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 182434 195251 405022 101142000825 1142027261 COMP000572 COMP030500
#> 2 182436 183607 NA 101142000008 NA COMP000008 <NA>
#> 3 182437 228469 341806 1142006861 1142019822 COMP020117 COMP050244
#> 4 182438 205930 NA 101142001009 NA COMP050002 <NA>
#> 5 182439 240546 NA 1142008622 NA COMP050131 <NA>
#> 6 182462 184114 NA 101142000515 NA COMP000515 <NA>
#> 7 182463 184113 NA 101142000514 NA COMP000514 <NA>
#> 8 182464 184106 NA 101142000507 NA COMP000507 <NA>
#> # ... with 6 more variables: monitor1_AllocationDate <chr>, monitor2_AllocationDate <chr>,
#> # monitor1_Model <chr>, monitor2_Model <chr>, monitor1_Vendor <chr>, monitor2_Vendor <chr>
But I need to keep the columns in the same order as in the original data frame.
Can you recommend a simpler (tidier) alternative?
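One possible way to get that column order directly from `pivot_wider()`, offered as a sketch rather than a definitive answer: `names_glue` builds the `monitorN_` prefix, and `names_vary = "slowest"` groups the output columns by monitor instead of by variable. This assumes tidyr 1.2.0 or later, where `names_vary` is available. The `toy` table and the name `wide` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for the real data: computer 1 has two monitors, computer 2 has one
toy <- tibble::tribble(
  ~CPU_ID, ~ID, ~Model,
  1, 10, "A",
  1, 11, "B",
  2, 12, "C"
)

wide <- toy %>%
  group_by(CPU_ID) %>%
  mutate(monitor_n = row_number()) %>%   # 1, 2, ... within each computer
  ungroup() %>%
  pivot_wider(
    id_cols = CPU_ID,
    names_from = monitor_n,
    values_from = !c(CPU_ID, monitor_n), # every column except the key and the counter
    names_glue = "monitor{monitor_n}_{.value}",
    names_vary = "slowest"               # assumption: tidyr >= 1.2.0
  )

names(wide)
```

Excluding `monitor_n` from `values_from` means no helper columns need to be dropped afterwards, and building the names with `names_glue` replaces the regular-expression rename. On older tidyr versions without `names_vary`, reordering after the pivot with `select(CPU_ID, starts_with("monitor1_"), starts_with("monitor2_"))` is one fallback.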
【Discussion】: