You can use dplyr and tidyr:
library(dplyr)
library(tidyr)
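df %>%
  pivot_longer(starts_with("n_state")) %>%   # n_state_* columns -> name/value pairs
  drop_na() %>%                              # discard rows with missing values
  group_by(county, state) %>%
  mutate(name = row_number()) %>%            # renumber the remaining states per county
  pivot_wider(names_prefix = "n_state_")     # name/value pairs -> n_state_1, n_state_2, ...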
This returns
  county         state n_state_1 n_state_2 n_state_3
  <chr>          <chr> <chr>     <chr>     <chr>
1 Autauga_County AL    FL        NA        NA
2 Baldwin_County AL    GA        TN        NA
3 Catron_County  AL    FL        GA        CA
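What is happening here?
- pivot_longer takes the n_state_{n} columns and collapses them into two columns: the name column holds the original column names (n_state_1, n_state_2, etc.), and the value column holds the states (for example GA, or in many cases <NA>).
- Next, drop_na() removes every row that contains an <NA>; here those are the rows with no state in the value column. (Note: I write <NA> to indicate that it is an NA value.)
- After grouping by county and state, we add a row number. These numbers are used later to build the new column names.
- pivot_wider now takes these row numbers and prefixes them with n_state_ to get the new columns. The values come from the value column that pivot_longer created in the second line of the code. pivot_wider fills missing combinations with <NA> (the default behavior).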
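To make the four steps easier to follow, here is a minimal sketch that runs them one at a time and keeps each intermediate result. The toy data frame and the names toy, long, no_na, numbered and wide are made up for illustration and are not part of the original data. The explicit names_from and values_from arguments only spell out the defaults of pivot_wider, and the ungroup() call is my addition so that the final reshape works on an ungrouped table.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same shape as the example (illustration only)
toy <- data.frame(
  county    = c("Autauga_County", "Baldwin_County"),
  state     = c("AL", "AL"),
  n_state_1 = c(NA, "GA"),
  n_state_2 = c("FL", NA),
  n_state_3 = c(NA, "TN"),
  stringsAsFactors = FALSE
)

# Step 1: collapse the n_state_* columns into a `name` and a `value` column
long <- toy %>%
  pivot_longer(starts_with("n_state"))

# Step 2: drop every row that contains a missing value
no_na <- long %>%
  drop_na()

# Step 3: replace the old column names with a running number per county/state
numbered <- no_na %>%
  group_by(county, state) %>%
  mutate(name = row_number()) %>%
  ungroup()   # assumption: not in the original pipeline, added for a plain tibble

# Step 4: spread the numbered rows back out into n_state_1, n_state_2, ...
wide <- numbered %>%
  pivot_wider(names_from = name, values_from = value, names_prefix = "n_state_")

print(long)
print(no_na)
print(wide)
```

Printing long, no_na and wide side by side shows where the gaps disappear: the long table still carries one row per original cell, while no_na keeps only the cells that actually held a state.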
Data
df <- structure(list(county = c("Autauga_County", "Baldwin_County",
"Catron_County"), state = c("AL", "AL", "AL"), n_state_1 = c(NA,
"GA", "FL"), n_state_2 = c("FL", NA, "GA"), n_state_3 = c(NA,
"TN", NA), n_state_4 = c(NA, NA, "CA")), problems = structure(list(
row = 3L, col = "n_state_4", expected = "", actual = "embedded null",
file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame")), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -3L), spec = structure(list(
cols = list(county = structure(list(), class = c("collector_character",
"collector")), state = structure(list(), class = c("collector_character",
"collector")), n_state_1 = structure(list(), class = c("collector_character",
"collector")), n_state_2 = structure(list(), class = c("collector_character",
"collector")), n_state_3 = structure(list(), class = c("collector_character",
"collector")), n_state_4 = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))