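We can gather the data into 'long' format, split the 'value' column at the delimiter with separate_rows, get the frequencies with count, and then spread the result into 'wide' format.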
library(tidyverse)
gather(test) %>%
  separate_rows(value) %>%
  count(key, value) %>%
  spread(value, n, fill = 0) %>%
  column_to_rownames('key')
#   AAAA ABCD BBBB BBBC
#AK    0    2    0    0
#AZ    2    2    2    0
#NJ    1    0    0    1
Note: if we need the output in 'long' format, the spread step is not needed.
gather(test) %>%
  separate_rows(value) %>%
  count(key, value)
# A tibble: 6 x 3
#  key   value     n
#  <chr> <chr> <int>
#1 AK    ABCD      2
#2 AZ    AAAA      2
#3 AZ    ABCD      2
#4 AZ    BBBB      2
#5 NJ    AAAA      1
#6 NJ    BBBC      1
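In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same pipeline written with those functions. The `test` data frame here is an assumption, reconstructed from the output shown above, since the original input is not given; the order of the pieces inside each cell is a guess.

```r
library(dplyr)
library(tidyr)

# Hypothetical input, reconstructed from the output above (not the original data)
test <- data.frame(
  AK = c("ABCD", "ABCD"),
  AZ = c("AAAA, ABCD, BBBB", "AAAA, ABCD, BBBB"),
  NJ = c("BBBC", "AAAA")
)

# 'long' result: one row per (column name, piece) pair with its frequency
long <- test %>%
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%
  separate_rows(value, sep = ", ") %>%
  count(key, value)

# 'wide' result: one column per distinct piece, missing combinations filled with 0
wide <- long %>%
  pivot_wider(names_from = value, values_from = n, values_fill = list(n = 0L))
```

Unlike spread(), pivot_wider() does not sort the new columns alphabetically, so the column order may differ from the output above.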
Update
If we also need to group by row, create a row_number() column, gather into 'long' format, unite the 'key' and 'rn' columns, and count on the united column.
test %>%
  mutate(rn = row_number()) %>%
  gather(key, val, -rn) %>%
  separate_rows(val) %>%
  unite(key, key, rn) %>%
  count(key, val) %>%
  spread(val, n, fill = 0) %>%
  column_to_rownames('key')
#     AAAA ABCD BBBB BBBC
#AK_1    0    1    0    0
#AK_2    0    1    0    0
#AZ_1    1    1    1    0
#AZ_2    1    1    1    0
#NJ_1    0    0    0    1
#NJ_2    1    0    0    0
Or using base R
table(stack(lapply(test, function(x) unlist(strsplit(as.character(x), ", "))))[2:1])
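The one-liner above is dense, so here is the same computation unpacked into steps. This is only a sketch: the `test` data frame is a hypothetical input reconstructed from the output of the tidyverse version, and the intermediate names `pieces`, `long`, and `res` are introduced here for illustration.

```r
# Hypothetical input, reconstructed from the output above (not the original data)
test <- data.frame(
  AK = c("ABCD", "ABCD"),
  AZ = c("AAAA, ABCD, BBBB", "AAAA, ABCD, BBBB"),
  NJ = c("BBBC", "AAAA")
)

# 1. Split every cell on ", " and flatten to one character vector per column
pieces <- lapply(test, function(x) unlist(strsplit(as.character(x), ", ")))

# 2. Stack the named list into a two-column data frame: 'values' and 'ind'
long <- stack(pieces)

# 3. Reorder the columns so 'ind' (the source column name) indexes the rows,
#    then cross-tabulate to get the frequency of each piece per column
res <- table(long[2:1])
res
```

The as.character() call keeps the split working even if the columns were read in as factors.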