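The result in your image cannot be computed from the input data shown, but I assume that is down to a copy-paste error in Excel. What you most likely want is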
split(data, f = list(data$cat1, data$cat2), drop = TRUE)
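As a quick illustration of what `drop = TRUE` does here, the following is a minimal sketch on a made-up data frame. The column names `cat1` and `cat2` follow the call above; the values and the `value` column are invented for the example and are not the asker's data.

```r
# Toy data: cat1/cat2 mirror the call above, the values are made up.
data <- data.frame(
  cat1  = c("a", "a", "b", "b"),
  cat2  = c("x", "y", "x", "x"),
  value = 1:4
)

# Split on every combination of cat1 and cat2.
# drop = TRUE omits combinations that never occur (here: cat1 "b" with cat2 "y").
spl <- split(data, f = list(data$cat1, data$cat2), drop = TRUE)

# Without drop = TRUE the empty combination is kept as a zero-row data frame.
spl_all <- split(data, f = list(data$cat1, data$cat2))

length(spl)      # number of non-empty groups
length(spl_all)  # number of possible combinations
```

Each element of `spl` is an ordinary data frame, named after the factor combination it holds, so it can be passed straight to `lapply()` or written to disk.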
You can also use dplyr::group_indices() as the splitting variable for a (slight) speed gain, at the cost of losing the nice names of the list elements:
data('diamonds', package = 'ggplot2')
# base
spl_1 <- split(diamonds,
               f = list(diamonds$cut, diamonds$color, diamonds$clarity),
               sep = '-', drop = TRUE)
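# dplyr (since dplyr 1.0.0, passing columns via ... to group_indices() is
# deprecated; group_by() the data first instead)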
spl_2 <- split(diamonds, dplyr::group_indices(diamonds, cut, color, clarity))
microbenchmark::microbenchmark(
  "base" = split(diamonds,
                 f = list(diamonds$cut, diamonds$color, diamonds$clarity),
                 sep = '-', drop = TRUE),
  "dplyr" = split(diamonds, dplyr::group_indices(diamonds, cut, color, clarity))
)
Unit: milliseconds
  expr     min       lq     mean   median      uq      max neval
  base 20.0393 21.03635 31.81306 23.96895 25.2412 718.0278   100
 dplyr 14.5076 15.07760 16.54695 15.73990 16.9229  24.3292   100
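However, if you are writing the split data frames to multiple CSVs, having nice list element names makes it easier to build appropriate file names, e.g.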
# don't run this unless you want ~300 CSVs in your working dir!
mapply(function(dat, nm) {
  write.csv(dat, file.path(getwd(), paste0(nm, '.csv')))
},
dat = spl_1, nm = names(spl_1))
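To try the file-naming pattern without filling the working directory, here is a small sketch that writes to a temporary folder instead. The list `pieces`, its contents, and the folder name `split_csv` are made up for illustration; only the naming idea (list element name plus `.csv`) comes from the code above.

```r
# Toy stand-in for the split list; names mimic the 'cut-color-clarity' pattern.
pieces <- list(
  "Ideal-E-SI2"   = data.frame(price = c(326, 340)),
  "Premium-E-SI1" = data.frame(price = 327)
)

# Hypothetical output folder inside the session's temp directory.
out_dir <- file.path(tempdir(), "split_csv")
dir.create(out_dir, showWarnings = FALSE)

# Map() pairs each data frame with its name; invisible() hides the list of NULLs
# that write.csv() returns.
invisible(Map(function(dat, nm) {
  write.csv(dat, file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}, pieces, names(pieces)))

list.files(out_dir)
```

Swapping `pieces` for `spl_1` would write one file per group, which is why the same warning about the number of files applies.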
If you split by group index with dplyr, you have to add the names to the output list manually, e.g.
names(spl_2) <- sapply(spl_2, function(x)
  paste0(x$cut[1], '-', x$color[1], '-', x$clarity[1]))
This will probably cancel out any speed gain before the files are even written.