One approach is to use reshape in base R:
reshape(survey, direction="long", idvar="id",
        varying=list(c("V1","V4","V7"), c("V2","V5","V8"), c("V3","V6","V9")),
        v.names=c("Visit1", "Visit2", "Visit3"), timevar="visit_no")
       id visit_no Visit1 Visit2 Visit3
240.1 240        1      0      0      0
220.1 220        1      0      0      0
160.1 160        1      1      0      0
240.2 240        2      0      0      0
220.2 220        2      1      0      0
160.2 160        2      0      0      0
240.3 240        3      0      0      0
220.3 220        3      0      0      0
160.3 160        3      0      0      0
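To make the call above reproducible, here is a minimal sketch that rebuilds survey from the three rows printed in this answer. The data frame is a reconstruction and therefore an assumption (the real survey may hold more rows or columns), and the name long is only a placeholder.

```r
# Hypothetical reconstruction of `survey` from the rows printed in this answer:
# id 160 has V1 = 1, id 220 has V4 = 1, everything else is 0.
survey <- data.frame(
  id = c(240, 220, 160),
  V1 = c(0, 0, 1), V2 = c(0, 0, 0), V3 = c(0, 0, 0),
  V4 = c(0, 1, 0), V5 = c(0, 0, 0), V6 = c(0, 0, 0),
  V7 = c(0, 0, 0), V8 = c(0, 0, 0), V9 = c(0, 0, 0)
)

# Each element of `varying` lists the wide columns that feed one long column,
# in time order: Visit1 is built from V1 (time 1), V4 (time 2), V7 (time 3).
long <- reshape(survey, direction = "long", idvar = "id",
                varying = list(c("V1", "V4", "V7"),
                               c("V2", "V5", "V8"),
                               c("V3", "V6", "V9")),
                v.names = c("Visit1", "Visit2", "Visit3"),
                timevar = "visit_no")

print(long)
```

The row names (240.1, 220.1, ...) are generated by reshape as id.time, so they carry no information beyond the id and visit_no columns.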
If you want the result sorted by id, append arrange from dplyr:
%>% dplyr::arrange(id)
   id visit_no Visit1 Visit2 Visit3
1 160        1      1      0      0
2 160        2      0      0      0
3 160        3      0      0      0
4 220        1      0      0      0
5 220        2      1      0      0
6 220        3      0      0      0
7 240        1      0      0      0
8 240        2      0      0      0
9 240        3      0      0      0
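If your original variable names followed a consistent format, the reshape command would be simpler, because it could then guess the times from the names. For example,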
names(survey)[2:10] <- paste0(names(survey)[2:10], ".", rep(1:3, 3))
head(survey)
    id V1.1 V2.2 V3.3 V4.1 V5.2 V6.3 V7.1 V8.2 V9.3
v1 240    0    0    0    0    0    0    0    0    0
v2 220    0    0    0    1    0    0    0    0    0
v3 160    1    0    0    0    0    0    0    0    0
reshape(survey, direction="long", idvar="id",
        varying=2:10, # Can just give the indices now.
        v.names=c("Visit1", "Visit2", "Visit3"), timevar="visit_no") %>%
  arrange(id)
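Although the time suffixes are now consistent, the original variable names are not, so R cannot guess the long-format names (Visit1, Visit2, Visit3), and they have to be supplied in the v.names argument. When v.names is given and varying is a plain vector, reshape matches the columns to those names by position, so the columns must be ordered Visit1, Visit2, Visit3 within each time point.
If the variable names were consistently formatted as well, the reshape would be simpler still.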
names(survey)[2:10] <- paste0("Visit", rep(1:3, each=3), ".", rep(1:3, 3))
head(survey)
    id Visit1.1 Visit1.2 Visit1.3 Visit2.1 Visit2.2 Visit2.3 Visit3.1 Visit3.2 Visit3.3
v1 240        0        0        0        0        0        0        0        0        0
v2 220        0        0        0        1        0        0        0        0        0
v3 160        1        0        0        0        0        0        0        0        0
reshape(survey, direction="long", varying=2:10, timevar="visit_no") %>%
  arrange(id)
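As a self-contained check of the name guessing, the sketch below builds a small wide table whose columns already follow the name.time pattern and lets reshape split the names itself. The data frame wide is a hypothetical stand-in for the renamed survey. The second call assumes underscore-separated names purely to illustrate the sep argument of reshape, which sets the separator used for the split (the default is ".").

```r
# Hypothetical stand-in for the renamed survey: columns are <name>.<time>.
wide <- data.frame(
  id = c(240, 220, 160),
  Visit1.1 = c(0, 0, 1), Visit1.2 = 0, Visit1.3 = 0,
  Visit2.1 = c(0, 1, 0), Visit2.2 = 0, Visit2.3 = 0,
  Visit3.1 = 0, Visit3.2 = 0, Visit3.3 = 0
)

# No v.names or times needed: each name is split at "." into
# the long-format column name and the time value.
long <- reshape(wide, direction = "long", varying = 2:10, timevar = "visit_no")
print(names(long))

# Same data with "_" as the separator (an assumed naming scheme);
# `sep` tells reshape where to split.
wide_us <- wide
names(wide_us) <- sub(".", "_", names(wide_us), fixed = TRUE)
long_us <- reshape(wide_us, direction = "long", varying = 2:10,
                   timevar = "visit_no", sep = "_")
```

Since idvar is not given here, reshape falls back to its default, "id", which happens to match the identifier column in this table.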
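A tidyr version would probably involve two reshapes: one to put everything into a very long form, and then another to bring it back to a wider form (I call this the "1 step back, 2 steps forward" method).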
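A minimal sketch of that two-step idea, assuming tidyr 1.0.0 or later (pivot_longer and pivot_wider) and the original V1 to V9 column names. The survey data frame is reconstructed from the rows printed earlier, and the helper columns var, value, k and visit are hypothetical names introduced only for this illustration. The arithmetic on k reproduces the same mapping as the first reshape call (Visit1 from V1, V4, V7 and so on).

```r
library(tidyr)
library(dplyr)

# Hypothetical reconstruction of `survey` from the rows printed in this answer.
survey <- data.frame(
  id = c(240, 220, 160),
  V1 = c(0, 0, 1), V2 = c(0, 0, 0), V3 = c(0, 0, 0),
  V4 = c(0, 1, 0), V5 = c(0, 0, 0), V6 = c(0, 0, 0),
  V7 = c(0, 0, 0), V8 = c(0, 0, 0), V9 = c(0, 0, 0)
)

long <- survey %>%
  # Step back: one row per id and original column (very long form).
  pivot_longer(cols = V1:V9, names_to = "var", values_to = "value") %>%
  mutate(
    k        = as.integer(sub("V", "", var)),    # 1..9
    visit_no = (k - 1) %/% 3 + 1,                # V1-V3 -> 1, V4-V6 -> 2, V7-V9 -> 3
    visit    = paste0("Visit", (k - 1) %% 3 + 1) # V1, V4, V7 -> Visit1, etc.
  ) %>%
  select(id, visit_no, visit, value) %>%
  # Steps forward: spread the three Visit columns back out.
  pivot_wider(names_from = visit, values_from = value) %>%
  arrange(id, visit_no)

print(long)
```

The result is a tibble with one row per id and visit_no, so it has no id.time row names, unlike the reshape output.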