【Posted】:2021-05-18 16:35:58
【Question】:
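The data frame "dat" below is a subset of a larger dataset that combines results from multiple studies. Studies are indexed by "srdr_id", and each study may report more than one "outcome". For each outcome, a study has two treatment arms (arm_number 1 and 2, with the arm names in "arm"), and the outcome is assessed at two timepoints, e.g. timepoint 0 and timepoint 3.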
dat <- structure(list(srdr_id = c("172600", "172600", "172600", "172600",
"172600", "172600", "172600", "172600"), arm_number = c(1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L), arm = c("Fluoxetine_CBT_MI", "Placebo_CBT_MI",
"Fluoxetine_CBT_MI", "Placebo_CBT_MI", "Fluoxetine_CBT_MI", "Placebo_CBT_MI",
"Fluoxetine_CBT_MI", "Placebo_CBT_MI"), outcome = c("alcohol use days",
"alcohol use days", "alcohol use days", "alcohol use days", "BDI",
"BDI", "BDI", "BDI"), timepoint = c("0", "0", "3", "3", "0",
"0", "3", "3"), timepoint_units = c("months", "months", "months",
"months", "months", "months", "months", "months"), n = c(24,
26, 24, 26, 24, 26, 24, 26), mean = c(3.01, 3.19, 1.88, 1.87,
17.25, 22.12, 6.79, 10.46), sd = c(1.75, 1.35, 1.52, 1.43, 8.87,
7.5, 7.49, 10.8)), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
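My goal is to create a "wide" dataset like the one shown below, where "pre" holds the mean, sd, and n values at timepoint = 0 and "post" holds the corresponding values at timepoint = 3.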
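Because the target table itself is not reproduced here, the following is only a sketch under an assumption: one output row per study, outcome, and arm, with the pre/post columns side by side. It uses base R's `stats::reshape()` after first relabelling timepoint "0" as "pre" and "3" as "post". The data are the same values as the `dput()` output above, rebuilt compactly so the block runs on its own; `dat_wide_base` is a hypothetical name.

```r
# Same values as the dput() above, rebuilt with recycling for brevity
dat <- data.frame(
  srdr_id = "172600",
  arm_number = rep(1:2, 4),
  arm = rep(c("Fluoxetine_CBT_MI", "Placebo_CBT_MI"), 4),
  outcome = rep(c("alcohol use days", "BDI"), each = 4),
  timepoint = rep(c("0", "0", "3", "3"), 2),
  timepoint_units = "months",
  n = rep(c(24, 26), 4),
  mean = c(3.01, 3.19, 1.88, 1.87, 17.25, 22.12, 6.79, 10.46),
  sd = c(1.75, 1.35, 1.52, 1.43, 8.87, 7.5, 7.49, 10.8)
)

# reshape() is written for plain data frames, so drop any tibble class first
dat_base <- as.data.frame(dat)

# Assumption: only timepoints 0 (pre) and 3 (post) occur
dat_base$timepoint <- ifelse(dat_base$timepoint == "0", "pre", "post")

# One row per study/arm/outcome; n, mean, sd are spread by timepoint,
# giving columns such as n_pre, mean_pre, sd_pre, n_post, ...
dat_wide_base <- reshape(
  dat_base,
  direction = "wide",
  idvar = c("srdr_id", "arm_number", "arm", "outcome", "timepoint_units"),
  timevar = "timepoint",
  v.names = c("n", "mean", "sd"),
  sep = "_"
)
rownames(dat_wide_base) <- NULL
dat_wide_base
```

Listing every identifying column in `idvar` keeps studies and outcomes separate, so no per-group splitting is needed.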
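I considered using group_split(srdr_ooutcome) to create a list column and then applying pivot_wider() to each element of that list column. I would appreciate suggestions for either a tidyverse or a base R approach.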
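A possible tidyverse sketch, assuming the same target layout (one row per study, outcome, and arm): a single `tidyr::pivot_wider()` call over the whole data frame may be enough, without `group_split()`, because the remaining identifying columns (`srdr_id`, `arm_number`, `arm`, `outcome`, `timepoint_units`) act as the id columns. The helper column `phase` and the result name `dat_wide` are hypothetical, and the pre/post mapping assumes only timepoints 0 and 3 occur.

```r
library(dplyr)
library(tidyr)

# Same values as the dput() above, rebuilt with recycling for brevity
dat <- tibble(
  srdr_id = "172600",
  arm_number = rep(1:2, 4),
  arm = rep(c("Fluoxetine_CBT_MI", "Placebo_CBT_MI"), 4),
  outcome = rep(c("alcohol use days", "BDI"), each = 4),
  timepoint = rep(c("0", "0", "3", "3"), 2),
  timepoint_units = "months",
  n = rep(c(24, 26), 4),
  mean = c(3.01, 3.19, 1.88, 1.87, 17.25, 22.12, 6.79, 10.46),
  sd = c(1.75, 1.35, 1.52, 1.43, 8.87, 7.5, 7.49, 10.8)
)

dat_wide <- dat %>%
  # Assumption: timepoint "0" is pre and "3" is post; anything else becomes NA
  mutate(phase = case_when(timepoint == "0" ~ "pre",
                           timepoint == "3" ~ "post")) %>%
  select(-timepoint) %>%
  # All other columns become id columns, so each study/outcome/arm is one row
  pivot_wider(
    names_from = phase,
    values_from = c(n, mean, sd),
    names_glue = "{phase}_{.value}"  # e.g. pre_mean, post_sd
  )
dat_wide
```

If the desired table instead has both arms on a single row per outcome, adding `arm_number` (and dropping `arm`) from the id side and to `names_from` would be the natural variation.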
【Discussion】: