Partially collapse a long dataset into a wider dataset: answers

【Question title】: Partially collapse a long dataset into a wider dataset
【Posted】: 2021-05-18 16:35:58
【Question description】:

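The data frame "dat" below is a subset of a larger dataset made up of results from multiple studies. Studies are indexed by "srdr_id". Each study may have more than one "outcome". For each "outcome", a study has two treatment arms (arm_number 1 and 2, with the arm names in "arm"), and the outcome is assessed at two timepoints, e.g. timepoint 0 or 3.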

 dat <- structure(list(srdr_id = c("172600", "172600", "172600", "172600", 
    "172600", "172600", "172600", "172600"), arm_number = c(1L, 2L, 
    1L, 2L, 1L, 2L, 1L, 2L), arm = c("Fluoxetine_CBT_MI", "Placebo_CBT_MI", 
    "Fluoxetine_CBT_MI", "Placebo_CBT_MI", "Fluoxetine_CBT_MI", "Placebo_CBT_MI", 
    "Fluoxetine_CBT_MI", "Placebo_CBT_MI"), outcome = c("alcohol use days", 
    "alcohol use days", "alcohol use days", "alcohol use days", "BDI", 
    "BDI", "BDI", "BDI"), timepoint = c("0", "0", "3", "3", "0", 
    "0", "3", "3"), timepoint_units = c("months", "months", "months", 
    "months", "months", "months", "months", "months"), n = c(24, 
    26, 24, 26, 24, 26, 24, 26), mean = c(3.01, 3.19, 1.88, 1.87, 
    17.25, 22.12, 6.79, 10.46), sd = c(1.75, 1.35, 1.52, 1.43, 8.87, 
    7.5, 7.49, 10.8)), row.names = c(NA, -8L), class = c("tbl_df", 
    "tbl", "data.frame"))

My goal is to create a "wide" dataset in which "pre" holds the mean, sd, and n values for timepoint = 0, and "post" holds the corresponding values for timepoint = 3.

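I considered using group_split(srdr_id, outcome) to create a list-column and then calling pivot_wider() on each element of that list-column. Suggestions for either a tidyverse or a base R approach are appreciated.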

【Comments】:

Tags: r tidyr

【Solution 1】:

We can group the data and then use pivot_wider():

library(tidyverse)

dat %>%
  # group by study; timepoint is consumed by names_from, so it is not a grouping variable
  group_by(srdr_id) %>%
  pivot_wider(
    names_from = timepoint,
    values_from = c(mean, sd, n)
  ) %>%
  # anchored patterns, so only the trailing timepoint suffix is renamed
  rename_with(~ sub("_0$", "_pre", .x)) %>%
  rename_with(~ sub("_3$", "_post", .x)) %>%
  select(srdr_id, outcome, arm_number, arm, mean_pre,
         mean_post, sd_pre, sd_post, n_pre, n_post)

Output:

  srdr_id outcome          arm_number arm               mean_pre mean_post sd_pre sd_post n_pre n_post
  <chr>   <chr>                 <int> <chr>                <dbl>     <dbl>  <dbl>   <dbl> <dbl>  <dbl>
1 172600  alcohol use days          1 Fluoxetine_CBT_MI     3.01      1.88   1.75    1.52    24     24
2 172600  alcohol use days          2 Placebo_CBT_MI        3.19      1.87   1.35    1.43    26     26
3 172600  BDI                       1 Fluoxetine_CBT_MI    17.2       6.79   8.87    7.49    24     24
4 172600  BDI                       2 Placebo_CBT_MI       22.1      10.5    7.5    10.8     26     26
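If follow-up times differ between studies, renaming a fixed "_3" suffix will miss some columns. One possible variation, offered only as a sketch, is to label each row as "pre" or "post" before pivoting, so the column names come out right without any renaming. It assumes that baseline is always coded as timepoint "0" and that each study has exactly one follow-up timepoint. The `period` column is a name introduced here for illustration, and `dat` is rebuilt compactly from the values in the question so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Compact rebuild of `dat` from the question (same columns, same values)
dat <- tibble(
  srdr_id = "172600",
  arm_number = rep(1:2, times = 4),
  arm = rep(c("Fluoxetine_CBT_MI", "Placebo_CBT_MI"), times = 4),
  outcome = rep(c("alcohol use days", "BDI"), each = 4),
  timepoint = rep(rep(c("0", "3"), each = 2), times = 2),
  timepoint_units = "months",
  n = rep(c(24, 26), times = 4),
  mean = c(3.01, 3.19, 1.88, 1.87, 17.25, 22.12, 6.79, 10.46),
  sd = c(1.75, 1.35, 1.52, 1.43, 8.87, 7.5, 7.49, 10.8)
)

wide <- dat %>%
  # Assumption: "0" is baseline, anything else is the single follow-up
  mutate(period = if_else(timepoint == "0", "pre", "post")) %>%
  # timepoint must be dropped, otherwise it stays an id column and
  # the pre and post rows would not collapse into one row
  select(-timepoint) %>%
  pivot_wider(names_from = period, values_from = c(mean, sd, n))

wide
```

The trade-off is that the actual follow-up time is no longer kept in the result; if it matters, it could be stored in a separate column before `timepoint` is dropped.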

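【Discussion】:

This answers the question for a single study with two outcomes. How would I run your code on each element of a list-column in which each study (srdr_id) has its own data frame, i.e. the result of group_split(srdr_id, outcome)?
There is no need for group_split(srdr_id). The code above works for multiple studies if it is preceded by group_by(srdr_id).
Perfect. Please see my update: does this work?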
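To make the point from the comments concrete, here is a minimal sketch comparing a single pivot_wider() call over all studies with the group_split() approach from the question. The second study is simply a copy of the first with a made-up id ("study_B"), used only so there is more than one study to pivot; it is not real data. `pivot_pre_post()` is a helper name introduced here, not part of any package.

```r
library(dplyr)
library(tidyr)
library(purrr)

# One study, rebuilt from the values in the question
study <- tibble(
  srdr_id = "172600",
  arm_number = rep(1:2, times = 4),
  arm = rep(c("Fluoxetine_CBT_MI", "Placebo_CBT_MI"), times = 4),
  outcome = rep(c("alcohol use days", "BDI"), each = 4),
  timepoint = rep(rep(c("0", "3"), each = 2), times = 2),
  timepoint_units = "months",
  n = rep(c(24, 26), times = 4),
  mean = c(3.01, 3.19, 1.88, 1.87, 17.25, 22.12, 6.79, 10.46),
  sd = c(1.75, 1.35, 1.52, 1.43, 8.87, 7.5, 7.49, 10.8)
)

# Hypothetical second study: same rows under a placeholder id
two_studies <- bind_rows(study, mutate(study, srdr_id = "study_B"))

# Hypothetical helper: widen by timepoint, then rename the suffixes
pivot_pre_post <- function(df) {
  df %>%
    pivot_wider(names_from = timepoint, values_from = c(mean, sd, n)) %>%
    rename_with(~ sub("_0$", "_pre", .x)) %>%
    rename_with(~ sub("_3$", "_post", .x))
}

# Approach 1: one call over all studies; srdr_id, outcome and arm act as id columns
all_at_once <- pivot_pre_post(two_studies)

# Approach 2: split per study and outcome, pivot each piece, then recombine
per_group <- two_studies %>%
  group_split(srdr_id, outcome) %>%
  map(pivot_pre_post) %>%
  bind_rows()

# Same rows either way, up to row order
all.equal(
  as.data.frame(arrange(all_at_once, srdr_id, outcome, arm_number)),
  as.data.frame(arrange(per_group, srdr_id, outcome, arm_number))
)
```

Because pivot_wider() treats every column not named in names_from or values_from as an identifier, the single call already keeps studies and outcomes apart, so the split step adds nothing in this case.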