Group by and then expand to add new columns [duplicate]: answers

【Question title】: group by and then expand to add new columns [duplicate]
【Posted】: 2021-10-11 11:19:10
【Question】:

My data looks like this:

patientid <- c(100,101,101,101,102,102)
weight <- c(1,1,2,3,1,2)
height <- c(0,6,0,0,0,1)
bmi <- c(0,5,0,0,0,1)

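I want to group by patient ID so that the data frame has only one row per patient.
The remaining rows for that patient should then become additional columns, named by appending a number. The data frame would therefore have the columns patientid, weight1, height1, bmi1, weight2, height2, bmi2, and so on. The number of columns depends on how many times a patient ID is repeated.

I assume group_by and spread are the key functions, but I can't figure it out. In this example, the row for patient ID 100 would have values only in the weight1, height1 and bmi1 columns; patient 101 would have values in weight1, height1, bmi1, weight2, height2, bmi2, weight3, height3 and bmi3; and patient 102 would have values in weight1, height1, bmi1, weight2, height2 and bmi2.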

【Question comments】:

Tags: r dataframe reshape data-cleaning

【Solution 1】:

Perhaps we can use pivot_wider after creating a sequence column grouped by 'patientid'.

library(tidyr)
library(data.table)  # provides rowid()
library(dplyr)
df1 %>%
    # number the rows within each patientid: 1, 2, 3, ...
    mutate(rn = rowid(patientid)) %>%
    pivot_wider(names_from = rn, values_from = c(weight, height, bmi),
         names_sep = "")

Output:

# A tibble: 3 x 10
  patientid weight1 weight2 weight3 height1 height2 height3  bmi1  bmi2  bmi3
      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl>
1       100       1      NA      NA       0      NA      NA     0    NA    NA
2       101       1       2       3       6       0       0     5     0     0
3       102       1       2      NA       0       1      NA     0     1    NA

Data:

df1 <- data.frame(patientid, weight, height, bmi)
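A minimal sketch of the same idea using only dplyr and tidyr, since the question mentions group_by. It assumes tidyr 1.2.0 or later for the names_vary argument, which is used here so that the columns come out grouped per repeat (weight1, height1, bmi1, weight2, ...) as the question describes, rather than per variable. The name `wide` is only illustrative.

```r
library(dplyr)
library(tidyr)

# The question's data, rebuilt so the sketch is self-contained
df1 <- data.frame(
  patientid = c(100, 101, 101, 101, 102, 102),
  weight    = c(1, 1, 2, 3, 1, 2),
  height    = c(0, 6, 0, 0, 0, 1),
  bmi       = c(0, 5, 0, 0, 0, 1)
)

wide <- df1 %>%
  group_by(patientid) %>%
  mutate(rn = row_number()) %>%   # 1, 2, 3, ... within each patient
  ungroup() %>%
  pivot_wider(
    names_from  = rn,
    values_from = c(weight, height, bmi),
    names_sep   = "",
    names_vary  = "slowest"       # assumption: tidyr >= 1.2.0
  )

names(wide)
```

Dropping names_vary should give the same values with the column order shown in the output above.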

【Discussion】:

【Solution 2】:

A base R option using ave + reshape

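# df is assumed to be the question's data, built from its four vectors
df <- data.frame(patientid, weight, height, bmi)
reshape(
  transform(
    df,
    # q counts the rows within each patientid: 1, 2, 3, ...
    q = ave(patientid, patientid, FUN = seq_along)
  ),
  direction = "wide",
  idvar = "patientid",
  timevar = "q"
)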

which gives

  patientid weight.1 height.1 bmi.1 weight.2 height.2 bmi.2 weight.3 height.3
1       100        1        0     0       NA       NA    NA       NA       NA
2       101        1        6     5        2        0     0        3        0
5       102        1        0     0        2        1     1       NA       NA
  bmi.3
1    NA
2     0
5    NA
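The output above uses names such as weight.1, while the question asks for weight1. A possible variation, sketched here under the assumption that the data frame is the question's data, passes sep = "" to reshape and resets the row names. The name `wide` is only illustrative.

```r
# The question's data, rebuilt so the sketch is self-contained
df <- data.frame(
  patientid = c(100, 101, 101, 101, 102, 102),
  weight    = c(1, 1, 2, 3, 1, 2),
  height    = c(0, 6, 0, 0, 0, 1),
  bmi       = c(0, 5, 0, 0, 0, 1)
)

# Row counter within each patient, as in the answer above
df$q <- ave(df$patientid, df$patientid, FUN = seq_along)

wide <- reshape(
  df,
  direction = "wide",
  idvar     = "patientid",
  timevar   = "q",
  sep       = ""      # join variable name and counter without a dot
)
rownames(wide) <- NULL  # replace the leftover row names 1, 2, 5 with 1, 2, 3
wide
```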

【Discussion】:

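【Solution 3】:

I believe group_by and spread are part of the tidyverse (dplyr and tidyr, respectively).

I reshaped your data with base reshape, using weight as the measurement id. This works here only because weight happens to run 1, 2, 3 within each patient in the example data.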


patientid <- c(100,101,101,101,102,102)
weight <- c(1,1,2,3,1,2)
height <- c(0,6,0,0,0,1)
bmi <- c(0,5,0,0,0,1)

cat("data\n")
df <- data.frame(patientid = patientid,
                 n = weight,
                 weight = weight,
                 height = height,
                 bmi = bmi)
df

cat("reshaped to wide format\n")
reshape(data = df,
        idvar = "patientid",
        timevar = "n",
        # c("weight", "height", "bmi"),
        direction = "wide")

#?reshape()
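Using weight as the time variable relies on weight being unique within each patient. If a value repeated, reshape would keep only the first matching row (with a warning). A small sketch of how one might check that assumption and build a counter that does not depend on the measured values; the names `weight_is_unique` and `same_as_weight` are hypothetical.

```r
# The question's data, rebuilt so the sketch is self-contained
df <- data.frame(
  patientid = c(100, 101, 101, 101, 102, 102),
  weight    = c(1, 1, 2, 3, 1, 2),
  height    = c(0, 6, 0, 0, 0, 1),
  bmi       = c(0, 5, 0, 0, 0, 1)
)

# Is every (patientid, weight) pair unique? Needed if weight is the timevar.
weight_is_unique <- !any(duplicated(df[c("patientid", "weight")]))

# A per-patient row counter that ignores the measured values
df$n <- ave(seq_along(df$patientid), df$patientid, FUN = seq_along)

# For this example the counter coincides with weight
same_as_weight <- all(df$n == df$weight)
```

With real data, passing the counter `n` as timevar is likely the safer choice.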

【Discussion】:

Hernando, please credit Lisa from Sololearn. You copied the question there and reposted the answer here.