Restructure large data frame with multiple data types [duplicate]: answers

【Question title】: restructure large data frame with multiple data types [duplicate]
【Posted】: 2023-09-23 08:21:01
【Question】:

I am struggling to get my data (an xlsx file) into the right shape. My original data set looks like this:

   patient when    age weight height watchID dateFrom           
   <chr>   <chr> <dbl> <dbl>   <dbl>   <dbl> <dttm>             
 1 T01     pre      82 83        174    2788 2017-07-24
 2 T02     pre      81 80        166    7309 2017-07-22 
 3 T02     post     67 91        163    7309 2017-10-26 
 4 T03     pre      68 91        172    5066 2017-07-26 
 5 T03     post     68 91        172    7220 2017-10-24

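I want a wide data set with a single row per patient ID, spread on the "when" column. When I tried to reshape it with the "dcast" function, this is what I ended up with: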

   patient age_post age_pre weight_post weight_pre height_post height_pre
   <chr>      <int>   <int>       <int>      <int>       <int>      <int>
 1 T01            0       1           0          1           0          1
 2 T02            1       1           1          1           1          1
 3 T03            1       1           1          1           1          1
 4 T04            0       1           0          1           0          1
 5 T05            1       0           1          0           1          0

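It somehow changes all the variables to 1s and 0s. How can I get a similar data set that keeps the different variable types, with "pre" and "post" appended to the original column names?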

Here is my code ("HW" is the original data set shown above):

mdata <- melt(HW, id=c("patient","when"))
mdata$value <- as.numeric(as.character(mdata$value)) #I added this line to convert the column to numeric but it doesn't help
mdata2 <- dcast(mdata, patient~variable+when)

I also tried:

mdata <- melt(HW, id=c("patient","when"))
mdata3 <- reshape(mdata, idvar='patient', timevar='when', direction='wide')

But then I get:

   patient variable.pre value.pre variable.post value.post
   <chr>   <fct>        <chr>     <fct>         <chr>     
 1 T01     age          82        NA            NA        
 2 T02     age          81        age           67        
 3 T03     age          68        age           68        
 4 T04     age          81        NA            NA        
 5 T05     NA           NA        age           87

None of the other variables are there.

Thanks in advance.

【Question comments】:

Tags: r reshape reshape2 melt dcast

【Solution 1】:

Is this what you want?

library(tidyr)
df <- tibble(patient = c("T01","T02","T02","T03","T03"),
             when = c("pre","pre","post","pre","post"),
             age = c(82,81,67,68,68),
             weight = c(83,80,91,91,91),
             height = c(174,166,163,172,172),
             watchid = c(2788,7309,7309,5066,7220),
             datefrom = c("2017-07-24","2017-07-22","2017-10-26",
                          "2017-07-26","2017-10-24"))

df %>%
  pivot_wider(names_from = when,
              values_from = c(age,weight,height,watchid,datefrom))

# A tibble: 3 x 11
# (2 more columns not shown: datefrom_pre <chr>, datefrom_post <chr>)
  patient age_pre age_post weight_pre weight_post height_pre height_post watchid_pre watchid_post
  <chr>     <dbl>    <dbl>      <dbl>       <dbl>      <dbl>       <dbl>       <dbl>        <dbl>
1 T01          82       NA         83          NA        174          NA        2788           NA
2 T02          81       67         80          91        166         163        7309         7309
3 T03          68       68         91          91        172         172        5066         7220
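
For comparison with the `reshape()` attempt in the question, here is a base R sketch. It assumes each patient/when pair occurs at most once. Calling `reshape()` on the unmelted data, rather than on the output of `melt()`, means values of different types are never stacked into a single `value` column, so each wide column should keep its own type. The toy `HW` below copies a few columns from the post; the `as.Date()` conversion and `sep = "_"` are choices made for this illustration.

```r
# Toy version of HW with values copied from the question (subset of columns)
HW <- data.frame(
  patient  = c("T01", "T02", "T02", "T03", "T03"),
  when     = c("pre", "pre", "post", "pre", "post"),
  age      = c(82, 81, 67, 68, 68),
  weight   = c(83, 80, 91, 91, 91),
  dateFrom = as.Date(c("2017-07-24", "2017-07-22", "2017-10-26",
                       "2017-07-26", "2017-10-24")),
  stringsAsFactors = FALSE
)

# reshape() was written for plain data frames, so a tibble read from xlsx
# is converted first; every column other than idvar/timevar gets spread
wide <- reshape(as.data.frame(HW),
                idvar     = "patient",
                timevar   = "when",
                direction = "wide",
                sep       = "_")

# One row per patient; columns such as age_pre, dateFrom_post keep their types
str(wide)
```

Patients without a "post" row (T01 here) get `NA` in the `_post` columns.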

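【Comments】:

This looks like what I need. Your code works when I run it exactly as written, but when I try it on my data set with 75 columns, it just drops the "when" column and does nothing else.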
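That is because you need to specify every column whose values are tied to the variable "when", so that they get spread. You do not have to type out all the column names, though. Something like this should do the trick:

```
# assumed definition of cols: every column except the id and `when`
cols <- setdiff(names(df), c("patient", "when"))
df %>%
  pivot_wider(names_from = when, values_from = all_of(cols))
```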
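
One possible explanation for the 0s and 1s in the question is that some patient/when combination occurs more than once in the full data. This is an assumption; the post does not confirm it. In that case reshape2's `dcast` falls back to counting rows with `length`, and `pivot_wider` warns and returns list-columns. A minimal base R sketch of a check to run before pivoting, using a hypothetical toy `HW` with one deliberately duplicated row:

```r
# Hypothetical toy data: the T02/pre row is duplicated on purpose
HW <- data.frame(
  patient = c("T01", "T02", "T02", "T02", "T03"),
  when    = c("pre", "pre", "pre", "post", "pre"),
  age     = c(82, 81, 81, 67, 68),
  stringsAsFactors = FALSE
)

# Count how many rows share each patient/when pair
key_counts <- aggregate(n ~ patient + when,
                        data = transform(HW, n = 1L),
                        FUN  = sum)

# Any pair listed here is not uniquely identified and needs resolving
# (deduplicate, or add another id column) before reshaping to wide
dups <- key_counts[key_counts$n > 1, ]
print(dups)
```

If `dups` comes back empty, duplicate keys can be ruled out as the cause.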