Reshaping before-and-after data: answers

【Question title】: Reshaping before and after data
【Posted】: 2015-08-02 06:04:36
【Question】:

The data below is a small subset of a series of tests taken before and after a treatment. Right now my data looks like this:

  Subject Var1 Var2 Var3 Var4
1   A-pre   25   27   23    0
2  A-post   25   26   25  120
3   B-pre   30   28   27  132
4  B-post   30   28   26  140

I need to reshape it like this:

  Subject Var1.pre Var1.post Var2.pre Var2.post Var3.pre Var3.post Var4.pre Var4.post
1       A       25        25       27        26       23        25        0       120
2       B       30        30       28        28       27        26      132       140

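I have read many questions on SO and the documentation of R packages for data manipulation, such as reshape2, but I couldn't find anything similar. Any ideas? Below is the code to reproduce the first table: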

dat<-structure(list(Subject = structure(c(2L, 1L, 4L, 3L), .Label = c("A-post", 
"A-pre", "B-post", "B-pre"), class = "factor"), Var1 = c(25L, 
25L, 30L, 30L), Var2 = c(27L, 26L, 28L, 28L), Var3 = c(23L, 25L, 
27L, 26L), Var4 = c(0L, 120L, 132L, 140L)), .Names = c("Subject", 
"Var1", "Var2", "Var3", "Var4"), row.names = c(NA, -4L), class = "data.frame")

【Question discussion】:

Tags: r

【Solution 1】:

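You can use dcast from the development version of data.table, i.e. v1.9.5. First use tstrsplit with "-" as the split to break the 'Subject' column into two columns, then use dcast to reshape from 'long' to 'wide' format. The dcast method for data.table accepts multiple value.var columns, here 'Var1' to 'Var4'.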

library(data.table)  # v1.9.5+
# convert the data.frame to a data.table (by reference) with setDT(dat)
# split the 'Subject' column on '-' with tstrsplit, creating two columns
setDT(dat)[, c('Subject', 'New') := tstrsplit(Subject, '-')]
# convert 'New' to a factor with explicit levels so that
# dcast puts each 'pre' column before the matching 'post' column
dat[, New := factor(New, levels = c('pre', 'post'))]
# reshape from long to wide, using every Var column as a value.var
dcast(dat, Subject ~ New, value.var = grep('^Var', names(dat), value = TRUE), sep = ".")
#    Subject Var1.pre Var1.post Var2.pre Var2.post Var3.pre Var3.post Var4.pre
#1:       A       25        25       27        26       23        25        0
#2:       B       30        30       28        28       27        26      132
#   Var4.post
#1:       120
#2:       140

Note: the instructions for installing the devel version are here
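For comparison, base R's stats::reshape() can produce the same wide layout without any extra packages. This is a swapped-in technique, not part of the original answer: a minimal sketch, assuming the sample data from the question and exactly one 'pre' and one 'post' row per subject. The object names parts, wide and ord are illustrative only. Because reshape() groups the new columns by time (all .pre first, then all .post), the sketch reorders them at the end to match the requested layout.

```r
# Reproduce the question's data (Subject as plain character here)
dat <- data.frame(
  Subject = c("A-pre", "A-post", "B-pre", "B-post"),
  Var1 = c(25L, 25L, 30L, 30L),
  Var2 = c(27L, 26L, 28L, 28L),
  Var3 = c(23L, 25L, 27L, 26L),
  Var4 = c(0L, 120L, 132L, 140L)
)

# Split "A-pre" into "A" and "pre"; as.character() also covers a factor column
parts <- do.call(rbind, strsplit(as.character(dat$Subject), "-", fixed = TRUE))
dat$Subject <- parts[, 1]
dat$Time <- parts[, 2]

# Long -> wide: one row per Subject, Var1..Var4 repeated for each Time value
wide <- reshape(dat, idvar = "Subject", timevar = "Time",
                direction = "wide", sep = ".")

# Reorder the columns to Var1.pre, Var1.post, Var2.pre, ... and reset row names
ord <- c("Subject",
         paste(rep(paste0("Var", 1:4), each = 2), c("pre", "post"), sep = "."))
wide <- wide[, ord]
rownames(wide) <- NULL
wide
```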

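Another option uses dplyr/tidyr: split the 'Subject' column in two with separate, reshape the 'Var' columns (Var1 to Var4) from 'wide' to 'long' with gather, unite the 'Var' and 'New' columns into a single 'VarNew' column, and then spread the 'long' format back to 'wide'.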

library(dplyr)
library(tidyr)
dat %>%
   separate(Subject, into = c('Subject', 'New')) %>%  # split into two columns
   gather(Var, Val, Var1:Var4) %>%  # change from wide to long, similar to melt
   unite(VarNew, Var, New, sep = ".") %>%  # unite two columns into a single one
   spread(VarNew, Val)  # change from 'long' to 'wide'
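In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). As far as I can tell, spread() sorts the new column names, so each .post column lands before its .pre column; pivot_wider() instead keeps names in order of first appearance and can take several value columns at once, so the gather/unite steps are not needed. This is a sketch of that newer API, not the original answer's code, assuming tidyr >= 1.0.0 and the question's sample data.

```r
library(dplyr)
library(tidyr)  # pivot_wider() needs tidyr >= 1.0.0

# Reproduce the question's data
dat <- data.frame(
  Subject = c("A-pre", "A-post", "B-pre", "B-post"),
  Var1 = c(25L, 25L, 30L, 30L),
  Var2 = c(27L, 26L, 28L, 28L),
  Var3 = c(23L, 25L, 27L, 26L),
  Var4 = c(0L, 120L, 132L, 140L)
)

wide <- dat %>%
  # split "A-pre" into Subject = "A" and New = "pre"
  separate(Subject, into = c("Subject", "New"), sep = "-") %>%
  # one column per (Var, New) pair, named e.g. "Var1.pre"
  pivot_wider(names_from = New, values_from = Var1:Var4, names_sep = ".")
wide
```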

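【Discussion】:

I should mention that I (presumably) hit a bug: after a) loading the package from CRAN, b) installing it from GitHub, and c) trying to load the devel package, I got a memory-size error. Restarting R fixed it.
@SteliosKaniballidis I do something similar. I don't know the reason behind it.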