【Question title】: Rearranging dataframes in R [duplicate]
【Posted】: 2013-02-25 01:11:59
【Question】:

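I am trying to rearrange a data frame in R (efficiently).

My data are experimental measurements collected from four different experiments on two participant populations (1 or 0, i.e. a disease group and a control group).

Example data frame: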

Subject type    Experiment 1    Experiment 2    Experiment 3    Experiment 4
           0             4.6             2.5             1.4             5.3
           0             4.7             2.4             1.8             5.1
           1             3.5             1.2             5.6             7.5
           1             3.8             1.7             6.2             8.1

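I want to rearrange my data frame so that it is structured as follows (the reason being that, with the data structured this way in R, it is much easier for me to run functions on it):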

Subject type    Experiment    Measure
           0             1        4.6
           0             2        2.5
           0             3        1.4
           0             4        5.3
           0             1        4.7
           0             2        2.4
           0             3        1.8
           0             4        5.1
           1             1        3.5
           1             2        1.2
           1             3        5.6
           1             4        7.5
           1             1        3.8
           1             2        1.7
           1             3        6.2
           1             4        8.1

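As you can see, each subject now occupies four rows; each row relates to a single measurement rather than a single subject. This is (for now at least) more convenient for me when feeding the data into R functions. Perhaps in time I will work out how to skip this step entirely, but I am new to R, and this seems like the best way to do things.

Anyway, the question is: what is the most efficient way to do this kind of data frame transformation? At the moment I am doing this: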

# Input dframe1
dframe1 <- structure(list(subject_type = c(0L, 0L, 1L, 1L), experiment_1 = c(4.6, 
4.7, 3.5, 3.8), experiment_2 = c(2.5, 2.4, 1.2, 1.7), experiment_3 = c(1.4, 
1.8, 5.6, 6.2), experiment_4 = c(5.3, 5.1, 7.5, 8.1)), .Names = c("subject_type", 
"experiment_1", "experiment_2", "experiment_3", "experiment_4"
), class = "data.frame", row.names = c(NA, -4L))

# Create a matrix
temporary_matrix <- matrix(ncol=3, nrow=nrow(dframe1) * 4)
colnames(temporary_matrix) <- c("subject_type","experiment","measure")

# Rearrange dframe1 so that a different measure is in each column
for(i in 1:nrow(dframe1)) {
  temporary_matrix[i*4-3,"subject_type"] <- dframe1$subject_type[i]
  temporary_matrix[i*4-3,"experiment"] <- 1
  temporary_matrix[i*4-3,"measure"] <- dframe1$experiment_1[i]
  temporary_matrix[i*4-2,"subject_type"] <- dframe1$subject_type[i]
  temporary_matrix[i*4-2,"experiment"] <- 2
  temporary_matrix[i*4-2,"measure"] <- dframe1$experiment_2[i]
  temporary_matrix[i*4-1,"subject_type"] <- dframe1$subject_type[i]
  temporary_matrix[i*4-1,"experiment"] <- 3
  temporary_matrix[i*4-1,"measure"] <- dframe1$experiment_3[i]
  temporary_matrix[i*4-0,"subject_type"] <- dframe1$subject_type[i]
  temporary_matrix[i*4-0,"experiment"] <- 4
  temporary_matrix[i*4-0,"measure"] <- dframe1$experiment_4[i]
}

# Convert matrix to a data frame
dframe2 <- data.frame(temporary_matrix)

# NOTE: For some reason, this has to be converted back into a double (at some point above it becomes a factor)
dframe2$measure <- as.double(as.character(dframe2$measure))

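Surely there is a better way?!

【Question comments】:

  • Have a look at the reshape2 package, or the reshape function in base R.
  • +1! Thanks for this complete question, which contains: (1) what you want to do, (2) what you have tried, and (3) a reproducible example.

Tags: r matrix dataframe reshape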


【Solution 1】:

With the reshape2 package, this is very simple.

library(reshape2)

# assuming your data.frame is called `dat`
melt(dat, id.vars=c("Subject type"))

If you like, you can make it a bit fancier:

newdat <- melt(dat, id.vars=c("Subject type"), variable.name="Experiment", value.name="Measure")

# remove "experiment " from the names, and convert to numeric
newdat$Experiment <- as.numeric(gsub("Experiment\\s*", "", as.character(newdat$Experiment)))
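
The pattern above targets column names of the form `Experiment 1`. The `dframe1` from the question uses names such as `experiment_1`, so the pattern would presumably need adjusting there. A minimal base R sketch of that clean-up step, where the `experiment` factor is a hypothetical stand-in for the column that `melt()` would produce:

```r
# Hypothetical factor standing in for the melted variable column of dframe1
experiment <- factor(c("experiment_1", "experiment_2",
                       "experiment_3", "experiment_4"))

# Strip the "experiment_" prefix, then convert the remaining digits to numbers.
# Going through as.character() avoids getting the factor's internal codes.
experiment_num <- as.numeric(gsub("^experiment_", "", as.character(experiment)))
experiment_num
```

Converting via `as.character()` first matters because calling `as.numeric()` directly on a factor returns its level codes rather than the labels.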

【Comments】:

    【Solution 2】:

    Base reshape approach:

    Get the data:

    dframe1 <- structure(list(subject_type = c(0L, 0L, 1L, 1L), experiment_1 = c(4.6, 
    4.7, 3.5, 3.8), experiment_2 = c(2.5, 2.4, 1.2, 1.7), experiment_3 = c(1.4, 
    1.8, 5.6, 6.2), experiment_4 = c(5.3, 5.1, 7.5, 8.1)), .Names = c("subject_type", 
    "experiment_1", "experiment_2", "experiment_3", "experiment_4"
    ), class = "data.frame", row.names = c(NA, -4L))
    

    Set the variables to stack:

    expandvars <- paste('experiment',1:4,sep='_')
    

    Reshape!

    dfrm1res <- reshape(
                       dframe1,
                       idvar="subject_type",
                       varying=list(expandvars),
                       v.names=c("value"),
                       direction="long",
                       new.row.names=1:16
                        )
    

    Result:

    > dfrm1res
       subject_type time value
    1             0    1   4.6
    2             0    1   4.7
    3             1    1   3.5
    4             1    1   3.8
    5             0    2   2.5
    6             0    2   2.4
    7             1    2   1.2
    8             1    2   1.7
    9             0    3   1.4
    10            0    3   1.8
    11            1    3   5.6
    12            1    3   6.2
    13            0    4   5.3
    14            0    4   5.1
    15            1    4   7.5
    16            1    4   8.1
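
The result above uses the default `time` column name together with `value`. If the column names from the question (`experiment`, `measure`) are preferred, the `timevar` and `v.names` arguments of `reshape()` can set them directly. A minimal sketch, assuming the same `dframe1`; the name `long_named` is chosen here for illustration only:

```r
# Same data as in the question, rebuilt with data.frame() for brevity
dframe1 <- data.frame(
  subject_type = c(0L, 0L, 1L, 1L),
  experiment_1 = c(4.6, 4.7, 3.5, 3.8),
  experiment_2 = c(2.5, 2.4, 1.2, 1.7),
  experiment_3 = c(1.4, 1.8, 5.6, 6.2),
  experiment_4 = c(5.3, 5.1, 7.5, 8.1)
)
expandvars <- paste("experiment", 1:4, sep = "_")

# timevar names the index column, v.names names the stacked value column
long_named <- reshape(
  dframe1,
  idvar = "subject_type",
  varying = list(expandvars),
  v.names = "measure",
  timevar = "experiment",
  direction = "long",
  # explicit row names, because subject_type alone is not unique per row
  new.row.names = seq_len(nrow(dframe1) * length(expandvars))
)
names(long_named)
```

Computing `new.row.names` from the data rather than hard-coding `1:16` keeps the call valid if rows or experiments are added later.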
    

    【Comments】:

    • +1 for the reshape solution! May I suggest expandvars <- paste('experiment',1:4,sep='_')
    • @agstudy - You may, you have, and I have edited.
    【Solution 3】:
    data.frame(subject_type=dframe1$subject_type, stack(dframe1[2:5] )  )
       subject_type values          ind
    1             0    4.6 experiment_1
    2             0    4.7 experiment_1
    3             1    3.5 experiment_1
    4             1    3.8 experiment_1
    5             0    2.5 experiment_2
    6             0    2.4 experiment_2
    7             1    1.2 experiment_2
    8             1    1.7 experiment_2
    9             0    1.4 experiment_3
    10            0    1.8 experiment_3
    11            1    5.6 experiment_3
    12            1    6.2 experiment_3
    13            0    5.3 experiment_4
    14            0    5.1 experiment_4
    15            1    7.5 experiment_4
    16            1    8.1 experiment_4
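
The `stack()` output keeps the source column name in `ind` instead of an experiment number. One way to get closer to the layout requested in the question is sketched below; it assumes the same `dframe1`, and the names `stacked` and `tidy` are illustrative only:

```r
# Same data as in the question, rebuilt with data.frame() for brevity
dframe1 <- data.frame(
  subject_type = c(0L, 0L, 1L, 1L),
  experiment_1 = c(4.6, 4.7, 3.5, 3.8),
  experiment_2 = c(2.5, 2.4, 1.2, 1.7),
  experiment_3 = c(1.4, 1.8, 5.6, 6.2),
  experiment_4 = c(5.3, 5.1, 7.5, 8.1)
)

# stack() appends the columns one after another; subject_type (length 4)
# is recycled four times, which lines up because each stacked column keeps
# the original row order
stacked <- data.frame(subject_type = dframe1$subject_type, stack(dframe1[2:5]))

# Turn "experiment_1" ... "experiment_4" into the integers 1 ... 4
tidy <- data.frame(
  subject_type = stacked$subject_type,
  experiment   = as.integer(sub("^experiment_", "", as.character(stacked$ind))),
  measure      = stacked$values
)
head(tidy)
```

The recycling only gives correct labels because the number of stacked rows is an exact multiple of the number of subjects.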
    

    Or use base reshape (although my preferences seem to differ from thelatemail's usage):

    dframe1$subject=1:4
    reshape(dframe1, direction="long", idvar=c("subject_type", "subject"),
                     varying=2:5, sep="_", v.names="exp_value")
     #--------------------------
          subject_type subject time exp_value
    0.1.1            0       1    1       4.6
    0.2.1            0       2    1       4.7
    1.3.1            1       3    1       3.5
    1.4.1            1       4    1       3.8
    0.1.2            0       1    2       2.5
    0.2.2            0       2    2       2.4
    1.3.2            1       3    2       1.2
    1.4.2            1       4    2       1.7
    0.1.3            0       1    3       1.4
    0.2.3            0       2    3       1.8
    1.3.3            1       3    3       5.6
    1.4.3            1       4    3       6.2
    0.1.4            0       1    4       5.3
    0.2.4            0       2    4       5.1
    1.3.4            1       3    4       7.5
    1.4.4            1       4    4       8.1
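
The rows above are grouped by experiment, whereas the layout in the question groups them by subject. A minimal sketch of reordering the long result, assuming the same `dframe1` plus the `subject` column added above; `long_sorted` is an illustrative name:

```r
# Same data as in the question, rebuilt with data.frame() for brevity
dframe1 <- data.frame(
  subject_type = c(0L, 0L, 1L, 1L),
  experiment_1 = c(4.6, 4.7, 3.5, 3.8),
  experiment_2 = c(2.5, 2.4, 1.2, 1.7),
  experiment_3 = c(1.4, 1.8, 5.6, 6.2),
  experiment_4 = c(5.3, 5.1, 7.5, 8.1)
)
dframe1$subject <- seq_len(nrow(dframe1))

long <- reshape(dframe1, direction = "long",
                idvar = c("subject_type", "subject"),
                varying = 2:5, sep = "_", v.names = "exp_value")

# Sort by subject first, then by experiment (the "time" column),
# and drop the composite row names such as "0.1.1"
long_sorted <- long[order(long$subject, long$time), ]
rownames(long_sorted) <- NULL
head(long_sorted, 4)
```

After sorting, the first four rows hold the four measurements of the first subject, matching the row order shown in the question.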
    

    【Comments】:

    • +1! Wow! Nice use of stack and recycling!
    • I think that is what stack was originally intended for.