Stacking columns into 1 column in R [duplicate] - Answers

【Question title】: stacking columns into 1 column in R [duplicate]
【Posted】: 2013-04-01 10:09:52
【Question】:

I have a data frame that looks like this:

ID Time U1 U2 U3 U4 ...
1  20    1  2 3  5 .. 
2  20    2  5 9  4 ..
3  20    2  5 6  4 ..
.
.

And I would need it to look like this:

ID Time  U
1  20    1
1  20    2
1  20    3
1  20    5
2  20    2
2  20    5
2  20    9
2  20    4
3  20    2
3  20    5
3  20    6
3  20    4

I tried:

X <- read.table("mydata.txt", header=TRUE, sep=",")
X_D <- as.data.frame(X)
X_new <- stack(X_D, select = -c(ID, Time))

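But I have not managed to get the data into that form. To be honest, I have little experience with stacking/transposing, so any help is greatly appreciated!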

【Comments】:

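This is usually called converting from wide format to long format. However, the way you have defined it, you lose the information about which column each value came from. By the way, the reshape2 package covers this kind of transformation.
Also, you are more likely to get the answer you actually want if you give answerers a very simple way to get your example data into R. Providing the contents of a file is not simple, but providing the dput() output of the original example data frame object IS!
There is also a solution with tidyr::gather()
@Phil - could you spell that out? I am looking for the tidyr equivalent of JMP's "Stack Columns" platform.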
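
For readers with the same request, here is a hedged sketch of a tidyr version. It assumes the tidyr package is installed and uses pivot_longer(), which the tidyr documentation describes as the replacement for the superseded gather(). The column names "Measure" and "U" and the object name `long` are chosen here for illustration and do not come from the thread.

```r
library(tidyr)

# Example data from the question
dat <- read.table(text = "ID Time U1 U2 U3 U4
1  20    1  2 3  5
2  20    2  5 9  4
3  20    2  5 6  4", header = TRUE)

# Stack U1..U4 into one value column "U"; "Measure" records the source column
long <- pivot_longer(dat, cols = U1:U4, names_to = "Measure", values_to = "U")
long
```

The source column can be dropped afterwards with `long$Measure <- NULL` if it is really not needed.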

Tags: r dataframe rows reshape

【Solution 1】:

Here is the stack approach:

dat2a <- data.frame(dat[1:2], stack(dat[3:ncol(dat)]))
dat2a
#    ID Time values ind
# 1   1   20      1  U1
# 2   2   20      2  U1
# 3   3   20      2  U1
# 4   1   20      2  U2
# 5   2   20      5  U2
# 6   3   20      5  U2
# 7   1   20      3  U3
# 8   2   20      9  U3
# 9   3   20      6  U3
# 10  1   20      5  U4
# 11  2   20      4  U4
# 12  3   20      4  U4

This is very similar to melt from "reshape2":

library(reshape2)
dat2b <- melt(dat, id.vars=1:2)
dat2b
#    ID Time variable value
# 1   1   20       U1     1
# 2   2   20       U1     2
# 3   3   20       U1     2
# 4   1   20       U2     2
# 5   2   20       U2     5
# 6   3   20       U2     5
# 7   1   20       U3     3
# 8   2   20       U3     9
# 9   3   20       U3     6
# 10  1   20       U4     5
# 11  2   20       U4     4
# 12  3   20       U4     4

And here is a version very similar to @TylerRinker's answer, but without dropping "Time", simply using sep = "" to help R guess the time and variable names.

dat3 <- reshape(dat, direction = "long", idvar=1:2, 
                varying=3:ncol(dat), sep = "", timevar="Measure")
dat3
#        ID Time Measure U
# 1.20.1  1   20       1 1
# 2.20.1  2   20       1 2
# 3.20.1  3   20       1 2
# 1.20.2  1   20       2 2
# 2.20.2  2   20       2 5
# 3.20.2  3   20       2 5
# 1.20.3  1   20       3 3
# 2.20.3  2   20       3 9
# 3.20.3  3   20       3 6
# 1.20.4  1   20       4 5
# 2.20.4  2   20       4 4
# 3.20.4  3   20       4 4

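With all three of these, you end up with four columns rather than the three described in your desired output. However, as @ndoogan points out, keeping only three columns loses information about which column each value came from. If you are fine with that, you can always drop that column from the resulting data.frame quite easily (for example, dat2a <- dat2a[-4]).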
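
If the three-column layout from the question is what you want, a minimal base R sketch might look like the following. It assumes the example data from the question; the names `long` and `wanted` are made up for this illustration. It sorts the stacked result by ID (using the source column to break ties) and then drops the indicator column.

```r
# Example data from the question
dat <- read.table(text = "ID Time U1 U2 U3 U4
1  20    1  2 3  5
2  20    2  5 9  4
3  20    2  5 6  4", header = TRUE)

# Stack the U columns and keep ID and Time alongside (same idea as dat2a)
long <- data.frame(dat[1:2], stack(dat[3:ncol(dat)]))

# Sort by ID, then by source column, so each ID's values stay in U1..U4 order
long <- long[order(long$ID, long$ind), ]

# Keep only the three requested columns and rename "values" to "U"
wanted <- setNames(long[c("ID", "Time", "values")], c("ID", "Time", "U"))
rownames(wanted) <- NULL
wanted
```

Sorting before dropping the indicator column matters here, because once that column is gone there is nothing left to say which U column a value came from.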

【Comments】:

【Solution 2】:

With base reshape:

dat <- read.table(text="ID Time U1 U2 U3 U4
1  20    1  2 3  5
2  20    2  5 9  4
3  20    2  5 6  4", header=TRUE)


colnames(dat) <- gsub("([a-zA-Z]*)([0-9])", "\\1.\\2", colnames(dat))
reshape(dat, varying=3:ncol(dat), v.names="U", direction ="long", timevar = "Time", 
    idvar = "ID")
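
To make the renaming step easier to follow, here is a small sketch of what the gsub() call does to the column names, so that reshape() can split them into a variable name and a time index at the ".". The vector `nms` and the anchored alternative pattern are illustrations added here, not part of the original answer; the alternative is only a suggestion for the case where column numbers have more than one digit (for example a hypothetical U10).

```r
# Column names as in the example data
nms <- c("ID", "Time", "U1", "U2", "U3", "U4")

# The pattern from the answer: insert a "." between the letters and the digit
renamed <- gsub("([a-zA-Z]*)([0-9])", "\\1.\\2", nms)
renamed

# With multi-digit suffixes the pattern above matches each digit separately,
# so an anchored pattern with [0-9]+ may be a safer choice (assumption:
# names consist of letters followed by digits and nothing else)
renamed_multi <- sub("^([a-zA-Z]+)([0-9]+)$", "\\1.\\2", c(nms, "U10"))
renamed_multi
```

Names without digits, such as ID and Time, are left untouched by both patterns.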

【Comments】:

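(+1) Given that another answer here relies on the reshape2 package, it might be worth specifying what "base reshape" means.
This should do it for anyone wondering: ?reshape
+1 for exactly matching the output, but I am also concerned that information is lost this way.
@AnandaMahto Agreed. In this case I did not think it was an issue, so I dropped it to meet the user's needs.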

【Solution 3】:

You can also use melt():

library(reshape2)

new_data <- melt(old_data, id.vars=c("ID","Time"),
    value.name = "U")

Then drop the "variable" column:

new_data$variable <- NULL

【Comments】:

Oops, this will actually order the U rows row by row in old_data

【Solution 4】:

Try this:

do.call(rbind, lapply(1:4, function(i)structure(dat[,c("ID", "Time", paste0("U",i))], names=c("ID", "Time", "U"))))

dat is your data.frame...

【Comments】:

Thanks! It gives me an error: 'undefined columns selected' here '[.data.frame'(dat, ,[,c("ID", "Time",
Hi @user2263330, it works with dat <- data.frame(ID=1:3, Time=20, U1=1:3, U2=4:6, U3=7:9, U4=10:12). What are the names in your data.frame?
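
The "undefined columns selected" error is what `[.data.frame` reports when a requested column name does not exist, so one possible cause is that the real data does not have columns named exactly U1 to U4. A hedged variant of the same idea, sketched below, derives the value columns from the data instead of hard-coding 1:4. The names `value_cols` and `long` are made up for this illustration, and the sketch assumes every column other than ID and Time should be stacked.

```r
# Example data from the comment above
dat <- data.frame(ID = 1:3, Time = 20, U1 = 1:3, U2 = 4:6, U3 = 7:9, U4 = 10:12)

# Every column except the identifiers is treated as a value column
value_cols <- setdiff(names(dat), c("ID", "Time"))

# Build one three-column piece per value column, then bind the pieces together
long <- do.call(rbind, lapply(value_cols, function(col) {
  setNames(dat[, c("ID", "Time", col)], c("ID", "Time", "U"))
}))
long
```

Running names(dat) on the real data first shows which column names are actually available.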