How to merge several variables to create a new factor variable in R? Answers

【Question title】: How to merge several variables to create a new factor variable in R?
【Posted】: 2012-07-22 21:50:13
【Question】:

I have data from a survey. It comes from a question that looks like this:

Did you do any of the following activities during your PhD?

                                       Yes, paid by my school.  Yes, paid by me.  No.

Attended an international conference?
Bought textbooks?

The data are automatically saved in a spreadsheet in this way:

id conf.1 conf.2 conf.3 text.1 text.2 text.3
1    1                           1
2           1             1
3                  1      1
4                  1                    1
5

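This means that participant 1 attended a conference paid for by her university, participant 2 attended a conference that he paid for himself, and participant 3 did not go.

I want to merge conf.1, conf.2 and conf.3 into a single variable, and likewise text.1, text.2 and text.3: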

id new.conf new.text

1   1        2
2   2        1
3   3        1
4   3        3

where the number now represents the category of the survey question.

Thanks for your help

【Comments】:

That is reshaping rather than merging. Try reshape (base R), reshapeasy (taRifx package), or the reshape2 package.

Tags: r variables merge
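As a rough illustration of the base R route mentioned in this comment, here is a minimal sketch using `reshape()`. The data frame is rebuilt from the question's example; the column name `choice` and the helper `pick()` are made up for this sketch, and it assumes each respondent ticked at most one box per question.

```r
# Example data rebuilt from the question (NA = box not ticked).
dat <- data.frame(id = 1:5,
                  conf.1 = c(1, rep(NA, 4)),
                  conf.2 = c(NA, 1, rep(NA, 3)),
                  conf.3 = c(NA, NA, 1, 1, NA),
                  text.1 = c(NA, 1, 1, NA, NA),
                  text.2 = c(1, rep(NA, 4)),
                  text.3 = c(rep(NA, 3), 1, NA))

# Long format: one row per id and answer option; the suffix after "."
# ends up in the (hypothetically named) column `choice`.
long <- reshape(dat, direction = "long", varying = names(dat)[-1],
                sep = ".", idvar = "id", timevar = "choice")

# For one question, look up the ticked option for every id (NA if none).
pick <- function(v) {
  chosen <- long[!is.na(long[[v]]), c("id", "choice")]
  chosen$choice[match(dat$id, chosen$id)]
}

res <- data.frame(id = dat$id, conf = pick("conf"), text = pick("text"))
print(res)
```

Unlike a cast back to wide format on complete cases only, this keeps respondent 5 in the result with NA in both columns.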

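【Solution 1】:

You didn't say whether each set of questions can have more than one answer. If it can, this approach may not work for you, and I'd suggest making your question more reproducible before going further. With that caveat out of the way, give this a try: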

library(reshape2)
#recreate your data
dat <- data.frame(id = 1:5,
                  conf.1 = c(1,rep(NA,4)),
                  conf.2 = c(NA,1, rep(NA,3)),
                  conf.3 = c(NA,NA,1,1, NA),
                  text.1 = c(NA,1,1,NA,NA),
                  text.2 = c(1, rep(NA,4)),
                  text.3 = c(rep(NA,3),1, NA))

#melt into long format
dat.m <- melt(dat, id.vars = "id")
#Split on the "."
dat.m[, c("variable", "val")] <- with(dat.m, colsplit(variable, "\\.", c("variable", "val")))
#Subset out only the complete cases
dat.m <- dat.m[complete.cases(dat.m),]
#Cast back into wide format
dcast(id ~ variable, value.var = "val", data = dat.m)
#-----
  id conf text
1  1    1    2
2  2    2    1
3  3    3    1
4  4    3    3
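The caveat at the top of this answer (more than one answer per set of questions) can be checked before reshaping. The following is only a sketch on the example data; the helper name `n_ticked()` is invented here.

```r
# Example data rebuilt from the question (NA = box not ticked).
dat <- data.frame(id = 1:5,
                  conf.1 = c(1, rep(NA, 4)),
                  conf.2 = c(NA, 1, rep(NA, 3)),
                  conf.3 = c(NA, NA, 1, 1, NA),
                  text.1 = c(NA, 1, 1, NA, NA),
                  text.2 = c(1, rep(NA, 4)),
                  text.3 = c(rep(NA, 3), 1, NA))

confvars <- c("conf.1", "conf.2", "conf.3")
textvars <- c("text.1", "text.2", "text.3")

# Number of ticked boxes per respondent within one question.
n_ticked <- function(vars) rowSums(!is.na(dat[vars]))

# Respondents who ticked more than one box in either question.
multi <- dat$id[n_ticked(confvars) > 1 | n_ticked(textvars) > 1]

# Respondents who ticked nothing at all.
none <- dat$id[n_ticked(confvars) == 0 & n_ticked(textvars) == 0]

print(multi)
print(none)
```

If `multi` is not empty, a single code per question cannot represent those rows, and the recoding needs a rule for ties first.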

【Comments】:

Thanks everyone for your answers.

【Solution 2】:

Here is a base R approach that also handles the missing values:

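# Uses the data frame `dat` created in Solution 1.
confvars <- c("conf.1", "conf.2", "conf.3")
textvars <- c("text.1", "text.2", "text.3")

which.sub <- function(x) {
  # Position of the ticked column in each row; integer(0) if the row is all NA
  maxsub <- apply(dat[x], 1, which.max)
  # Turn the empty results into NA so that every row yields one value
  maxsub[lengths(maxsub) == 0] <- NA
  # Drop the names picked up along the way so the row names stay 1..n
  return(unname(unlist(maxsub)))
}

data.frame(
  id = dat$id,
  conf = which.sub(confvars),
  text = which.sub(textvars)
)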

Result:

  id conf text
1  1    1    2
2  2    2    1
3  3    3    1
4  4    3    3
5  5   NA   NA

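【Comments】:

Thanks. I have one more question: is it possible to turn the reshaped table into a LaTeX table that shows the name of each level (e.g. 1 = sponsored by my institution; 2 = sponsored by a different institution; 3 = no)?

【Solution 3】:

The following solution is very simple and I use it a lot. Let's use the same data frame that Chase built above.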
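One possible way to approach the follow-up question is to turn the codes into labelled factors with `factor()` and then write out the counts. This is only a sketch: the labels are assumed from the column headings of the survey question, the result data frame is typed in by hand from the output above, and the LaTeX rows are pasted together manually rather than produced by a table package.

```r
# Recoded result as shown in the answer above (respondent 5 gave no answer).
res <- data.frame(id = 1:5,
                  conf = c(1, 2, 3, 3, NA),
                  text = c(2, 1, 1, 3, NA))

# Assumed level names, taken from the survey question's column headings.
lab <- c("Yes, paid by my school", "Yes, paid by me", "No")

res$conf <- factor(res$conf, levels = 1:3, labels = lab)
res$text <- factor(res$text, levels = 1:3, labels = lab)

# Frequency of each level for the conference question (NA is not counted).
conf_counts <- table(res$conf)

# Hand-built LaTeX tabular: one "label & count \\" row per level.
latex_rows <- paste0(names(conf_counts), " & ", as.vector(conf_counts), " \\\\")
latex <- c("\\begin{tabular}{lr}", "Answer & n \\\\", latex_rows, "\\end{tabular}")
cat(latex, sep = "\n")
```

Once the variables are factors, the level names travel with the data, so any table-export tool you prefer will print the labels instead of 1, 2, 3.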

dat <- data.frame(id = 1:5,
                  conf.1 = c(1,rep(NA,4)),
                  conf.2 = c(NA,1, rep(NA,3)),
                  conf.3 = c(NA,NA,1,1, NA),
                  text.1 = c(NA,1,1,NA,NA),
                  text.2 = c(1, rep(NA,4)),
                  text.3 = c(rep(NA,3),1, NA))

First we replace the NAs with zeros.

dat[is.na(dat)] <- 0

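Multiplying each column by a different number (1, 2, 3) and adding them up gives the new variable directly, because at most one column per question is non-zero.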

dat <- transform(dat, conf=conf.1 + 2*conf.2 + 3*conf.3,
                      text=text.1 + 2*text.2 + 3*text.3)

Finally we recode the zeros in the new variables (or, as here, in the whole data set) back to NA, and we are done.

dat[dat == 0] <- NA 

> dat
  id conf.1 conf.2 conf.3 text.1 text.2 text.3 conf text
1  1      1     NA     NA     NA      1     NA    1    2
2  2     NA      1     NA      1     NA     NA    2    1
3  3     NA     NA      1      1     NA     NA    3    1
4  4     NA     NA      1     NA     NA      1    3    3
5  5     NA     NA     NA     NA     NA     NA   NA   NA
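The same weighting idea can be written once for any number of questions and answer options, using a matrix product instead of spelling out each sum. This is a sketch under assumptions: the prefixes in `groups` are hard-coded for this example, the columns of each question are assumed to appear in option order, and each respondent is assumed to tick at most one box per question. It leaves the original columns untouched rather than recoding zeros across the whole data set.

```r
# Example data rebuilt from the question (NA = box not ticked).
dat <- data.frame(id = 1:5,
                  conf.1 = c(1, rep(NA, 4)),
                  conf.2 = c(NA, 1, rep(NA, 3)),
                  conf.3 = c(NA, NA, 1, 1, NA),
                  text.1 = c(NA, 1, 1, NA, NA),
                  text.2 = c(1, rep(NA, 4)),
                  text.3 = c(rep(NA, 3), 1, NA))

groups <- c("conf", "text")  # assumed question prefixes

for (g in groups) {
  # All columns belonging to this question, e.g. conf.1, conf.2, conf.3
  cols <- grep(paste0("^", g, "\\."), names(dat), value = TRUE)
  m <- as.matrix(dat[cols])
  m[is.na(m)] <- 0                           # unticked boxes count as 0
  code <- as.vector(m %*% seq_along(cols))   # weights 1, 2, 3, ...
  code[code == 0] <- NA                      # nothing ticked at all
  dat[[g]] <- code
}

print(dat[c("id", groups)])
```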
