【Posted】: 2015-08-02 12:02:27
【Question】:
I have a dataset containing 14 mutually exclusive call-type categories, all coded as dummy variables. Here is a small sample:
dput(df)
structure(list(MON1_12 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), WEEK1_53 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), AGENT_ID = structure(c(3L,
4L, 7L, 8L, 1L, 6L, 5L, 9L, 2L, 10L), .Label = c("A129", "A360",
"A407", "B891", "D197", "L145", "L722", "O518", "T443", "W764"
), class = "factor"), CallsHandled = c(1L, 4L, 2L, 14L, 1L, 2L,
5L, 1L, 1L, 3L), CONTENT = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), CLAIMS = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
CREDIT_CARD = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
DEDUCT_BILL = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
HCREFORM = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("MON1_12",
"WEEK1_53", "AGENT_ID", "CallsHandled", "CONTENT", "CLAIMS",
"CREDIT_CARD", "DEDUCT_BILL", "HCREFORM"), class = "data.frame", row.names = c(NA,
-10L))
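I want to combine the dummy variables into a single new variable named "QUEUE" that holds, for each row, the name of the dummy variable whose value is 1. Here is an example of the desired result: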
dput(df2)
structure(list(MON1_12 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), WEEK1_53 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), AGENT_ID = structure(c(3L,
4L, 7L, 8L, 1L, 6L, 5L, 9L, 2L, 10L), .Label = c("A129", "A360",
"A407", "B891", "D197", "L145", "L722", "O518", "T443", "W764"
), class = "factor"), CallsHandled = c(1L, 4L, 2L, 14L, 1L, 2L,
5L, 1L, 1L, 3L), QUEUE = structure(c(1L, 4L, 2L, 4L, 1L, 3L,
3L, 5L, 5L, 4L), .Label = c("CLAIMS", "CONTENT", "CREDIT_CARD",
"DEDUCT_BILL", "HCREFORM"), class = "factor")), .Names = c("MON1_12",
"WEEK1_53", "AGENT_ID", "CallsHandled", "QUEUE"), class = "data.frame", row.names = c(NA,
-10L))
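One way the transformation described above might be sketched is with base R's `max.col()`, which works on the whole matrix at once and so should scale reasonably to millions of rows. The four-row data frame below is a hypothetical miniature of the sample, and turning rows with no dummy set into `NA` is an assumption rather than something the question specifies.

```r
# Hypothetical miniature of the data in the question; only the dummy columns matter here.
df <- data.frame(
  CallsHandled = c(1L, 4L, 2L, 14L),
  CONTENT      = c(0L, 0L, 0L, 0L),
  CLAIMS       = c(1L, 0L, 0L, 0L),
  CREDIT_CARD  = c(0L, 0L, 0L, 0L),
  DEDUCT_BILL  = c(0L, 0L, 0L, 1L),
  HCREFORM     = c(0L, 0L, 0L, 0L)
)

dummy_cols <- c("CONTENT", "CLAIMS", "CREDIT_CARD", "DEDUCT_BILL", "HCREFORM")
m <- as.matrix(df[dummy_cols])

# max.col() returns, for each row, the index of the column holding the row maximum.
# ties.method = "first" makes the result deterministic.
idx <- max.col(m, ties.method = "first")

# Assumption: a row with no dummy set has no category, so mark it NA instead of guessing.
idx[rowSums(m) == 0] <- NA

# Fixing the levels keeps categories that happen not to occur in the data.
df$QUEUE <- factor(dummy_cols[idx], levels = dummy_cols)
```

Selecting the columns by name rather than by position (`df[5:9]`) is a deliberate choice here, since the real data is said to have about 40 variables.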
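Edit, in response to the question being flagged: this is what I tried this afternoon based on a suggestion, using a slightly different example data frame: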
df$Queue <- as.factor(df$CONTENT + df$CLAIMS*2 + df$CREDIT_CARD*3 + df$DEDUCT_BILL*4 + df$HCREFORM*5)
levels(df$Queue) <- c("CONTENT", "CLAIMS", "CREDIT_CARD","DEDUCT_BILL","HCREFORM")
View(df)
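but the Queue column came back as all NA. So I have recreated another sample dataset here. This data frame is fully representative of what I will receive in practice, except that I will have about 40 variables and 2 million rows. When I run the attempt above on this "df", I get the following incorrect result: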
dput(df)
structure(list(MON1_12 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), WEEK1_53 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), AGENT_ID = structure(c(3L,
4L, 7L, 8L, 1L, 6L, 5L, 9L, 2L, 10L), .Label = c("A129", "A360",
"A407", "B891", "D197", "L145", "L722", "O518", "T443", "W764"
), class = "factor"), CallsHandled = c(1L, 4L, 2L, 14L, 1L, 2L,
5L, 1L, 1L, 3L), CONTENT = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), CLAIMS = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
CREDIT_CARD = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
DEDUCT_BILL = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
HCREFORM = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Queue = structure(c(2L,
1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("CONTENT",
"CLAIMS", "CREDIT_CARD", "DEDUCT_BILL", "HCREFORM"), class = "factor")), .Names = c("MON1_12",
"WEEK1_53", "AGENT_ID", "CallsHandled", "CONTENT", "CLAIMS",
"CREDIT_CARD", "DEDUCT_BILL", "HCREFORM", "Queue"), row.names = c(NA,
-10L), class = "data.frame")
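The incorrect labels in this output appear to follow from how `as.factor()` builds its levels. The sketch below reproduces the numeric codes that the weighted sum yields for the ten sample rows and shows the positional renaming; the variable names (`code`, `mislabelled`, `fixed`) are illustrative only.

```r
# Numeric codes produced by CONTENT + CLAIMS*2 + ... for the ten sample rows:
# only row 1 (CLAIMS) and row 4 (DEDUCT_BILL) have a dummy set.
code <- c(2, 0, 0, 4, 0, 0, 0, 0, 0, 0)

# as.factor() keeps only the values that occur, so the levels are "0", "2", "4".
f <- as.factor(code)
observed_levels <- levels(f)

# Assigning five names renames those three levels by position:
# "0" -> CONTENT, "2" -> CLAIMS, "4" -> CREDIT_CARD.
levels(f) <- c("CONTENT", "CLAIMS", "CREDIT_CARD", "DEDUCT_BILL", "HCREFORM")
mislabelled <- as.character(f)

# Tying each code to its label explicitly avoids the positional mismatch.
# Code 0 is not among the levels, so rows with no dummy set become NA.
fixed <- factor(code, levels = 1:5,
                labels = c("CONTENT", "CLAIMS", "CREDIT_CARD", "DEDUCT_BILL", "HCREFORM"))
```

With this data, row 4 is labelled CREDIT_CARD instead of DEDUCT_BILL and every all-zero row is labelled CONTENT, which matches the `Queue` column in the `dput` output above.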
I also tried:
df3 <- cbind(df[1:4], QUEUE = apply(df[5:9], 1, function(N) names(N)[as.logical(N)]))
but received the following error: "Error in data.frame("CLAIMS", character(0), character(0), "DEDUCT_BILL", : arguments imply differing number of rows: 1, 0"
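The error message suggests that some rows have no dummy set: for those rows the anonymous function returns `character(0)`, so `apply()` cannot simplify its result to a vector and hands `cbind()` a ragged list. A minimal sketch of that behaviour follows, with a hypothetical helper `first_set()` that always returns exactly one string; the four-row `dummies` data frame is a stand-in for columns 5:9 of the sample.

```r
# Hypothetical miniature of the dummy columns (columns 5:9 in the question).
dummies <- data.frame(
  CONTENT     = c(0L, 0L, 0L, 0L),
  CLAIMS      = c(1L, 0L, 0L, 0L),
  CREDIT_CARD = c(0L, 0L, 0L, 0L),
  DEDUCT_BILL = c(0L, 0L, 0L, 1L),
  HCREFORM    = c(0L, 0L, 0L, 0L)
)

# The original anonymous function: rows with no dummy set give character(0),
# so apply() returns a list whose elements have length 1 or 0.
ragged <- apply(dummies, 1, function(N) names(N)[as.logical(N)])
lengths_seen <- lengths(ragged)

# Hypothetical helper: return one string per row, NA when nothing is set.
first_set <- function(N) {
  hit <- names(N)[as.logical(N)]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Every call now returns length 1, so apply() simplifies to a character vector.
QUEUE <- apply(dummies, 1, first_set)
```

Because `apply()` calls the function once per row in R code, this form may be slow on roughly 2 million rows; it is shown mainly to explain the error.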
【Comments】:
-
What have you tried so far? Please share any attempts (edit your question).
Tags: r