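[Posted]: 2017-09-25 08:38:01
[Question]:
I have a list of data frames in which each column has a name in the first row and x's at some positions further down the column. Where there is an x, the name in the first row of that column is considered selected. In the real-world problem, I read an xlsx file containing many worksheets, each holding one large matrix: every column has a name in the first row, and the x's form a somewhat sparse matrix. Each worksheet becomes one data frame in the list. The row names contain an identifier that is relevant for lookups but not for my problem as described here.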
data1 <- data.frame(Col1 = c("Mark", "x", "", "x", "", ""),
                    Col2 = c("Paul", "", "", "", "x", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "x", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "x", "", ""),
                    stringsAsFactors = FALSE)
data2 <- data.frame(Col1 = c("Mark", "x", "x", "", "", ""),
                    Col2 = c("Paul", "", "", "", "", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "", "", ""),
                    stringsAsFactors = FALSE)
data <- list(data1 = data1, data2 = data2)
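Each data frame in the list has the following structure (shown as a matrix for convenience). The column names and the names in the first row are the same for every data frame in the list; only the x's differ: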
> as.matrix(data1)
     Col1   Col2   Col3   Col4   Col5
[1,] "Mark" "Paul" "Jane" "Mary" "Peter"
[2,] "x"    ""     ""     "x"    "x"
[3,] ""     ""     ""     "x"    "x"
[4,] "x"    ""     ""     "x"    "x"
[5,] ""     "x"    ""     ""     ""
[6,] ""     ""     ""     ""     ""
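I want to add a column ("Approvers") to each data frame in the list. For each row, it should contain the concatenation of the names from row 1 of every column that has an "x" in that row: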
     Col1   Col2   Col3   Col4   Col5    Approvers
[1,] "Mark" "Paul" "Jane" "Mary" "Peter" ""
[2,] "x"    ""     ""     "x"    "x"     "Mark; Mary; Peter"
[3,] ""     ""     ""     "x"    "x"     "Mary; Peter"
[4,] "x"    ""     ""     "x"    "x"     "Mark; Mary; Peter"
[5,] ""     "x"    ""     ""     ""      "Paul"
[6,] ""     ""     ""     ""     ""      ""
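To make the rule for a single row concrete, here is a minimal sketch using row 2 of data1. The variable names (names_row, marks_row, selected, approvers) are illustrative only and are not part of the original code.

```r
# Row 1 of data1 holds the names; row 2 is one of the rows containing marks.
names_row <- c(Col1 = "Mark", Col2 = "Paul", Col3 = "Jane",
               Col4 = "Mary", Col5 = "Peter")
marks_row <- c(Col1 = "x", Col2 = "", Col3 = "", Col4 = "x", Col5 = "x")

# Logical indexing keeps only the names whose column is marked with "x".
selected <- names_row[marks_row == "x"]

# Collapse the selection into one string; an empty selection collapses to "".
approvers <- paste(selected, collapse = "; ")
approvers
```

Because paste() with collapse returns "" for a zero-length input, rows without any x need no special handling under this approach.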
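Currently I solve this in two steps:
- I create another list that holds the column positions of each x
- In a nested for loop, I look up all the names in the first row and concatenate them.
The code is as follows: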
position <- lapply(data, function(x) apply(x, 1, function(y) which(y %in% "x")))
position <- lapply(position, function(x) lapply(x, function(y) {if (length(y) == 0L) return(0) else return(y)})) # remove int(0) and replace with 0
position <- lapply(position, function(x) lapply(x, function(x) paste(x, collapse = ","))) # flatten second level list into string
for (i in 1:length(data)) {
  for (j in 1:nrow(data[[i]])) {
    if (as.numeric(unlist(strsplit(position[[i]][[j]], ",")))[[1]] == 0) {
      data[[i]][j, "Approvers"] <- ""
    } else {
      data[[i]][j, "Approvers"] <- paste(data[[i]][1, as.numeric(unlist(strsplit(position[[i]][[j]], ",")))], collapse = "; ")
    }
  }
}
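To me this is clumsy. I would like to do it with lapply and mapply by iterating over both lists at once, but I can't figure out how. Also, creating the position object, collapsing the column indices of the x's into a string, and then splitting them apart again inside the loop is overly complicated.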
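One possible direction, offered as a sketch rather than a definitive answer: if the names are read directly from row 1 of each data frame, the intermediate position list is not needed, and a single lapply over data is enough (mapply would only be required when two lists are walked in parallel). The helper name add_approvers is hypothetical, and the sketch assumes that row 1 always holds the names and that the data frame does not yet have an Approvers column.

```r
# Sample data as in the question.
data1 <- data.frame(Col1 = c("Mark", "x", "", "x", "", ""),
                    Col2 = c("Paul", "", "", "", "x", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "x", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "x", "", ""),
                    stringsAsFactors = FALSE)
data2 <- data.frame(Col1 = c("Mark", "x", "x", "", "", ""),
                    Col2 = c("Paul", "", "", "", "", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "", "", ""),
                    stringsAsFactors = FALSE)
data <- list(data1 = data1, data2 = data2)

# Hypothetical helper (not from the original post): builds the Approvers
# column for one data frame whose first row holds the names.
add_approvers <- function(df) {
  name_row <- unlist(df[1, ], use.names = FALSE)  # names taken from row 1
  m <- as.matrix(df)
  marked <- !is.na(m) & m == "x"                  # TRUE where a cell is "x"; NA cells count as unmarked
  # For each row, keep the names of the marked columns and join them.
  df$Approvers <- apply(marked, 1, function(is_marked) {
    paste(name_row[is_marked], collapse = "; ")
  })
  df
}

# Apply the helper to every data frame in the list.
data <- lapply(data, add_approvers)
data$data1$Approvers
```

Row 1 never equals "x", so it gets an empty Approvers string without a special case, the same as any other row with no marks.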
[Discussion]: