【Posted】: 2021-04-09 21:23:17
【Question】:
Below are a data frame DF and a list mappingList.
DF <- data.frame(
"colors number 3 former" = c("r","r","?","l","?","r","?","?","r","?"),
"music number 3 latter" = c("r","l","r","l","r","r","l","l","r","l"),
"genres number 3 latter" = c("l","r","?","l","?","r","?","l","l","r"),
"genres number 12 former" = c("r","r","?","l","l","r","l","?","r","?"),
"music number 12 latter" = c("r","l","?","l","?","r","l","l","r","?"),
"fabric number 12 latter" = c("l","r","?","l","r","r","r","l","l","r"),
"colors number 12 latter" = c("r","r","?","r","?","r","?","r","r","?"),
check.names = FALSE
)
mappingList <- list("number 3",
"genres",
"music",
"number 12",
"music",
"fabric",
"colors")
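In DF, when a column ends with `former` and contains the value "?", it needs to be encoded from the columns that end with `latter`. By encoding I mean that each "?" in a `former` column should be filled with whatever value its corresponding `latter` column has in the same row. A `former` column can have more than one `latter` column, and the corresponding `latter` columns are found from mappingList.

For example, for `colors number 3 former` there are 2 column indicators in mappingList, `genres` and `music`, because they are listed under `number 3`, and `colors number 3 former` contains the substring `number 3`. In the for loop, the rows of `colors number 3 former` that hold "?" should first be encoded from `genres number 3 latter`. If there are still "?" values left in the `former` column, the second option should be used, i.e. `music number 3 latter` (the next element after `genres` under `number 3`). The loop should stop once no "?" is left in the `former` column; otherwise it should keep moving down the mappingList within that number.

The original data frame is much larger, so mapping by hand is not an option. The expected output is: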
expectedDF <- data.frame(
"colors number 3 former" = c("r","r","r","l","r","r","l","l","r","r"),
"music number 3 latter" = c("r","l","r","l","r","r","l","l","r","l"),
"genres number 3 latter" = c("l","r","?","l","?","r","?","l","l","r"),
"genres number 12 former" = c("r","r","?","l","l","r","l","l","r","r"),
"music number 12 latter" = c("r","l","?","l","?","r","l","l","r","?"),
"fabric number 12 latter" = c("l","r","?","l","r","r","r","l","l","r"),
"colors number 12 latter" = c("r","r","?","r","?","r","?","r","r","?"),
check.names = FALSE
)
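The grouping rule described above (each `number N` element starts a group, and the elements after it are that number's latter prefixes, in priority order) can be made explicit before any filling happens. A minimal sketch, assuming that every element starting with `number ` is a group header and that the list begins with one; `group_mapping` is a hypothetical helper name, not part of the original code:

```r
mappingList <- list("number 3", "genres", "music",
                    "number 12", "music", "fabric", "colors")

# Hypothetical helper: turn the flat mappingList into a named list with one
# entry per "number N" header, holding that header's latter prefixes in order.
group_mapping <- function(mapping) {
  items <- unlist(mapping)
  # Assumption: headers are exactly the elements starting with "number "
  is_header <- grepl("^number ", items)
  stopifnot(is_header[1])            # the list must start with a header
  headers <- items[is_header]
  # cumsum() maps every element to the index of the header above it
  owner <- headers[cumsum(is_header)]
  keep <- !is_header
  # factor levels keep the groups in the order the headers appear
  split(items[keep], factor(owner[keep], levels = headers))
}

groups <- group_mapping(mappingList)
groups[["number 3"]]    # "genres" "music"
groups[["number 12"]]   # "music" "fabric" "colors"
```

Looking a group up by its exact name (`groups[["number 3"]]`) avoids substring matches such as `number 1` matching `number 12`.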
I tried this approach with nested loops, but once the loop reaches the next number I can't find a way to stop it:
# Take the columns that end with "former"
# Populate former columns in columnsToBeEncoded
columnsToBeEncoded <- list()
for(col in names(DF)){
if(grepl("former", col)){
columnsToBeEncoded <- append(columnsToBeEncoded, col)
}
}
#columnsToBeEncoded
# Encode "former" columns where row is "?" from "latter" columns by the order in mappingList
for(col in columnsToBeEncoded){
# extract column number from former column
colNumber <- paste(strsplit(col, " ")[[1]][2:3], collapse = " ")
# Find indices where former column has "?"
j <- which(DF[, col] == "?")
for(element in mappingList){
# I think the if statement below is not working
# Inside the if statement, elements that contain "number" are also being processed
if(!grepl(colNumber, element)){
elementNameinColumnForm <- paste(c(element, colNumber, "latter"), collapse = " ")
print(elementNameinColumnForm)
DF[j,col] <- DF[j,elementNameinColumnForm]
}
}
}
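As far as I can tell, the attempt has three problems. First, `!grepl(colNumber, element)` is TRUE for the other header (`grepl("number 3", "number 12")` is FALSE), so the code builds a column name like "number 12 number 3 latter" that does not exist, and `DF[j, ...]` fails with "undefined columns selected". Second, the inner loop walks the whole `mappingList` instead of only the elements under the current number. Third, `j` is computed once, so each later latter column overwrites rows an earlier one already filled. A minimal sketch that addresses all three, assuming that elements of mappingList starting with `number ` are the group headers; it also uses `grep("former$", ..., value = TRUE)` to collect the former columns in one call:

```r
# Same data as in the question; stringsAsFactors = FALSE keeps the
# columns as character vectors on R versions before 4.0 as well.
DF <- data.frame(
  "colors number 3 former"  = c("r","r","?","l","?","r","?","?","r","?"),
  "music number 3 latter"   = c("r","l","r","l","r","r","l","l","r","l"),
  "genres number 3 latter"  = c("l","r","?","l","?","r","?","l","l","r"),
  "genres number 12 former" = c("r","r","?","l","l","r","l","?","r","?"),
  "music number 12 latter"  = c("r","l","?","l","?","r","l","l","r","?"),
  "fabric number 12 latter" = c("l","r","?","l","r","r","r","l","l","r"),
  "colors number 12 latter" = c("r","r","?","r","?","r","?","r","r","?"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
mappingList <- list("number 3", "genres", "music",
                    "number 12", "music", "fabric", "colors")

# Group the latter prefixes under their "number N" header, keeping their order.
# Assumption: headers are exactly the elements starting with "number ",
# and the list starts with a header.
items     <- unlist(mappingList)
is_header <- grepl("^number ", items)
headers   <- items[is_header]
owner     <- headers[cumsum(is_header)]   # header each element sits under
groups    <- split(items[!is_header],
                   factor(owner[!is_header], levels = headers))

# "former$" only matches names that end in "former"
for (col in grep("former$", names(DF), value = TRUE)) {
  # "colors number 3 former" -> "number 3"
  colNumber <- paste(strsplit(col, " ")[[1]][2:3], collapse = " ")
  # Exact name lookup: only the prefixes listed under this number
  for (prefix in groups[[colNumber]]) {
    j <- which(DF[[col]] == "?")      # recomputed before each source column
    if (length(j) == 0) break         # nothing left to fill: stop early
    src <- paste(prefix, colNumber, "latter")
    if (!src %in% names(DF)) next     # no such latter column: try the next one
    DF[[col]][j] <- DF[[src]][j]      # a "?" in src stays "?" and is retried
  }
}
```

Rows where every latter column of the group is also "?" keep their "?", which matches row 3 of `genres number 12 former` in expectedDF; the latter columns themselves are left unchanged.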
【Discussion】: