【Posted】:2014-11-25 02:31:00
【Question】:
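I have several data frames. I want to iterate over them and, in each one, delete the columns and rows in which more than 90% of the values are NA. I have also experimented with lapply, but I couldn't get it to work...
My current code is: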
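A minimal sketch of the rule stated above, assuming "more than 90% NA" means the share of NA values in a column or row is strictly greater than 0.9. It filters with logical vectors instead of explicit loops. The helper `drop_mostly_na` and its `max_na` parameter are hypothetical names introduced for illustration.

```r
# Hypothetical helper (not from the question): drop columns, then rows,
# whose proportion of NA values is greater than max_na
drop_mostly_na <- function(df, max_na = 0.9) {
  # nothing to measure on an empty data frame
  if (nrow(df) == 0 || ncol(df) == 0) return(df)
  # is.na() on a data frame returns a logical matrix; colMeans gives the NA share per column
  df <- df[, colMeans(is.na(df)) <= max_na, drop = FALSE]
  # every column may have been removed; stop before rowMeans() would return NaN
  if (ncol(df) == 0) return(df)
  # NA share per row, measured on the columns that were kept
  df[rowMeans(is.na(df)) <= max_na, , drop = FALSE]
}

# column b is entirely NA; row 3 is NA in every remaining column
example <- data.frame(a = c(1, 2, NA), b = c(NA, NA, NA), c = c(3, 4, NA))
cleaned <- drop_mostly_na(example)
print(cleaned)
```

Unlike a count threshold such as `floor(nrow(dataset) * 0.1)`, this compares proportions directly. With the floor-based count, a data frame with fewer than 10 rows gets a column threshold of 0, so no column is ever dropped. This sketch also checks rows after the column filter, so a row is judged only on the columns that survive.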
data_a_2007 <- read.csv(path)
data_a_2008 <- read.csv(path)
datasets_a <- list(data_a_2007, data_a_2008)
# loop over indices so each cleaned data frame can be stored back in the list
# (inside `for (dataset in datasets_a)`, dataset is only a copy)
for (i in seq_along(datasets_a)) {
  dataset <- datasets_a[[i]]
  # find columns threshold: minimum number of non-NA values a column must have
  threshold_columns <- floor(nrow(dataset) * 0.1)
  # count non-NA values per column
  valuecount_columns <- colSums(!is.na(dataset))
  # find rows threshold: minimum number of non-NA values a row must have
  threshold_rows <- floor(ncol(dataset) * 0.1)
  # count non-NA values per row (over all original columns)
  valuecount_rows <- rowSums(!is.na(dataset))
  # delete columns with fewer than threshold_columns values; logical indexing
  # also behaves correctly when no column needs to be deleted
  dataset <- dataset[, valuecount_columns >= threshold_columns, drop = FALSE]
  # delete rows with fewer than threshold_rows values
  dataset <- dataset[valuecount_rows >= threshold_rows, , drop = FALSE]
  # write the cleaned data frame back into the list
  datasets_a[[i]] <- dataset
}
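In R, a loop of the form `for (dataset in datasets_a)` binds `dataset` to a copy of each list element, so reassigning `dataset` inside the body does not change `datasets_a`. A minimal sketch of the lapply alternative, using a toy list `dfs` and a toy cleaning step (drop rows where `x` is NA) as hypothetical stand-ins for the real data: lapply applies the function to each element and collects the returned data frames into a new list.

```r
# two toy data frames standing in for data_a_2007 and data_a_2008
dfs <- list(data.frame(x = c(1, NA)), data.frame(x = c(NA, 2)))

# the loop variable d is a copy: the list dfs is left unchanged
for (d in dfs) {
  d <- d[!is.na(d$x), , drop = FALSE]
}

# lapply returns each modified data frame; the results form a new list
cleaned <- lapply(dfs, function(d) d[!is.na(d$x), , drop = FALSE])

sapply(dfs, nrow)      # still 2 rows each
sapply(cleaned, nrow)  # 1 row each
```

Any function that takes one data frame and returns the cleaned one can replace the anonymous function, for example `datasets_a <- lapply(datasets_a, clean_fn)`, where `clean_fn` is a hypothetical name. This updates the list only; `data_a_2007` and `data_a_2008` still hold the original data.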
【Discussion】:
Tags: r for-loop dataframe iteration lapply