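【Posted】: 2021-04-30 10:30:40
【Question】:
Following up on my previous question: I am working with a large number of data frames in R, each with a different number of columns. I want to harmonize these data sets so that they all have the same columns, with NA values in the newly added columns. I have written a loop, but I am not sure how to update the actual data frames.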
first_df = data.frame(matrix(rnorm(20), nrow=10))
second_df = data.frame(matrix(rnorm(20), nrow=4))
third_df = data.frame(matrix(rnorm(20), nrow=5))
library(tidyverse)
min_max <- mget(ls(pattern = "_df")) %>%
  map_dbl(ncol) %>%
  enframe() %>%
  arrange(value) %>%
  slice(1, n())
min_max
# A tibble: 2 x 2
#   name      value
#   <chr>     <dbl>
# 1 first_df      2
# 2 second_df     5
diff <- setdiff(names(get(min_max$name[2])), names(get(min_max$name[1])))
for (col_name in diff) {
  # all dataframes whose names contain "_df"
  for (df_index in 1:length(ls(pattern = "_df"))) {
    # capturing the dataframe
    data = get(ls(pattern = "_df")[df_index])
    if (!(col_name %in% names(data))) {
      data[, col_name] <- NA
    }
    # I don't know how to update the real datasets
    # get(ls(pattern = "_df")[df_index]) <- data
  }
}
【Discussion】:
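One possible way to write the modified copy back, offered as a minimal sketch rather than a definitive answer: `assign()` binds a value to a name given as a string, which makes it the counterpart of `get()`. The `"_df$"` pattern, the helper names `env`, `df_names` and `all_cols`, and the use of base R in place of the tidyverse pipeline are all assumptions made for illustration.

```r
# Example data frames with 2, 5 and 4 columns, as in the question
first_df  <- data.frame(matrix(rnorm(20), nrow = 10))
second_df <- data.frame(matrix(rnorm(20), nrow = 4))
third_df  <- data.frame(matrix(rnorm(20), nrow = 5))

# Environment that holds the data frames (assumption: the current one)
env <- environment()

# Names of the objects to harmonize; the "_df$" pattern is an assumption
df_names <- ls(envir = env, pattern = "_df$")

# Union of all column names across the data frames
all_cols <- unique(unlist(lapply(mget(df_names, envir = env), names)))

for (df_name in df_names) {
  # Work on a copy of the data frame
  data <- get(df_name, envir = env)
  # Add every missing column, filled with NA
  for (col_name in setdiff(all_cols, names(data))) {
    data[[col_name]] <- NA
  }
  # Write the modified copy back under its original name
  assign(df_name, data, envir = env)
}

# Column count of each data frame after harmonizing
sapply(mget(df_names, envir = env), ncol)
```

If downstream code can work with a list, keeping the data frames in a named list (for example via `mget()`) and transforming them with `lapply()` would avoid `get()` and `assign()` altogether.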