【Question title】: Assigning datasets to a list of dataframes
【Posted】: 2021-12-04 09:11:33
【Question description】:

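Each pass through a loop over 4 years (2011, 2013, 2015, 2017) generates a set of 6 datasets, so in total I should end up with 24 datasets. I'm trying to use assign with paste to append the corresponding year to each dataset's name. However, at the end of the loop I only have 6 datasets instead of 6*4 = 24.

Do I need special [[]] syntax to create a list of data frames? Why can't I assign the datasets to variables inside the loop structure below?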

library(educationdata)
library(glue)

## Initialize lists
dates<-list("2011","2013","2015","2017")
frames<-list("df_ccdirectory","df_ccdenrollment","df_crdcteacher",
             "df_crdcmathscience","df_crdcsat","df_crdcfinance")
dflist <- list()



  for (j in dates){
    
    df_ccdirectory <- get_education_data(level = "schools",
                                source = "ccd",
                                topic = "directory",
                                filters = list(year = j,fips=10),
                                add_labels = TRUE)
    dflist[[1]]<- df_ccdirectory

    df_ccdenrollment <- get_education_data(level = "schools",
                                      source = "ccd",
                                      topic = "enrollment",
                                      filters = list(year = j,fips=10),
                                      add_labels = TRUE)
    dflist[[2]]<-   df_ccdenrollment
    df_crdcteacher<- get_education_data(level = "schools",
                           source = "crdc",
                           topic = "teachers-staff",
                           filters = list(year = j,fips=10),
                           add_labels = TRUE)
    dflist[[3]]<-    df_crdcteacher
    df_crdcmathscience <- get_education_data(level = "schools",
                                         source = "crdc",
                                         topic = "math-and-science",
                                         subtopic = c('race','sex'),
                                         filters = list(year = j,fips=10),
                                         add_labels = TRUE)
    dflist[[4]]<- df_crdcmathscience

    df_crdcsat <- get_education_data(level = "schools",
                           source = "crdc",
                           topic = "sat-act-participation",
                           subtopic = c('race','sex'),
                           filters = list(year = j,fips=10),
                           add_labels = TRUE)
    dflist[[5]] <-df_crdcsat

    df_crdcfinance <- get_education_data(level = "schools",
                                     source = "crdc",
                                     topic = "school-finance",
                                     filters = list(year = j,fips=10),
                                     add_labels = TRUE)
    dflist[[6]]<-df_crdcfinance

    
  
    ## Error catching...
    #print(dates[[j]],"\n")
    print(paste0("dataset 1"))
    cat("\n")
        head(dflist[[1]])
    cat("\n")
    print(paste0("dataset 6"))
    cat("\n")
    head(dflist[[6]])
    cat("\n")
    for (k in 1:6){
        assign(paste(frames[k], dates[j], sep = ""), dflist[[k]])
  
    }
  
 
 }
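One likely reason only 6 objects survive can be reproduced with plain strings, so it runs without the educationdata package (only two of the frame names are used, for brevity). `for (j in dates)` binds `j` to each year *value* ("2011", "2013", ...), not to a position, so `dates[j]` looks up a list element *named* "2011". The list has no names, so the lookup returns `list(NULL)`, `paste()` turns that into the text "NULL", and every year writes to the same six names.

```r
dates  <- list("2011", "2013", "2015", "2017")
frames <- list("df_ccdirectory", "df_ccdenrollment")

# Names built the same way as in the question's inner loop
buggy <- character(0)
for (j in dates) {
  for (k in seq_along(frames)) {
    # dates[j] is a by-name lookup on an unnamed list, so it yields list(NULL)
    buggy <- c(buggy, paste(frames[k], dates[j], sep = ""))
  }
}
print(unique(buggy))  # "df_ccdirectoryNULL" "df_ccdenrollmentNULL"

# Using the loop value j itself gives one distinct name per frame and year
fixed <- character(0)
for (j in dates) {
  for (k in seq_along(frames)) {
    fixed <- c(fixed, paste0(frames[[k]], j))
  }
}
print(unique(fixed))  # 8 names, starting with "df_ccdirectory2011"
```

In the question's loop that change would read `assign(paste0(frames[[k]], j), dflist[[k]])`. Separately, `head()` calls inside a `for` loop are not auto-printed; wrapping them in `print()` makes the debugging output visible.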
     

【Discussion】:

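  • Changed the code so the outer loop iterates over the years. Still not working? Are the indices still off?
  • Yes. When you write to dflist[[1]] in the first iteration of the loop, it writes to the first element of dflist; on the second pass it simply overwrites it. Try changing all of the dflist[[1]], dflist[[2]], etc. to dflist[[length(dflist) + 1]]. That way you always write to the end of the list.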
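The comment's suggestion can be sketched without network access by letting a tiny data frame stand in for each `get_education_data()` result (the stand-in is hypothetical). Appending at `length(dflist) + 1` keeps all 24 results; storing each one under a name that includes the year also keeps them and makes them easier to look up later.

```r
years  <- c("2011", "2013", "2015", "2017")
frames <- c("df_ccdirectory", "df_ccdenrollment", "df_crdcteacher",
            "df_crdcmathscience", "df_crdcsat", "df_crdcfinance")

appended <- list()
named    <- list()
for (yr in years) {
  for (nm in frames) {
    # Hypothetical stand-in for one get_education_data() call
    df <- data.frame(year = yr, dataset = nm, stringsAsFactors = FALSE)
    # Comment's approach: always write one slot past the current end
    appended[[length(appended) + 1]] <- df
    # Alternative: write under a unique "<dataset><year>" name
    named[[paste0(nm, yr)]] <- df
  }
}
print(length(appended))   # 24
print(names(named)[1:2])  # "df_ccdirectory2011" "df_ccdenrollment2011"
```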

Tags: r list dataframe for-loop


【Solution 1】:

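Consider a few adjustments:

  1. Keep using a single list of data frames and avoid flooding your global environment with many separate, similarly structured objects. Because of the debugging and side-effect problems it causes, assign should rarely be used in R. Instead, give the elements of your list of data frames names.
  2. Avoid the bookkeeping of for loops by using the more compact apply family, whose functions hide the loop and return a collection; in some cases, as below, sapply also names that collection. Map (a wrapper around mapply) is the elementwise member of the family.
  3. Keep the code DRY (Don't Repeat Yourself) by parameterizing the six different get_education_data calls through source and topic arguments.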
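Points 2 and 3 rely on two base R behaviors that can be checked on toy inputs (the values here are illustrative only): `sapply(..., simplify = FALSE)` over a character vector returns a list named by that vector, and `Map()` walks several vectors in parallel, recycling a length-one argument such as a single year.

```r
years <- c("2011", "2013")

# sapply() uses a character X as the names of its result; simplify = FALSE keeps a list
by_year <- sapply(years, function(y) paste("data for", y), simplify = FALSE)
print(names(by_year))  # "2011" "2013"

# Map() pairs its arguments elementwise and recycles the length-one "2011"
calls <- Map(function(yr, src, tpc) paste(yr, src, tpc),
             "2011", c("ccd", "crdc"), c("directory", "school-finance"))
print(unname(unlist(calls)))  # "2011 ccd directory" "2011 crdc school-finance"
```

`Map()` takes its names from its first argument when that is a character vector, so with the single year passed first the results would be named "2011" followed by missing names. That is why the adjusted code wraps the `Map()` call in `setNames(..., frames)`.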

Adjusted code

library(educationdata)

# USER-DEFINED PARAMETERIZED METHOD
# subtopic_param defaults to NULL, so calls without a subtopic behave as before
build_df <- function(year_param, source_param, topic_param, subtopic_param = NULL) {
    get_education_data(
        level = "schools",
        source = source_param,
        topic = topic_param,
        subtopic = subtopic_param,
        filters = list(year = year_param, fips = 10),
        add_labels = TRUE
    )
}

# INITIALIZE VECTORS
dates <- c("2011", "2013", "2015", "2017")
sources <- c("ccd", "ccd", "crdc", "crdc", "crdc", "crdc")
topics <- c(
    "directory", "enrollment", "teachers-staff",
    "math-and-science", "sat-act-participation",
    "school-finance"
)
# The question requests math-and-science and sat-act-participation by race and sex
subtopics <- list(NULL, NULL, NULL, c("race", "sex"), c("race", "sex"), NULL)
frames <- c(
    "df_ccdirectory", "df_ccdenrollment", "df_crdcteacher",
    "df_crdcmathscience", "df_crdcsat", "df_crdcfinance"
)

# RETURN NESTED YEARLY LIST OF DATA FRAMES
df_list <- sapply(
    dates,
    function(dt) setNames(Map(build_df, dt, sources, topics, subtopics), frames),
    simplify = FALSE
)
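To check the shape of `df_list` without network access, the `sapply()`/`Map()` pipeline above can be run with a hypothetical stub in place of `get_education_data()`. The stub simply returns a one-row data frame recording its arguments; subtopics are left out of it for brevity.

```r
# Hypothetical stand-in for get_education_data(): records its inputs in one row
build_df <- function(year_param, source_param, topic_param) {
  data.frame(year = year_param, source = source_param, topic = topic_param,
             stringsAsFactors = FALSE)
}

dates   <- c("2011", "2013", "2015", "2017")
sources <- c("ccd", "ccd", "crdc", "crdc", "crdc", "crdc")
topics  <- c("directory", "enrollment", "teachers-staff",
             "math-and-science", "sat-act-participation", "school-finance")
frames  <- c("df_ccdirectory", "df_ccdenrollment", "df_crdcteacher",
             "df_crdcmathscience", "df_crdcsat", "df_crdcfinance")

# Outer level: one element per year; inner level: six named data frames
df_list <- sapply(
  dates,
  function(dt) setNames(Map(build_df, dt, sources, topics), frames),
  simplify = FALSE
)

print(length(df_list))                  # 4 years
print(sum(lengths(df_list)))            # 24 data frames
print(df_list$`2017`$df_crdcsat$topic)  # "sat-act-participation"
```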

Output

# ALL DATA FRAMES (N=24)
df_list$`2011`$df_ccdirectory
df_list$`2011`$df_ccdenrollment
...
df_list$`2017`$df_crdcsat
df_list$`2017`$df_crdcfinance


# ALL 2011 DATA FRAMES (N=6)
df_list$`2011`


# ALL ccdirectory DATA FRAMES (N=4)
lapply(df_list, `[[`, "df_ccdirectory")
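If the flat, year-suffixed names the question was after are still wanted, they can be derived from the nested list instead of being created with `assign()`. A sketch with hypothetical placeholder data frames standing in for the real results: `unlist(recursive = FALSE)` removes one level of nesting and joins the outer and inner names with a dot.

```r
# Hypothetical placeholder shaped like df_list (years -> named datasets)
df_list <- list(
  `2011` = list(df_ccdirectory = data.frame(x = 1), df_crdcsat = data.frame(x = 2)),
  `2013` = list(df_ccdirectory = data.frame(x = 3), df_crdcsat = data.frame(x = 4))
)

# Flatten one level: one data frame per year/dataset pair, named "<year>.<dataset>"
flat <- unlist(df_list, recursive = FALSE)
print(names(flat))  # "2011.df_ccdirectory" "2011.df_crdcsat" "2013.df_ccdirectory" "2013.df_crdcsat"

# Rename to the question's "<dataset><year>" style
names(flat) <- sub("^([0-9]{4})\\.(.*)$", "\\2\\1", names(flat))
print(names(flat))  # "df_ccdirectory2011" "df_crdcsat2011" "df_ccdirectory2013" "df_crdcsat2013"
```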

【Discussion】:

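  • Thanks, not only for the specific solution but especially for the broad coding advice at the start; those sound like very good principles in all cases.
  • Great! Glad to hear it and happy to help. Happy coding!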