【Question Title】: Splitting data.frame inside and outside an R function
【Posted】: 2026-02-15 07:10:01
【Question】:

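I have 3 data frames (A, B1, and B2). I split each one by the variable study.name and rbind the pieces back together to get my desired outputs, shown as out1, out2, and out3: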

J <- split(A, A$study.name);      out1 <- do.call(rbind, c(J, make.row.names = FALSE))
M <- split(B1, B1$study.name);    out2 <- do.call(rbind, c(M, make.row.names = FALSE))
N <- split(B2, B2$study.name);    out3 <- do.call(rbind, c(N, make.row.names = FALSE))
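For readers without access to the CSV files, here is a minimal sketch of what the split() / do.call(rbind, ...) pattern above does, run on a small made-up data frame (toy_df and its values are hypothetical, not taken from irr.csv). split() returns a named list with one data frame per study.name value, and rbind() stacks those pieces back so that rows of the same study end up contiguous; make.row.names = FALSE resets the row names to 1..n.

```r
# Hypothetical toy data standing in for A / B1 / B2; the studies are interleaved
toy_df <- data.frame(
  study.name = c("Shin", "Ellis", "Shin", "Ellis"),
  outcome    = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# One data frame per study; a character grouping variable is sorted alphabetically
groups <- split(toy_df, toy_df$study.name)
names(groups)
# [1] "Ellis" "Shin"

# Stack the pieces again; make.row.names = FALSE gives plain row names 1..n
out <- do.call(rbind, c(groups, make.row.names = FALSE))
out
#   study.name outcome
# 1      Ellis       2
# 2      Ellis       4
# 3       Shin       1
# 4       Shin       3
```

Note that the net effect is a regrouping (and here, alphabetical reordering) of the rows; the answer below converts study.name to a factor with levels in order of first appearance to avoid the alphabetical sort.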

But why does my function foo fail to produce the same output (see below)?

 A <- read.csv("https://raw.githubusercontent.com/izeh/m/master/irr.csv", header = TRUE)  ## data A
B1 <- read.csv('https://raw.githubusercontent.com/izeh/m/master/irr2.csv', header = TRUE) ## data B1
B2 <- read.csv("https://raw.githubusercontent.com/izeh/m/master/irr4.csv", header = TRUE) ## data B2

 foo <- function(...){      ## The unsuccessful function `foo`

    r <- list(...)

 ## Can we delete HERE the rows and columns of each data frame in `r` that are entirely `NA` or empty?

    J <- unlist(lapply(seq_along(r), function(i) split(r[[i]], r[[i]]$study.name)), recursive = FALSE)

    lapply(seq_along(J), function(i)do.call(rbind, c(J[[i]], make.row.names = FALSE)) )
}

foo(B1, B2) # Example without success
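To see where foo goes wrong, here is a minimal sketch on two small hypothetical data frames (d1 and d2 are made up, standing in for B1 and B2). The unlist(..., recursive = FALSE) step flattens the split() results of all inputs into one list of single-study data frames, so the second lapply() works on one study group at a time rather than on one input data frame at a time.

```r
# Two hypothetical toy data frames standing in for B1 and B2
d1 <- data.frame(study.name = c("S1", "S2", "S1"), x = 1:3, stringsAsFactors = FALSE)
d2 <- data.frame(study.name = c("S1", "S3"),       x = 4:5, stringsAsFactors = FALSE)
r  <- list(d1, d2)

# What foo() does: split every input, then flatten one level
J <- unlist(lapply(r, function(d) split(d, d$study.name)), recursive = FALSE)
length(J)       # 4: one element per (input, study) pair, not one per input
class(J[[1]])   # "data.frame": the rows of a single study

# c() on a single data frame yields a list of its columns, so rbind() binds
# those column vectors as rows of a matrix instead of stacking study groups
is.matrix(do.call(rbind, c(J[[1]], make.row.names = FALSE)))   # TRUE

# Per-input alternative: split and rbind within each data frame separately
per_input <- lapply(r, function(d) {
  do.call(rbind, c(split(d, d$study.name), make.row.names = FALSE))
})
length(per_input)   # 2: one stacked data frame per input, like out2 and out3
```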

【Question Discussion】:

    Tags: r list function loops dataframe


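    【Solution 1】:

    We can drop the rows and columns that are entirely NA or empty before doing the split, and then split and rbind within each data frame separately. In the original foo, unlist(..., recursive = FALSE) flattens the groups of all inputs into one list of single-study data frames, so the second lapply() hands rbind the columns of one group instead of the groups of one data frame: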

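    foo <- function(...) {
      r <- list(...)

      lapply(r, function(dat) {
        # TRUE where a cell is NA or an empty string
        m1 <- is.na(dat) | dat == ""
        # keep rows and columns that are not entirely NA/empty
        i1 <- rowSums(m1) < ncol(m1)
        j1 <- colSums(m1) < nrow(m1)
        dat1 <- dat[i1, j1, drop = FALSE]
        # convert factor columns to character
        facColumns <- sapply(dat1, is.factor)
        dat1[facColumns] <- lapply(dat1[facColumns], as.character)
        # keep the studies in their order of first appearance
        dat1$study.name <- factor(dat1$study.name, levels = unique(dat1$study.name))
        l1 <- split(dat1, dat1$study.name)
        # stack the per-study pieces back into one data frame
        do.call(rbind, c(l1, make.row.names = FALSE))
      })
    }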
    
    lapply(foo(B1, B2), head, 2)
    #[[1]]
    #  study.name group.name outcome ESL prof scope type
    #1 Shin.Ellis   ME.short       1   1    2     1    1
    #2 Shin.Ellis    ME.long       1   1    2     1    1
    
    #[[2]]
    #  study.name group.name outcome ESL prof scope type
    #1 Shin.Ellis   ME.short       1   1    2     1    1
    #2 Shin.Ellis    ME.long       1   1    2     1    1
    

    Or with a single object as the argument:

    lapply(foo(A), head, 2)
    #[[1]]
    #  study.name group.name outcome ESL prof scope type ESL.1 prof.1 scope.1 type.1
    #1 Shin.Ellis   ME.short       1   1    2     1    1     1      2       1      1
    #2 Shin.Ellis    ME.long       1   1    2     1    1     1      2       1      1
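Since the real CSVs require network access, here is a self-contained sketch of the same approach on a hypothetical toy data frame (toy and its values are made up) that contains one row and one column made up entirely of NA/empty cells. The foo below is a condensed version of the answer's function; one assumption of this sketch is that factor columns are converted with dat1[] <- lapply(...) instead of the logical-index assignment, which should be behaviorally equivalent here.

```r
# Condensed version of the answer's foo()
foo <- function(...) {
  lapply(list(...), function(dat) {
    m1 <- is.na(dat) | dat == ""          # TRUE where a cell is NA or ""
    i1 <- rowSums(m1) < ncol(m1)          # rows that are not entirely NA/""
    j1 <- colSums(m1) < nrow(m1)          # columns that are not entirely NA/""
    dat1 <- dat[i1, j1, drop = FALSE]
    # factor columns -> character, other columns unchanged
    dat1[] <- lapply(dat1, function(col) if (is.factor(col)) as.character(col) else col)
    # order groups by first appearance rather than alphabetically
    dat1$study.name <- factor(dat1$study.name, levels = unique(dat1$study.name))
    do.call(rbind, c(split(dat1, dat1$study.name), make.row.names = FALSE))
  })
}

# Hypothetical toy input: row 3 is entirely NA/"", column `notes` is entirely ""
toy <- data.frame(
  study.name = c("Shin", "Ellis", NA, "Shin"),
  outcome    = c(1, 2, NA, 3),
  notes      = c("", "", "", ""),
  stringsAsFactors = FALSE
)

res <- foo(toy, toy)
length(res)   # 2: one cleaned, regrouped data frame per argument
res[[1]]
#   study.name outcome
# 1       Shin       1
# 2       Shin       3
# 3      Ellis       2
```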
    

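    【Discussion】:

    • No problem, I completely understand. Don't let this discourage you. You're special! Also, I think the other question is a bit different; I'm giving you two upvotes.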