Splitting a data.frame inside and outside an R function

【Question title】: Splitting data.frame inside and outside an R function
【Posted】: 2026-02-15 07:10:01
【Question】:

I have 3 data frames (A, B1, and B2). I split each of them by the variable study.name and bind the pieces back together to get my desired output, shown as out1, out2, out3:

J <- split(A, A$study.name);      out1 <- do.call(rbind, c(J, make.row.names = FALSE))
M <- split(B1, B1$study.name);    out2 <- do.call(rbind, c(M, make.row.names = FALSE))
N <- split(B2, B2$study.name);    out3 <- do.call(rbind, c(N, make.row.names = FALSE))

But I wonder why my function foo cannot produce the same output (see below)?

A  <- read.csv("https://raw.githubusercontent.com/izeh/m/master/irr.csv", header = TRUE)  ## data A
B1 <- read.csv("https://raw.githubusercontent.com/izeh/m/master/irr2.csv", header = TRUE) ## data B1
B2 <- read.csv("https://raw.githubusercontent.com/izeh/m/master/irr4.csv", header = TRUE) ## data B2

 foo <- function(...){      ## The unsuccessful function `foo`

    r <- list(...)

 ## r <- Can we HERE delete rows and columns that are ALL `NA` or EMPTY in `r`?

    J <- unlist(lapply(seq_along(r), function(i) split(r[[i]], r[[i]]$study.name)), recursive = FALSE)

    lapply(seq_along(J), function(i)do.call(rbind, c(J[[i]], make.row.names = FALSE)) )
}

foo(B1, B2) # Example without success

【Question comments】:

Tags: r list function loops dataframe

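【Solution 1】:

We can drop the rows and columns that are entirely NA or empty before doing the split, and do the split and rbind separately for each input data frame.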

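foo <- function(...) {
    r <- list(...)

    lapply(r, function(dat) {
        # Flag every cell that is NA or an empty string
        m1 <- is.na(dat) | dat == ""
        # Keep rows and columns that have at least one non-empty cell
        i1 <- rowSums(m1) < ncol(m1)
        j1 <- colSums(m1) < nrow(m1)
        dat1 <- dat[i1, j1, drop = FALSE]

        # Convert factor columns to character so stale levels are discarded
        facColumns <- sapply(dat1, is.factor)
        dat1[facColumns] <- lapply(dat1[facColumns], as.character)

        # Use order of first appearance as the level order, so split() keeps it
        dat1$study.name <- factor(dat1$study.name, levels = unique(dat1$study.name))
        l1 <- split(dat1, dat1$study.name)

        # Bind the per-study pieces of this one data frame back together
        do.call(rbind, c(l1, make.row.names = FALSE))
    })
}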
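
To see what the masks `m1`, `i1`, and `j1` in the function above do without downloading the CSV files, here is a minimal sketch on a made-up data frame. The data and the names `keep_rows`, `keep_cols`, `cleaned`, `parts`, and `out` are hypothetical and only for illustration; the steps mirror the ones in `foo`.

```r
# Hypothetical toy data: row 3 and the column `empty` hold only NA or "".
dat <- data.frame(
  study.name = c("S2", "S1", NA, "S2"),
  score      = c(1, 2, NA, 4),
  empty      = c(NA, "", NA, ""),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a cell is NA or an empty string
m1 <- is.na(dat) | dat == ""

# A row/column is kept if it has at least one cell that is not flagged
keep_rows <- rowSums(m1) < ncol(m1)
keep_cols <- colSums(m1) < nrow(m1)
cleaned <- dat[keep_rows, keep_cols, drop = FALSE]

# Levels in order of first appearance, so split() returns S2 before S1
cleaned$study.name <- factor(cleaned$study.name,
                             levels = unique(cleaned$study.name))
parts <- split(cleaned, cleaned$study.name)

# Bind the pieces back together with fresh row names
out <- do.call(rbind, c(parts, make.row.names = FALSE))
print(out)
```

Setting the factor levels explicitly is what keeps the studies in their original order; without it, `split()` would order the groups alphabetically.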

lapply(foo(B1, B2), head, 2)
#[[1]]
#  study.name group.name outcome ESL prof scope type
#1 Shin.Ellis   ME.short       1   1    2     1    1
#2 Shin.Ellis    ME.long       1   1    2     1    1

#[[2]]
#  study.name group.name outcome ESL prof scope type
#1 Shin.Ellis   ME.short       1   1    2     1    1
#2 Shin.Ellis    ME.long       1   1    2     1    1

Or use a single object as the argument:

lapply(foo(A), head, 2)
#[[1]]
#  study.name group.name outcome ESL prof scope type ESL.1 prof.1 scope.1 type.1
#1 Shin.Ellis   ME.short       1   1    2     1    1     1      2       1      1
#2 Shin.Ellis    ME.long       1   1    2     1    1     1      2       1      1
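
The answer does not spell out why the original `foo` failed. One plausible reading is that, after `unlist(..., recursive = FALSE)`, each `J[[i]]` is already a single data frame, so `c(J[[i]], make.row.names = FALSE)` breaks it into a plain list of columns and `rbind` then builds a matrix instead of a data frame. The sketch below reproduces that on made-up data; `dat`, `pieces`, `bad`, and `good` are hypothetical names.

```r
# Hypothetical toy data with two studies
dat <- data.frame(
  study.name = c("S1", "S2", "S1"),
  score      = c(1, 2, 3),
  stringsAsFactors = FALSE
)

pieces <- split(dat, dat$study.name)   # a list of two data frames

# What the original foo effectively does: c() on ONE data frame yields a
# list of its columns plus an extra element named make.row.names.
bad_args <- c(pieces[[1]], make.row.names = FALSE)
print(names(bad_args))

# rbind() now only sees vectors, so it returns a matrix and treats
# make.row.names as one more row rather than as an option.
bad <- do.call(rbind, bad_args)
print(class(bad))

# Passing the LIST of data frames lets rbind() dispatch to its
# data.frame method, where make.row.names is a real argument.
good <- do.call(rbind, c(pieces, make.row.names = FALSE))
print(good)
```

This is why the accepted function calls `split` and `do.call(rbind, ...)` once per input data frame inside a single `lapply`, rather than flattening all the pieces first.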

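【Comments】:

No problem, I completely understand. Don't let this discourage you. You're special! Also, I think the other question is a bit different, so I'll give you two upvotes.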