Question title: Apply a function to each factor level in a list of data frames
Posted: 2019-07-12 07:14:21
Question:

I have a data frame with the multi-level factors race and group; a minimal example is below:

   id     race group
1   1    White     1
2   2    White     1
3   3    White     1
4   4    White     1
5   5    White     1
6   6    White     2
7   7    White     2
8   8    White     2
9   9    White     2
10 10    Black     1
11 11    Black     1
12 12    Black     1
13 13    Black     2
14 14    Black     2
15 15    Black     2
16 16    Black     2
17 17 Hispanic     1
18 18 Hispanic     1
19 19 Hispanic     1
20 20 Hispanic     1
21 21 Hispanic     1
22 22 Hispanic     2
23 23 Hispanic     2
24 24 Hispanic     2
25 25 Hispanic     2

I can subset the data frame to "White" plus one other race level, and then split the result by group, using the following function:

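filter.race <- function(x, y) {
  # keep the reference level "White" plus the requested race y
  f <- subset(x, race == "White" | race == y)
  # split the combined rows into one data frame per group
  split(f, f$group)
}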

This returns:

filter.race(df, "Black")

$`1`
   id  race group
1   1 White     1
2   2 White     1
3   3 White     1
4   4 White     1
5   5 White     1
10 10 Black     1
11 11 Black     1
12 12 Black     1

$`2`
   id  race group
6   6 White     2
7   7 White     2
8   8 White     2
9   9 White     2
13 13 Black     2
14 14 Black     2
15 15 Black     2
16 16 Black     2

filter.race(df, "Hispanic")

$`1`
   id     race group
1   1    White     1
2   2    White     1
3   3    White     1
4   4    White     1
5   5    White     1
17 17 Hispanic     1
18 18 Hispanic     1
19 19 Hispanic     1
20 20 Hispanic     1
21 21 Hispanic     1

$`2`
   id     race group
6   6    White     2
7   7    White     2
8   8    White     2
9   9    White     2
22 22 Hispanic     2
23 23 Hispanic     2
24 24 Hispanic     2
25 25 Hispanic     2

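However, I am trying to find a way to apply this function to all levels of race in the data frame, instead of specifying y separately each time.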

Sample data:

dput(df)
structure(list(id = 1:25, race = structure(c(3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("Black", "Hispanic", "White"), class = "factor"), 
    group = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
    2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L)), .Names = c("id", 
"race", "group"), class = "data.frame", row.names = c(NA, -25L
))


Comments:

  • lapply(levels(df$race), filter.race, x=df)
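The one-liner in the comment above also calls filter.race() for "White" itself, which only returns the White rows, and its result is an unnamed list. A minimal sketch, assuming you want one entry per non-White race labelled by that race (the names others and res are illustrative), rebuilds the sample data with data.frame() and uses sapply(..., simplify = FALSE) so the race names are kept:

```r
# Sample data equivalent to the question's dput() output
df <- data.frame(
  id = 1:25,
  race = factor(rep(c("White", "Black", "Hispanic"), times = c(9, 7, 9)),
                levels = c("Black", "Hispanic", "White")),
  group = c(rep(1:2, times = c(5, 4)),   # White rows
            rep(1:2, times = c(3, 4)),   # Black rows
            rep(1:2, times = c(5, 4)))   # Hispanic rows
)

# The question's function: White plus race y, split by group
filter.race <- function(x, y) {
  f <- subset(x, race == "White" | race == y)
  split(f, f$group)
}

# Every race level except the reference level "White"
others <- setdiff(levels(df$race), "White")

# simplify = FALSE keeps a list; because `others` is a character vector,
# sapply() also uses its values as the names of the result
res <- sapply(others, filter.race, x = df, simplify = FALSE)

names(res)         # "Black" "Hispanic"
res$Black[["1"]]   # White and Black rows of group 1
```

lapply(setNames(others, others), filter.race, x = df) gives the same named list.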

Tags: r


Answer 1:

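Consider by (an object-oriented wrapper for tapply) to subset the data by race and group, and in each iteration rbind the White rows of the corresponding group onto the subset. For the White subsets themselves, unique removes the resulting duplicate rows.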

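df_list <- by(df, df[c("race", "group")], function(sub) {
  # sub holds the rows of one race/group combination; prepend the White rows
  # of the same group, and let unique() drop the duplicates in the White cells
  unique(
    rbind(subset(df, race == "White" & group == sub$group[1]),
          sub)
  )
})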

# race: Black
# group: 1
# id  race group
# 1   1 White     1
# 2   2 White     1
# 3   3 White     1
# 4   4 White     1
# 5   5 White     1
# 10 10 Black     1
# 11 11 Black     1
# 12 12 Black     1
# ------------------------------------------------------------ 
# race: Hispanic
# group: 1
# id     race group
# 1   1    White     1
# 2   2    White     1
# 3   3    White     1
# 4   4    White     1
# 5   5    White     1
# 17 17 Hispanic     1
# 18 18 Hispanic     1
# 19 19 Hispanic     1
# 20 20 Hispanic     1
# 21 21 Hispanic     1
# ------------------------------------------------------------ 
# race: White
# group: 1
# id  race group
# 1  1 White     1
# 2  2 White     1
# 3  3 White     1
# 4  4 White     1
# 5  5 White     1
# ------------------------------------------------------------ 
# race: Black
# group: 2
# id  race group
# 6   6 White     2
# 7   7 White     2
# 8   8 White     2
# 9   9 White     2
# 13 13 Black     2
# 14 14 Black     2
# 15 15 Black     2
# 16 16 Black     2
# ------------------------------------------------------------ 
# race: Hispanic
# group: 2
# id     race group
# 6   6    White     2
# 7   7    White     2
# 8   8    White     2
# 9   9    White     2
# 22 22 Hispanic     2
# 23 23 Hispanic     2
# 24 24 Hispanic     2
# 25 25 Hispanic     2
# ------------------------------------------------------------ 
# race: White
# group: 2
# id  race group
# 6  6 White     2
# 7  7 White     2
# 8  8 White     2
# 9  9 White     2
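The by() result above is a list-array indexed by race and group, and it still contains the two White-only cells. A minimal sketch, assuming a flat list with names like "Black.1" is what you want downstream (keys, flat and the naming scheme are illustrative, not part of the answer), reads the cell labels from dimnames(), pulls each cell out in the array's column-major order, and drops the White cells:

```r
# Sample data equivalent to the question's dput() output
df <- data.frame(
  id = 1:25,
  race = factor(rep(c("White", "Black", "Hispanic"), times = c(9, 7, 9)),
                levels = c("Black", "Hispanic", "White")),
  group = c(rep(1:2, times = c(5, 4)),
            rep(1:2, times = c(3, 4)),
            rep(1:2, times = c(5, 4)))
)

df_list <- by(df, df[c("race", "group")], function(sub) {
  unique(rbind(subset(df, race == "White" & group == sub$group[1]), sub))
})

# dimnames(df_list) is list(race = ..., group = ...); expand.grid() enumerates
# the cells with race varying fastest, matching the array's storage order
keys <- expand.grid(dimnames(df_list), stringsAsFactors = FALSE)

# Extract each cell with [[ and label it "race.group"
flat <- lapply(seq_len(nrow(keys)), function(i) df_list[[i]])
names(flat) <- paste(keys$race, keys$group, sep = ".")

# Drop the White-only cells, which just repeat the reference rows
flat <- flat[keys$race != "White"]
names(flat)   # "Black.1" "Hispanic.1" "Black.2" "Hispanic.2"
```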


Answer 2:

A base R solution could be as follows.
I have renamed the function to filter.races, with the plural "races".

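filter.races <- function(x){
  # all race values except the reference level "White"
  races <- unique(x[["race"]])
  races <- as.character(races)
  races <- races[races != "White"]
  # for each race, keep White plus that race, then split by group
  res <- lapply(races, function(r){
    s <- subset(x, race %in% c("White", r))
    split(s, s[["group"]])
  })
  # flatten the list of per-race lists into one list of data frames
  unlist(res, recursive = FALSE)
}

filter.races(df)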
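Because the per-race lists are unnamed, filter.races(df) returns four data frames named "1", "2", "1", "2", so the race is not visible in the names. A small variant, assuming names like "Black.1" are useful (filter.races_named is a hypothetical name), labels the per-race lists before flattening; unlist() then joins the outer and inner names with a dot:

```r
# Sample data equivalent to the question's dput() output
df <- data.frame(
  id = 1:25,
  race = factor(rep(c("White", "Black", "Hispanic"), times = c(9, 7, 9)),
                levels = c("Black", "Hispanic", "White")),
  group = c(rep(1:2, times = c(5, 4)),
            rep(1:2, times = c(3, 4)),
            rep(1:2, times = c(5, 4)))
)

filter.races_named <- function(x) {
  # race values other than the reference level, in order of appearance
  races <- as.character(unique(x[["race"]]))
  races <- races[races != "White"]
  res <- lapply(races, function(r) {
    s <- subset(x, race %in% c("White", r))
    split(s, s[["group"]])
  })
  # name each per-race list so unlist() produces "race.group" names
  names(res) <- races
  unlist(res, recursive = FALSE)
}

out <- filter.races_named(df)
names(out)   # "Black.1" "Black.2" "Hispanic.1" "Hispanic.2"
```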
    


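Answer 3:

Here is another approach using Map, keeping the "White" rows and the rows for the other races in separate data frames.

white_df <- subset(df, race == "White")
rest_df <- subset(df, race != "White")

# Pair each group's White rows with the other races of the same group;
# drop = TRUE keeps split() from returning an empty "White" level from rest_df
Map(function(x, y) lapply(split(y, y$race, drop = TRUE), function(p) rbind(x, p)),
    split(white_df, white_df$group), split(rest_df, rest_df$group))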
      
      
#$`1`
#$`1`$Black
#   id  race group
#1   1 White     1
#2   2 White     1
#3   3 White     1
#4   4 White     1
#5   5 White     1
#10 10 Black     1
#11 11 Black     1
#12 12 Black     1

#$`1`$Hispanic
#   id     race group
#1   1    White     1
#2   2    White     1
#3   3    White     1
#4   4    White     1
#5   5    White     1
#17 17 Hispanic     1
#18 18 Hispanic     1
#19 19 Hispanic     1
#20 20 Hispanic     1
#21 21 Hispanic     1
#....
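The Map() result is nested: one list per group, each holding one data frame per race. If a flat list is easier to loop over, unlist(..., recursive = FALSE), the same trick used in Answer 2, removes one level and combines the names as "group.race". A minimal sketch (nested and flat are illustrative names; drop = TRUE is used so the unused "White" level of rest_df$race is not split out as an extra entry):

```r
# Sample data equivalent to the question's dput() output
df <- data.frame(
  id = 1:25,
  race = factor(rep(c("White", "Black", "Hispanic"), times = c(9, 7, 9)),
                levels = c("Black", "Hispanic", "White")),
  group = c(rep(1:2, times = c(5, 4)),
            rep(1:2, times = c(3, 4)),
            rep(1:2, times = c(5, 4)))
)

white_df <- subset(df, race == "White")
rest_df <- subset(df, race != "White")

nested <- Map(function(x, y) lapply(split(y, y$race, drop = TRUE),
                                    function(p) rbind(x, p)),
              split(white_df, white_df$group), split(rest_df, rest_df$group))

# Remove one level of nesting; names become "group.race"
flat <- unlist(nested, recursive = FALSE)
names(flat)   # "1.Black" "1.Hispanic" "2.Black" "2.Hispanic"
```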
      

