[Question title]: Remove columns with factors that have fewer than 5 observations per level
[Posted]: 2020-08-31 13:44:35
[Question description]:

I have a dataset of more than 100 columns, all of type factor. For example:

          animal               fruit               vehicle              color 
             cat              orange                   car               blue 
             dog               apple                   bus              green 
             dog               apple                   car              green 
             dog              orange                   bus              green

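In my dataset, I need to remove every column whose factor has fewer than 5 observations in any of its levels. In this example, if I wanted to remove all columns that have a level observed at most once, such as blue or cat, the algorithm would remove the columns animal and color. What is the most elegant way to do this?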
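To make the criterion concrete, here is a minimal base R sketch that rebuilds the four example rows (the name df is an assumption for illustration) and reports how often the rarest level of each column occurs. The columns whose rarest level appears at most once are exactly the ones the question wants removed.

```r
# Rebuild the four-row example above; every column is a factor
df <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

# For each column, table() counts every level and min() keeps the rarest one
min_counts <- vapply(df, function(x) min(table(x)), integer(1))
min_counts
#  animal   fruit vehicle   color
#       1       2       2       1

# Columns whose rarest level is observed at most once
names(min_counts)[min_counts <= 1]
# [1] "animal" "color"
```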

[Question comments]:

  • In the example, all columns show 2 unique values

Tags: r dataframe data-handling


[Solution 1]:

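We can use Filter together with table. Filter keeps the columns for which the function returns TRUE, so a column survives only if none of its levels is observed fewer than 2 times; for the 5-observation threshold in the title, use table(x) < 5 instead: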

Filter(function(x) !any(table(x) < 2), df1)
#  fruit vehicle
#1 orange     car
#2  apple     bus
#3  apple     car
#4 orange     bus

Data

df1 <- structure(list(animal = structure(c(1L, 2L, 2L, 2L), .Label = c("cat", 
"dog"), class = "factor"), fruit = structure(c(2L, 1L, 1L, 2L
), .Label = c("apple", "orange"), class = "factor"), vehicle = structure(c(2L, 
1L, 2L, 1L), .Label = c("bus", "car"), class = "factor"), color = structure(c(1L, 
2L, 2L, 2L), .Label = c("blue", "green"), class = "factor")),
row.names = c(NA, 
-4L), class = "data.frame")
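The same idea can be wrapped in a helper with a configurable threshold. This is a minimal sketch, assuming every column is a factor; the function name keep_dense_factors and its min_n parameter are hypothetical, and df1 is rebuilt with factor() so the block runs on its own. Unlike Filter(function(x) !any(table(x) < 2), df1), it calls droplevels() first, because table() on a factor also reports unused levels with a count of 0, which would otherwise cause the column to be dropped.

```r
# Example data: the same four rows as df1 above
df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

# Hypothetical helper: keep only the factor columns in which every
# observed level occurs at least `min_n` times (assumes all columns are factors)
keep_dense_factors <- function(df, min_n = 5) {
  keep <- vapply(df, function(x) {
    # droplevels() removes levels with zero rows, so an unused level
    # does not by itself cause the column to be dropped
    counts <- table(droplevels(x))
    all(counts >= min_n)
  }, logical(1))
  # drop = FALSE keeps a data frame even when 0 or 1 columns survive
  df[, keep, drop = FALSE]
}

names(keep_dense_factors(df1, min_n = 2))
# [1] "fruit"   "vehicle"
ncol(keep_dense_factors(df1, min_n = 5))
# [1] 0
```

With min_n = 2 this reproduces the result of the one-liners; with the title's threshold of 5, no column of a four-row sample can qualify, so a data frame with 0 columns and 4 rows comes back. Whether unused levels should count as "fewer than 5 observations" is a judgment call; remove the droplevels() call if they should.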

[Comments]:

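[Solution 2]:

We can use select_if from dplyr:

library(dplyr)
df1 %>% select_if(~all(table(.) > 1))

#   fruit vehicle
#1 orange     car
#2  apple     bus
#3  apple     car
#4 orange     bus

In dplyr 1.0.0 and later, select_if() is superseded; the equivalent is df1 %>% select(where(~ all(table(.x) > 1))).

[Comments]: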
