【Question Title】: Combining Factor Levels from a Dataframe in R
【Posted】: 2018-03-23 22:44:51
【Question】:

I have a variable of type factor with three levels: Fatal injury, Non-fatal injury, and P.D. only.

     head(OttawaCollisions$Collision_Classification)
[1] P.D. only        Non-fatal injury P.D. only        P.D. only        P.D. only        P.D. only       
Levels: Fatal injury Non-fatal injury P.D. only

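How can I merge "Fatal injury" and "Non-fatal injury" into a single level, so that the fatalities are counted together with the injuries?

Better yet, can I somehow eliminate the fatalities altogether? In that case I would need to remove every fatal instance from the data frame, not just code it as NA or something.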

【Comments】:

    Tags: r r-factor


    【Solution 1】:

    Data:

    x <- factor( rep( c('P.D. only', 'Non-fatal injury' , 'fatal injury'), 2) )
    x
    # [1] P.D. only        Non-fatal injury fatal injury     P.D. only       
    # [5] Non-fatal injury fatal injury    
    # Levels: fatal injury Non-fatal injury P.D. only
    

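    Code: You can rename levels with the labels argument of factor(). Here Non-fatal injury and fatal injury are both mapped to Fatalities. On R versions before 3.4.0 this raises a warning about duplicated levels, which you can ignore, and droplevels() is then used to remove the duplicated level (the output below comes from such a version). From R 3.4.0 on, factor() accepts duplicated labels and merges the levels directly, without a warning.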

    x <- factor( x = x, 
                 levels = c('P.D. only', 'Non-fatal injury' , 'fatal injury'),
                 labels = c('P.D. only', 'Fatalities', 'Fatalities'))
    # [1] P.D. only  Fatalities Fatalities P.D. only  Fatalities Fatalities
    # Levels: P.D. only Fatalities Fatalities
    
    droplevels(x)
    # [1] P.D. only  Fatalities Fatalities P.D. only  Fatalities Fatalities
    # Levels: P.D. only Fatalities
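
A quick way to confirm that the recode did what was intended is to cross-tabulate the old values against the new ones. This is only a sketch under assumptions: it rebuilds the answer's example vector, assumes R 3.4.0 or later, and the names y and check are made up for illustration.

```r
# Example data mirroring the answer's vector
x <- factor(rep(c("P.D. only", "Non-fatal injury", "fatal injury"), 2))

# Recode with duplicated labels (merges the two injury levels on R >= 3.4.0)
y <- droplevels(factor(x,
                       levels = c("P.D. only", "Non-fatal injury", "fatal injury"),
                       labels = c("P.D. only", "Fatalities", "Fatalities")))

# Cross-tabulate old against new: every old level should land in exactly one new level
check <- table(old = x, new = y)
print(check)

# Any NA here would mean a value in x was missing from the levels argument
sum(is.na(y))
```

If a level name is misspelled in the levels argument, the affected rows silently become NA, so the NA count is worth checking before overwriting the original column.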
    

    EDIT: The same code, using the name of your data frame

    # Level names must match the data exactly; the question's levels show 'Fatal injury' (capital F)
    OttawaCollisions$CollisionClass <- factor( x = OttawaCollisions$CollisionClass, 
                                               levels = c('P.D. only', 'Non-fatal injury' , 'Fatal injury'),
                                               labels = c('P.D. only', 'Fatalities', 'Fatalities'))
    OttawaCollisions$CollisionClass <- droplevels(OttawaCollisions$CollisionClass)
    

    EDIT2: A data.table solution.

    library('data.table')
    setDT(OttawaCollisions)
    OttawaCollisions[ i = CollisionClass %in% c( "Fatal injury", "Non-fatal injury"), 
                      j = CollisionClass := "Fatalities"]
    OttawaCollisions[, CollisionClass := droplevels(CollisionClass) ]
    

    EDIT3: Another base R solution. I prefer this one over the first solution (labels inside factor()), because it makes life easier when your data has more levels.

    # Work on character values, replace the matching ones, then convert back to a factor
    OttawaCollisions$CollisionClass <- as.character(OttawaCollisions$CollisionClass)
    OttawaCollisions$CollisionClass <- factor( with(OttawaCollisions, 
                                                    replace( CollisionClass, 
                                                             CollisionClass %in% c( "Fatal injury", "Non-fatal injury"),
                                                             "Fatalities") ) )
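
The question also asks how to remove the fatal cases from the data frame entirely instead of merging them. A minimal sketch of one way to do that, assuming the level is spelled "Fatal injury" as in the question's output; the collisions data frame and its id column are invented stand-ins for OttawaCollisions.

```r
# Hypothetical stand-in for OttawaCollisions; the id column is invented for illustration
collisions <- data.frame(
  id = 1:6,
  CollisionClass = factor(rep(c("P.D. only", "Non-fatal injury", "Fatal injury"), 2))
)

# Keep only the rows that are not fatal; %in% also keeps rows where the value is NA
no_fatal <- collisions[!(collisions$CollisionClass %in% "Fatal injury"), ]

# Subsetting keeps the empty "Fatal injury" level, so drop it explicitly
no_fatal$CollisionClass <- droplevels(no_fatal$CollisionClass)

levels(no_fatal$CollisionClass)
nrow(no_fatal)
```

Without the droplevels() step the unused level would still show up in table() and in plots with a count of zero.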
    

    【Comments】:

    • Thanks. I'll give it a try, but would x = x become OttawaCollisions$CollisionClass = OttawaCollisions$CollisionClass?
    【Solution 2】:

    You can also reassign the levels directly:

    > library(tibble)
    > test_df <- tibble(x=as.factor(c('Fatal','Non-fatal','PD','Fatal','Non-fatal','PD')), y=1:6)
    > test_df
    # A tibble: 6 x 2
      x             y
      <fct>     <int>
    1 Fatal         1
    2 Non-fatal     2
    3 PD            3
    4 Fatal         4
    5 Non-fatal     5
    6 PD            6
    > levels(test_df$x)
    [1] "Fatal"     "Non-fatal" "PD"       
    

    Now that you know the order, replace the names of the levels you want to combine:

    > levels(test_df$x) <- c("Fatal","Other","Other")
    > test_df
    # A tibble: 6 x 2
      x         y
      <fct> <int>
    1 Fatal     1
    2 Other     2
    3 Other     3
    4 Fatal     4
    5 Other     5
    6 Other     6
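
Positional reassignment depends on remembering the order of the levels. Base R's levels<- also accepts a named list, which states the mapping by name instead. A small sketch, using a plain factor f (a name chosen here for illustration) in place of test_df$x:

```r
f <- factor(c("Fatal", "Non-fatal", "PD", "Fatal", "Non-fatal", "PD"))

# Each list name is a new level; each element lists the old levels it absorbs
levels(f) <- list(Fatal = "Fatal", Other = c("Non-fatal", "PD"))

levels(f)
table(f)
```

Any old level that is left out of the list becomes NA, so every level that should be kept has to be listed, including those that keep their name.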
    

    Then you can do further processing, for example:

    > library(dplyr)
    > test_df %>% group_by(x) %>% summarize(n = n())  # count the rows in each level
    # A tibble: 2 x 2
      x         n
      <fct> <int>
    1 Fatal     2
    2 Other     4
    

    【Comments】:
