[Question title]: Is there a way to specify the reference variable when using `recipes::step_dummy()`?
[Posted]: 2021-11-19 14:45:04
[Question description]:

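Is there a way to specify the reference level when creating dummy variables with step_dummy()? I can get there by setting one_hot = TRUE and then dropping the column for the level I want as the reference, but I would like to know whether this can be specified within step_dummy() itself.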

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip

data(okc)

# level "anything" is the reference level
recipe(Class ~ ., data = okc) %>%
  step_dummy(diet) %>%
  prep() %>%
  bake(new_data = NULL) %>%
  select(starts_with("diet")) %>%
  names()
#> Warning: There are new levels in a factor: NA
#>  [1] "diet_halal"               "diet_kosher"             
#>  [3] "diet_mostly.anything"     "diet_mostly.halal"       
#>  [5] "diet_mostly.kosher"       "diet_mostly.other"       
#>  [7] "diet_mostly.vegan"        "diet_mostly.vegetarian"  
#>  [9] "diet_other"               "diet_strictly.anything"  
#> [11] "diet_strictly.halal"      "diet_strictly.kosher"    
#> [13] "diet_strictly.other"      "diet_strictly.vegan"     
#> [15] "diet_strictly.vegetarian" "diet_vegan"              
#> [17] "diet_vegetarian"

# all 18 diet levels included
recipe(Class ~ ., data = okc) %>%
  step_dummy(diet, one_hot = TRUE) %>%
  prep() %>%
  bake(new_data = NULL) %>%
  select(starts_with("diet")) %>%
  names()
#> Warning: There are new levels in a factor: NA
#>  [1] "diet_anything"            "diet_halal"              
#>  [3] "diet_kosher"              "diet_mostly.anything"    
#>  [5] "diet_mostly.halal"        "diet_mostly.kosher"      
#>  [7] "diet_mostly.other"        "diet_mostly.vegan"       
#>  [9] "diet_mostly.vegetarian"   "diet_other"              
#> [11] "diet_strictly.anything"   "diet_strictly.halal"     
#> [13] "diet_strictly.kosher"     "diet_strictly.other"     
#> [15] "diet_strictly.vegan"      "diet_strictly.vegetarian"
#> [17] "diet_vegan"               "diet_vegetarian"

# force diet_vegan to be reference level
recipe(Class ~ ., data = okc) %>%
  step_dummy(diet, one_hot = TRUE) %>%
  step_select(-diet_vegan) %>%
  prep() %>%
  bake(new_data = NULL) %>%
  select(starts_with("diet")) %>%
  names()
#> Warning: There are new levels in a factor: NA
#>  [1] "diet_anything"            "diet_halal"              
#>  [3] "diet_kosher"              "diet_mostly.anything"    
#>  [5] "diet_mostly.halal"        "diet_mostly.kosher"      
#>  [7] "diet_mostly.other"        "diet_mostly.vegan"       
#>  [9] "diet_mostly.vegetarian"   "diet_other"              
#> [11] "diet_strictly.anything"   "diet_strictly.halal"     
#> [13] "diet_strictly.kosher"     "diet_strictly.other"     
#> [15] "diet_strictly.vegan"      "diet_strictly.vegetarian"
#> [17] "diet_vegetarian"

Created on 2021-11-19 by the reprex package (v2.0.1)

[Comments]:

    Tags: r tidymodels r-recipes


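    [Solution 1]:

    From the step_dummy() documentation:

    "By default, the excluded dummy variable (i.e. the reference cell) will correspond to the first level of the unordered factor being converted."

    So step_dummy() itself always drops the first level. To choose a different reference level, add step_relevel() before step_dummy() and set its ref_level argument; this moves the chosen level to the front of the factor.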

    library(tidymodels)
    
    data(okc)
    
    recipe(Class ~ ., data = okc) %>%
      step_relevel(diet, ref_level = "vegan") %>%
      step_dummy(diet) %>%
      prep() %>%
      bake(new_data = NULL) %>%
      select(starts_with("diet")) %>%
      names()
    #> Warning: There are new levels in a factor: NA
    #>  [1] "diet_anything"            "diet_halal"              
    #>  [3] "diet_kosher"              "diet_mostly.anything"    
    #>  [5] "diet_mostly.halal"        "diet_mostly.kosher"      
    #>  [7] "diet_mostly.other"        "diet_mostly.vegan"       
    #>  [9] "diet_mostly.vegetarian"   "diet_other"              
    #> [11] "diet_strictly.anything"   "diet_strictly.halal"     
    #> [13] "diet_strictly.kosher"     "diet_strictly.other"     
    #> [15] "diet_strictly.vegan"      "diet_strictly.vegetarian"
    #> [17] "diet_vegetarian"
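
For intuition, here is a minimal base R sketch of the same idea, not the recipes internals. It assumes a small made-up `diet` factor (the data frame `df` and its values are hypothetical, not taken from `okc`): `relevel()` moves the chosen level to the front, and the default treatment contrasts used by `model.matrix()` then leave that first level out as the reference.

```r
# Hypothetical toy data standing in for the `diet` column (not the okc data)
df <- data.frame(
  diet = factor(c("anything", "vegan", "halal", "vegan", "anything"))
)

# Levels are sorted alphabetically by default, so "anything" comes first
default_ref <- levels(df$diet)[1]

# Move "vegan" to the front so that it becomes the reference level
df$diet <- relevel(df$diet, ref = "vegan")

# Default treatment contrasts drop the first level from the design matrix
mm <- model.matrix(~ diet, data = df)

print(default_ref)
print(levels(df$diet))
print(colnames(mm))
```

After releveling, the design matrix has columns for "anything" and "halal" but none for "vegan", which mirrors what the step_relevel() plus step_dummy() pair does in the recipe above.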
    

    [Discussion]:
