【Question title】:How do you assign a unique factor to multiple values in R?
【Posted】:2017-11-20 11:04:30
【Question】:

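Suppose I have a numeric dataset with values from 0 to 20, and I want to create 3 different age groups: 0~9 years old, 10~15 years old, and 16~20 years old.

How do I assign one of these 3 factor levels to each number from 0 to 20, based on its value?

For example, values from 0 to 9 would be assigned the "0~9 years old" level, values from 10 to 15 the "10~15 years old" level, and so on.

How can I do this in R?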

【Question comments】:

  • What have you tried so far, and why did you add the python tag?

Tags: r


【Solution 1】:

The case_when function from dplyr (loaded by the tidyverse) does the trick. Try the following:

library(tidyverse)

df <- tibble(age = 1:20)

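df %>% 
  # case_when() tests the conditions in order and uses the first TRUE one
  mutate(age_categories = case_when(age <= 9 ~ "0~9 years old",
                                    age <= 15 & age > 9 ~ "10~15 years old",
                                    age <= 20 & age > 15 ~ "16~20 years old",
                                    TRUE ~ "Other"))  # fallback for any other value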

Returns:

# A tibble: 20 x 2
     age  age_categories
   <int>           <chr>
 1     1   0~9 years old
 2     2   0~9 years old
 3     3   0~9 years old
 4     4   0~9 years old
 5     5   0~9 years old
 6     6   0~9 years old
 7     7   0~9 years old
 8     8   0~9 years old
 9     9   0~9 years old
10    10 10~15 years old
11    11 10~15 years old
12    12 10~15 years old
13    13 10~15 years old
14    14 10~15 years old
15    15 10~15 years old
16    16 16~20 years old
17    17 16~20 years old
18    18 16~20 years old
19    19 16~20 years old
20    20 16~20 years old
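Because case_when() evaluates its conditions from top to bottom and uses the first one that is TRUE for each element, the `& age > 9` and `& age > 15` guards above are redundant. The sketch below mimics that first-match-wins rule in plain Python; it illustrates the evaluation order only, not dplyr's implementation, and `RULES` and `categorize` are hypothetical names.

```python
# Ordered (condition, label) rules. As in case_when(), the first rule whose
# condition is true decides the label, so later rules need no lower bound.
RULES = [
    (lambda age: age <= 9, "0~9 years old"),
    (lambda age: age <= 15, "10~15 years old"),
    (lambda age: age <= 20, "16~20 years old"),
    (lambda age: True, "Other"),  # catch-all, like TRUE ~ "Other"
]


def categorize(age):
    """Return the label of the first matching rule (hypothetical helper)."""
    for condition, label in RULES:
        if condition(age):
            return label
    raise ValueError("no rule matched")  # unreachable thanks to the catch-all


if __name__ == "__main__":
    for age in (0, 9, 10, 15, 16, 20, 21):
        print(age, categorize(age))
```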

Alternatively, you can do the following:

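# Start from a factor whose levels are the individual ages ("1", ..., "20")
df$age_categories <- factor(df$age)

# Assigning a named list to levels() merges levels: each name becomes a new
# level, and the old levels listed in that element are mapped onto it
levels(df$age_categories) <- list(
  "0~9 years old" = 1:9,
  "10~15 years old" = 10:15,
  "16~20 years old" = 16:20
)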
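A rough Python analogue of that level merge, with dictionaries standing in for R factors, inverts the label-to-ages mapping into an age-to-label lookup. `GROUPS`, `LEVEL_MAP` and `relabel` are hypothetical names, and `range(1, 10)` corresponds to R's `1:9` because Python ranges exclude their end point.

```python
# label -> the old levels (ages) it absorbs, mirroring the named list above
GROUPS = {
    "0~9 years old": range(1, 10),
    "10~15 years old": range(10, 16),
    "16~20 years old": range(16, 21),
}

# Invert once: each age points at the name of the group that lists it
LEVEL_MAP = {age: label for label, ages in GROUPS.items() for age in ages}


def relabel(ages):
    """Replace each age by its group label; unlisted ages become None."""
    return [LEVEL_MAP.get(age) for age in ages]


if __name__ == "__main__":
    print(relabel([1, 9, 10, 15, 16, 20]))
```

In this sketch an age that no group lists, such as 0, maps to None; widen the first range to `range(0, 10)` if 0 can occur in the data.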

【Comments】:

    【Solution 2】:

    You can use base::cut in R (or pandas.cut in Python):

    df <- data.frame(age = 0:20)
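    # One label per interval defined by the breaks below
    labels <- sprintf("from %s yrs old", c("0~9", "10~15", "16~20"))
    # cut() uses right-closed intervals (0,9], (9,15], (15,20];
    # include.lowest = TRUE makes the first interval [0,9] so 0 is kept
    df$groups <- cut(
      df$age,
      breaks = c(0, 9, 15, 20),
      include.lowest = TRUE,
      labels = labels)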
    df$groups
    # [1] from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old  
    # [7] from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 0~9 yrs old   from 10~15 yrs old from 10~15 yrs old
    # [13] from 10~15 yrs old from 10~15 yrs old from 10~15 yrs old from 10~15 yrs old from 16~20 yrs old from 16~20 yrs old
    # [19] from 16~20 yrs old from 16~20 yrs old from 16~20 yrs old
    # Levels: from 0~9 yrs old from 10~15 yrs old from 16~20 yrs old
    

    import pandas as pd

    df = pd.DataFrame({'age': range(21)})  # ages 0 to 20, matching the R example
    labels = ['from %s yrs old' % x for x in ['0~9', '10~15', '16~20']]
    # Use item assignment: df.groups = ... would set an attribute, not a column
    df['groups'] = pd.cut(
      df['age'],
      bins=[0, 9, 15, 20],
      include_lowest=True,
      labels=labels)
    df['groups']
    #0       from 0~9 yrs old
    #1       from 0~9 yrs old
    #2       from 0~9 yrs old
    #3       from 0~9 yrs old
    #4       from 0~9 yrs old
    #5       from 0~9 yrs old
    #6       from 0~9 yrs old
    #7       from 0~9 yrs old
    #8       from 0~9 yrs old
    #9       from 0~9 yrs old
    #10    from 10~15 yrs old
    #11    from 10~15 yrs old
    #12    from 10~15 yrs old
    #13    from 10~15 yrs old
    #14    from 10~15 yrs old
    #15    from 10~15 yrs old
    #16    from 16~20 yrs old
    #17    from 16~20 yrs old
    #18    from 16~20 yrs old
    #19    from 16~20 yrs old
    #20    from 16~20 yrs old
    #Name: groups, dtype: category
    #Categories (3, object): [from 0~9 yrs old < from 10~15 yrs old < from 16~20 yrs old]
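Both cut() calls build right-closed intervals from the breaks 0, 9, 15, 20, that is (0, 9], (9, 15] and (15, 20], and include.lowest / include_lowest closes the first interval on the left so that 0 is kept. The stdlib-only sketch below reproduces that lookup with bisect to make the boundary rules explicit; `cut_label`, `BREAKS` and `LABELS` are hypothetical names, not part of base R or pandas.

```python
from bisect import bisect_left

BREAKS = [0, 9, 15, 20]
LABELS = ["from 0~9 yrs old", "from 10~15 yrs old", "from 16~20 yrs old"]


def cut_label(x, breaks=BREAKS, labels=LABELS):
    """Map x to the label of its right-closed bin (breaks[i-1], breaks[i]].

    The lowest break is treated as inclusive, mimicking include_lowest=True.
    Values outside [breaks[0], breaks[-1]] get None (cut() would give NA/NaN).
    """
    if x == breaks[0]:
        return labels[0]  # include_lowest: the first bin is [0, 9]
    # bisect_left returns the index of the first break >= x,
    # i.e. the right edge of the bin that contains x
    i = bisect_left(breaks, x)
    if i == 0 or i == len(breaks):
        return None  # below the lowest or above the highest break
    return labels[i - 1]


if __name__ == "__main__":
    print([cut_label(age) for age in range(21)])
```

A non-integer age such as 9.5 lands in (9, 15] and therefore gets the "10~15" label; that follows from the chosen break points, and cut() treats it the same way.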
    

    【Comments】:
