【Question title】: Summarise and group_by not working with factor variables
【Posted】: 2021-11-23 14:12:42
【Question】:

I am currently using version 1.3.1 of the tidyverse package, and when I run the following code:

library(tidyverse)

data <- data.frame(gender = c(1, 2, 1, 2, 2, 2, 2, 1, 2, 1),
                   age = c(18, 20, 21, 24, 25, 24, 24, 25, 22, 21))

data <- data %>%
  mutate(gender = factor(gender, levels = c("male", "female")))

data %>%
  group_by(gender) %>%
  summarise(mean = mean(age))

I get these results:

   # A tibble: 1 × 2
  gender  mean
  <fct>  <dbl>
1 NA      22.4

【Comments】:

    Tags: r dplyr group-by tidyverse summarize


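    【Solution 1】:

    Yes, you should set labels rather than levels. The levels argument lists the values that already occur in the input; your column contains 1 and 2, so neither matches "male" or "female" and every value becomes NA. The labels argument is what renames the existing levels.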

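    library(dplyr)
    
    # Start from the original data, where gender is still numeric (1/2),
    # not from the data frame already overwritten by the faulty mutate()
    data %>%
      mutate(gender = factor(gender, labels = c("male", "female"))) %>%
      group_by(gender) %>%
      summarise(mean = mean(age))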
    
    #  gender  mean
    #  <fct>  <dbl>
    #1 male    21.2
    #2 female  23.2
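
To make the difference between the two arguments concrete, here is a minimal base R sketch using the same gender codes as the question. The variable names `by_levels` and `by_labels` are made up for illustration, and passing `levels = c(1, 2)` explicitly is an addition of this sketch (the answer above relies on the default, which is the sorted unique values).

```r
# Same gender codes as in the question
gender <- c(1, 2, 1, 2, 2, 2, 2, 1, 2, 1)

# levels = declares which input values are valid. The data holds 1 and 2,
# so nothing matches "male" or "female" and every element becomes NA.
by_levels <- factor(gender, levels = c("male", "female"))

# levels = c(1, 2) matches the data; labels = renames those levels in order.
by_labels <- factor(gender, levels = c(1, 2), labels = c("male", "female"))

print(table(by_levels, useNA = "ifany"))
print(table(by_labels))
```

Spelling out `levels` alongside `labels` also pins the mapping down, so it does not silently shift if one code happens to be absent from the data.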
    

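    【Comments】:

      【Solution 2】:

      We don't need to convert to a factor in order to recode. It can be done directly by using "gender" (a numeric variable) as an index into the vector of replacement values.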

      library(dplyr)
      data %>%
          group_by(gender = c("male", "female")[gender]) %>%
          summarise(mean = mean(age, na.rm = TRUE))
      

      -output

      # A tibble: 2 × 2
        gender  mean
        <chr>  <dbl>
      1 female  23.2
      2 male    21.2
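
The indexing trick can be seen in isolation with a small base R sketch. It only works as written when the codes are the consecutive positive integers 1, 2, ...; the named lookup shown second is a variation assumed here for codes that are not, and the names `lookup`, `recoded` and `named_lookup` are hypothetical.

```r
gender <- c(1, 2, 1, 2, 2, 2, 2, 1, 2, 1)
age <- c(18, 20, 21, 24, 25, 24, 24, 25, 22, 21)

# Positional lookup: code 1 picks element 1 ("male"), code 2 picks element 2
lookup <- c("male", "female")
recoded <- lookup[gender]

# Named lookup: match on the code itself rather than its position,
# which also works for non-consecutive codes
named_lookup <- c("1" = "male", "2" = "female")
recoded_named <- unname(named_lookup[as.character(gender)])

# Group means in base R, for comparison with the summarise() output
group_means <- tapply(age, recoded, mean)
print(group_means)
```

Because the result is a character vector, the groups come out in alphabetical order (female, then male), which matches the output shown above.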
      

      Or use fct_recode

      library(forcats)
      data %>%
         group_by(gender = fct_recode(as.character(gender), male = "1",
               female = "2")) %>% 
         summarise(mean = mean(age, na.rm = TRUE))
      # A tibble: 2 × 2
        gender  mean
        <fct>  <dbl>
      1 male    21.2
      2 female  23.2
      

      【Comments】:
