【Question title】: Selecting top n groups with dplyr then plotting other variables
【Posted】: 2019-01-18 20:09:10
【Question】:

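I have a dataset in which I want to keep only the top n categories, ranked by how many rows each category has, and then plot them using the other variables in the dataset. In effect it is an aggregation to find the top n levels, but I need to get back to the full data to plot it in ggplot.

In the example below, I want the two most common examNames, then plot their counts by year, using facet_wrap to give each exam its own panel.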

library(dplyr)
library(tibble)   # tribble()
library(ggplot2)

# Example data: one row per exam taken
ap <- 
      tribble(
        ~year, ~examName,
        2014, "Statistics",
        2015, "Statistics",
        2016, "Statistics",
        2016, "Statistics",
        2016, "Statistics",
        2016, "Statistics",
        2017, "Statistics",
        2017, "Statistics",
        2017, "Statistics",
        2017, "Statistics",
        2017, "Statistics",
        2013, "Macroeconomics",
        2013, "Macroeconomics",
        2014, "Macroeconomics",
        2015, "Macroeconomics",
        2016, "Macroeconomics",
        2016, "Macroeconomics",
        2016, "Macroeconomics",
        2016, "Macroeconomics",
        2016, "Macroeconomics",
        2017, "Macroeconomics",
        2017, "Macroeconomics",
        2017, "Macroeconomics",
        2017, "Macroeconomics",
        2017, "Macroeconomics",
        2017, "Macroeconomics",
        2013, "Calculus",
        2014, "Calculus",
        2015, "Calculus",
        2016, "Calculus",
        2017, "Calculus",
        2017, "Psychology",
        2017, "Psychology",
        2017, "Psychology",
        2017, "Psychology",
        2017, "Psychology",
        2018, "Psychology",
        2018, "Psychology")


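# Keep the two most frequent exams, then join back to the full data
# so that the inner join acts as a filter
ap_top <- ap %>% 
    count(examName, sort = TRUE) %>% 
    head(2) %>% 
    inner_join(ap, by = "examName") %>% 
    select(-n)

# Count rows per exam and year, then draw one panel per exam
ap_top %>% 
    count(examName, year) %>% 
    ggplot(aes(x = year, y = n, group = examName)) +
    geom_line() +
    facet_wrap(~ examName)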

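My idea was to find the top n, then inner_join back to the original dataset and use the result for plotting, essentially using the inner join as a filter.

I know there must be a better way to do this, and I'm hoping for a more elegant solution. I'm all ears! The example dataset is given above (sorry it's so long).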

【Comments】:

    Tags: r ggplot2 dplyr


    【Solution 1】:

    You don't need inner_join(). I would simply determine the top two exams in a separate statement, then filter on those exams.

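    # Names of the two exams with the most rows (top_n() keeps ties)
    top_exams <- count(ap, examName) %>% 
      top_n(2, n) %>%
      pull(examName)
    
    # Filter the full data to those exams before counting by year
    ap %>% 
      filter(examName %in% top_exams) %>% 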
      count(year, examName) %>% 
      ggplot(aes(x = year, y = n, group = examName)) +
      geom_line() +
      facet_wrap(~ examName)
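
A possible variant of this answer, offered as a sketch rather than the answerer's own code: assuming dplyr 1.0.0 or later, `slice_max()` can stand in for the superseded `top_n()`, and `semi_join()` (a filtering join) keeps the matching rows of `ap` without adding columns, which is the "join as a filter" the question was reaching for. The compact `tibble()` construction is only a shorthand that rebuilds the question's 38 rows so the block runs on its own.

```r
library(dplyr)
library(tibble)

# Rebuild the question's data compactly (same rows as the tribble above).
ap <- tibble(
  year = c(
    rep(c(2014, 2015, 2016, 2017), times = c(1, 1, 4, 5)),           # Statistics
    rep(c(2013, 2014, 2015, 2016, 2017), times = c(2, 1, 1, 5, 6)),  # Macroeconomics
    2013:2017,                                                       # Calculus
    rep(c(2017, 2018), times = c(5, 2))                              # Psychology
  ),
  examName = rep(
    c("Statistics", "Macroeconomics", "Calculus", "Psychology"),
    times = c(11, 15, 5, 7)
  )
)

# Two exams with the most rows; slice_max() keeps ties by default.
top_exams <- ap %>%
  count(examName) %>%
  slice_max(n, n = 2)

# semi_join() returns only the rows of `ap` with a match in `top_exams`
# and only the columns of `ap`, so no select(-n) is needed afterwards.
ap_top <- semi_join(ap, top_exams, by = "examName")
```

`ap_top` can then be piped into the same `count(year, examName)` and ggplot code used in this answer.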
    

    【Comments】:

      【Solution 2】:

      Another possibility:

      ap %>% 
       group_by(examName) %>%
       mutate(temp = n()) %>%
       ungroup() %>%
       mutate(temp = dense_rank(desc(temp))) %>%
       filter(temp %in% c(1,2)) %>%
       select(-temp) %>%
       count(year, examName) %>% 
       ggplot(aes(x = year, y = n, group = examName)) +
       geom_line() +
       facet_wrap(~ examName)
      

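      It counts the rows for each "examName" and ranks those counts. Then it keeps the rows belonging to the exams with the largest and second-largest counts.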
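
To make the ranking step concrete, here is a small sketch that applies `dense_rank(desc(n))` to the per-exam row counts of the question's data (15, 11, 7 and 5). The tied vector at the end is a hypothetical case, not taken from the question, included to show that tied counts share a rank, so `filter(temp %in% c(1,2))` could keep more than two exams.

```r
library(dplyr)
library(tibble)

# Per-exam row counts, as produced by count(ap, examName) on the question's data.
exam_counts <- tribble(
  ~examName,        ~n,
  "Macroeconomics", 15,
  "Statistics",     11,
  "Psychology",      7,
  "Calculus",        5
)

# Rank 1 goes to the largest count because of desc().
ranked <- exam_counts %>%
  mutate(rank = dense_rank(desc(n)))

# Hypothetical counts with a tie for second place: both tied values
# get rank 2, and the next value gets rank 3 (no gap in the ranks).
tied_ranks <- dense_rank(desc(c(15, 11, 11, 5)))
```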

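      【Comments】:

      • The nice thing about this solution is that you can do more with the dense_rank result, such as using it in fct_reorder to control the ordering in the plot.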
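
One way the comment's suggestion might look, as a sketch that assumes the forcats package is installed. The column names `total` and `rank` and the cutoff of three exams are illustrative choices, not part of the original answer; three exams are kept here only because the resulting order then differs from alphabetical order.

```r
library(dplyr)
library(tibble)
library(forcats)

# Rebuild the question's data compactly (same rows as the tribble above).
ap <- tibble(
  year = c(
    rep(c(2014, 2015, 2016, 2017), times = c(1, 1, 4, 5)),           # Statistics
    rep(c(2013, 2014, 2015, 2016, 2017), times = c(2, 1, 1, 5, 6)),  # Macroeconomics
    2013:2017,                                                       # Calculus
    rep(c(2017, 2018), times = c(5, 2))                              # Psychology
  ),
  examName = rep(
    c("Statistics", "Macroeconomics", "Calculus", "Psychology"),
    times = c(11, 15, 5, 7)
  )
)

ap_ranked <- ap %>%
  add_count(examName, name = "total") %>%           # rows per exam
  mutate(rank = dense_rank(desc(total))) %>%        # 1 = most frequent exam
  filter(rank <= 3) %>%                             # illustrative cutoff
  mutate(examName = fct_reorder(examName, rank))    # factor levels follow the rank

levels(ap_ranked$examName)
```

Because `facet_wrap()` lays out panels in factor-level order, piping `ap_ranked` into the plotting code of this answer would show the most frequent exam first.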