【Question title】: Multiple stacked bar chart with ggplot
【Posted】: 2020-10-06 05:08:46
【Question】:

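I have a dataset with four variables that measure respondents' views on different topics. I want to plot them as a single stacked bar chart so that the values can be compared across topics.

Here are the first rows of the dataset: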

lebanon <- structure(list(climate_change = c(
  "Not a very serious problem",
  "Not a very serious problem", NA, NA, "A very serious problem",
  "A somewhat serious problem"
), air_quality = c(
  "A somewhat serious problem",
  "Not a very serious problem", NA, NA, "A very serious problem",
  "A very serious problem"
), water_polution = c(
  "A somewhat serious problem",
  "Not a very serious problem", NA, NA, "A very serious problem",
  "Not at all a serious problem"
), trash = c(
  "A very serious problem",
  "Not a very serious problem", NA, NA, "A very serious problem",
  "A somewhat serious problem"
)), row.names = c(NA, -6L), class = "data.frame")

I tried the following code, based on this site:

lebanon %>%
  filter(!is.na(climate_change), !is.na(air_quality), !is.na(water_polution), !is.na(trash)) %>%
  gather(variable, value, climate_change:trash) %>%
  ggplot(aes(x = variable, y = value, fill = value)) +
  geom_bar(stat = "identity") +
  coord_flip()

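and got this plot:

There are three problems with this plot.

1.) The bars have different lengths.

2.) I don't understand why something is written where the x-axis meets the y-axis. How can I remove it?

3.) I want the values to appear in a meaningful order, so I ordered them first: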

dataset$climate_change <- factor(dataset$climate_change, levels = c("Not at all a serious problem",
                                                                    "Not a very serious problem",
                                                                    "A somewhat serious problem",
                                                                    "A very serious problem"))

dataset$air_quality <- factor(dataset$air_quality, levels = c("Not at all a serious problem",
                                                                    "Not a very serious problem",
                                                                    "A somewhat serious problem",
                                                                    "A very serious problem"))

dataset$water_polution <- factor(dataset$water_polution, levels = c("Not at all a serious problem",
                                                                    "Not a very serious problem",
                                                                    "A somewhat serious problem",
                                                                    "A very serious problem"))

However, the values are still unordered. What am I doing wrong? Or is there a more efficient way to make a multiple stacked bar chart?

【Comments on the question】:

    Tags: r ggplot2 tidyverse


    【Solution 1】:

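    The main issue with your code is that you map value, a categorical (factor) variable, on y. In addition, you can simply use drop_na instead of filter, and set the levels of value once after gathering instead of repeating the conversion for each variable. (; Try this:

    By the way: please put your data into the post with dput(), e.g. dput(head(lebanon)). See my edit to your post. Cleaning the data and getting it into shape took more time than answering the question. (;

    ** EDIT ** To put the bars in the desired order I used the forcats package. First I add_count the number of respondents who consider the issue "A very serious problem". Then I fct_reorder variable accordingly, using -n to make the order descending. To reverse the order of value I used fct_rev.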
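The reordering step described in the edit can be tried in isolation. The sketch below is an alternative formulation, not the code from this answer: it passes the logical test value == "A very serious problem" to fct_reorder() with .fun = sum, so each variable is ranked directly by how many respondents gave that answer. The toy data frame answers and its values are made up for illustration.

```r
library(tidyverse)

# Hypothetical long-format answers (not the asker's data):
# trash has 3 "very serious" answers, air_quality 2, climate_change 1.
answers <- tibble(
  variable = rep(c("climate_change", "air_quality", "trash"), each = 3),
  value = c(
    "A very serious problem", "Not a very serious problem", "Not a very serious problem",
    "A very serious problem", "A very serious problem", "A somewhat serious problem",
    "A very serious problem", "A very serious problem", "A very serious problem"
  )
)

# Rank each variable by its number of "A very serious problem" answers.
# sum() over a logical vector counts the TRUE values; .desc = TRUE puts
# the variable with the highest count first in the factor levels.
ranked <- answers %>%
  mutate(variable = fct_reorder(
    variable,
    value == "A very serious problem",
    .fun = sum,
    .desc = TRUE
  ))

levels(ranked$variable)
```

Because the count is computed per variable rather than per variable and answer combination, the ranking depends only on the "very serious" answers.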

    lebanon <- structure(list(climate_change = c(
      "Not a very serious problem",
      "Not a very serious problem", NA, NA, "A very serious problem",
      "A somewhat serious problem"
    ), air_quality = c(
      "A somewhat serious problem",
      "Not a very serious problem", NA, NA, "A very serious problem",
      "A very serious problem"
    ), water_polution = c(
      "A somewhat serious problem",
      "Not a very serious problem", NA, NA, "A very serious problem",
      "Not at all a serious problem"
    ), trash = c(
      "A very serious problem",
      "Not a very serious problem", NA, NA, "A very serious problem",
      "A somewhat serious problem"
    )), row.names = c(NA, -6L), class = "data.frame")
    
    library(tidyverse)
    lebanon %>%
      drop_na() %>% 
      gather(variable, value, climate_change:trash) %>%
      add_count(variable, value == "A very serious problem") %>% 
      mutate(value = factor(value, levels = c("Not at all a serious problem",
                                              "Not a very serious problem",
                                              "A somewhat serious problem",
                                              "A very serious problem"))) %>% 
      ggplot(aes(x = forcats::fct_reorder(variable, -n), fill = forcats::fct_rev(value))) +
      geom_bar() +
      coord_flip()
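
If respondents who answered only some of the four questions should be kept, one option is to reshape first and drop missing answers afterwards. The following is a minimal sketch under that assumption, using a small made-up data frame survey rather than the asker's data. It relies on drop_na() accepting column names, so only rows with a missing value are removed, and on position = "fill" so that every bar has the same length even when the number of answers differs per variable.

```r
library(tidyverse)

# Hypothetical survey data with partial answers (illustration only).
survey <- tibble(
  id = 1:4,
  climate_change = c("A very serious problem", NA,
                     "Not a very serious problem", "A very serious problem"),
  trash = c("A somewhat serious problem", "A very serious problem",
            NA, "A very serious problem")
)

severity_levels <- c(
  "Not at all a serious problem",
  "Not a very serious problem",
  "A somewhat serious problem",
  "A very serious problem"
)

long <- survey %>%
  pivot_longer(c(climate_change, trash),
               names_to = "variable", values_to = "value") %>%
  drop_na(value) %>%  # removes single missing answers, not whole respondents
  mutate(value = factor(value, levels = severity_levels))

# position = "fill" rescales each bar to proportions, so all bars are equally long.
p <- ggplot(long, aes(x = variable, fill = fct_rev(value))) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Share of respondents", fill = NULL) +
  coord_flip()
```

Printing p draws the chart. Whether proportions or raw counts are the better choice depends on how different the number of answers per variable is.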
    

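    【Comments】:

    • Great! Two questions: how do I change the order of the variables so that the one most respondents call a very serious problem comes first, with the rest in descending order? And how do I get the data from R into the post with dput()?
    • Oh, does drop_na() actually drop every respondent who has at least one NA in any other variable? The dataset includes several columns that I didn't show. My intention with filter(!is.na()) was to exclude NAs only in specific variables. Respondents may have answered these variables without giving an answer for every variable.
    • Hi @Nicosc. First: just copy and paste the output of dput(...) into your post. Second: drop_na, as used above, drops all rows with at least one NA. If you only want to drop rows with NAs in specific columns of your data, you have to stick with filter. Regarding your third question, I will take another look at the data.
    • I just made an edit. The order of the values is now reversed, and the variable that most respondents consider a serious problem comes first.
    • Thanks :) If I write dput(head(lebanon)) in R and run it, it just prints a lot of stuff that doesn't seem to make sense. Maybe I'm not getting it. ://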
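
On the dput() question in the last comment: the printed text is itself R code that rebuilds the object, which is why it can be pasted into a post. A small sketch with a made-up data frame example_df (not the asker's data) shows the round trip:

```r
# Hypothetical two-row data frame for illustration.
example_df <- data.frame(
  trash = c("A very serious problem", NA),
  stringsAsFactors = FALSE
)

# dput() prints an R expression of the form structure(list(...), ...).
dput(example_df)

# Capturing that text and evaluating it recreates the data frame,
# which is what a reader of the post does when pasting the output into R.
dput_text <- capture.output(dput(example_df))
rebuilt <- eval(parse(text = dput_text))
all.equal(example_df, rebuilt)
```

For a post, the printed structure(...) text is assigned to a name, as in lebanon <- structure(...) at the top of the question.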