How to plot a large number of density plots with different categorical variables: answers

【Question title】: How to plot a large number of density plots with different categorical variables
【Posted】: 2019-09-01 14:00:56
【Question】:

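I have a dataset with one numeric variable and many categorical variables. I want to make a grid of density plots, each showing the distribution of the numeric variable for a different categorical variable, with the fill corresponding to the subgroups of that categorical variable. For example: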

library(tidyverse)
library(nycflights13)

dat <- flights %>%
  select(carrier, origin, distance) %>%
  mutate(origin = origin %>% as.factor,
         carrier = carrier %>% as.factor)

plot_1 <- dat %>%
  ggplot(aes(x = distance, fill = carrier)) +
  geom_density()

plot_1

plot_2 <- dat %>%
  ggplot(aes(x = distance, fill = origin)) +
  geom_density()

plot_2

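I would like to find a way to make both of these plots quickly. Right now, the only way I know to do this is to create each plot individually and then put them together with grid.arrange. However, my real dataset has about 15 categorical variables, so this would be very time consuming!

Is there a quicker and easier way to do this? I think the hardest part is that each plot has its own legend, so I don't know how to get around that stumbling block.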

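【Comments】:

Please provide sample data. I think you should convert the "wide" data to "long" data and then plot with facet_wrap.
Doesn't my post already have a reproducible example?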
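
The wide-to-long suggestion in the comment above could look something like the sketch below. It is only an illustration and not code from the thread: the column names `variable` and `level` are made up for the example, and it starts from the raw character columns of `flights` so that `pivot_longer()` stacks columns of the same type. Note that all panels then share a single legend, which may or may not be what you want.

```r
library(tidyverse)
library(nycflights13)

# Stack the categorical columns into two columns:
# `variable` (which column a value came from) and `level` (the value itself).
# Both names are arbitrary choices for this sketch.
long <- flights %>%
  select(carrier, origin, distance) %>%
  pivot_longer(cols = c(carrier, origin),
               names_to = "variable",
               values_to = "level")

# One panel per categorical variable, filled by that variable's levels
p_facets <- long %>%
  ggplot(aes(x = distance, fill = level)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~ variable)

p_facets
```

With about 15 categorical variables, the shared legend would list the levels of every variable at once, so this approach probably suits cases with few levels per variable.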

Tags: r ggplot2 density-plot

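【Solution 1】:

This solution gives you all the plots in a list. Here we write a function that takes the variable you want to fill by, and then use lapply with a vector of all the variables you want to plot.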

fill_variables <- vars(carrier, origin)

func_plot <- function(fill_variable) {
  dat %>%
  ggplot(aes(x = distance, fill = !!fill_variable)) +
  geom_density()
}

plotlist <- lapply(fill_variables, func_plot)

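If you don't know what !! means, I recommend watching this 5 minute video that introduces the key concepts of tidy evaluation. It is what you want to use when you write wrapper functions like this one to do things programmatically. I hope this helps!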
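
As a rough illustration of what !! does (this snippet is not from the answer, and the variable names `without_bang` and `with_bang` are made up for it), compare building an expression with and without unquoting:

```r
library(rlang)

# sym() turns a string into a symbol, i.e. a bare column name
fill_variable <- sym("carrier")

# Without !!, the expression refers to the name `fill_variable` itself
without_bang <- expr(mean(fill_variable))

# With !!, the symbol stored in fill_variable is inserted into the expression
with_bang <- expr(mean(!!fill_variable))

print(without_bang)  # mean(fill_variable)
print(with_bang)     # mean(carrier)
```

Inside aes() the same thing happens: `fill = !!fill_variable` ends up being treated as `fill = carrier`.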

Edit: if you want to pass in a character vector instead of quosures, you can change !!fill_variable to !!sym(fill_variable), like so:

fill_variables <- c('carrier', 'origin')

func_plot <- function(fill_variable) {
  dat %>%
    ggplot(aes(x = distance, fill = !!sym(fill_variable))) +
    geom_density()
}

plotlist <- lapply(fill_variables, func_plot)
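
To get from the list of plots to the grid the question asks for, one option is a sketch like the following. It assumes the gridExtra package is installed (the question mentions grid_arrange, presumably gridExtra's `grid.arrange()`); the two-column layout is an arbitrary choice. Each plot keeps its own legend.

```r
library(tidyverse)
library(nycflights13)
library(gridExtra)  # assumption: gridExtra is available

dat <- flights %>%
  select(carrier, origin, distance) %>%
  mutate(origin = as.factor(origin),
         carrier = as.factor(carrier))

fill_variables <- c("carrier", "origin")

func_plot <- function(fill_variable) {
  dat %>%
    ggplot(aes(x = distance, fill = !!sym(fill_variable))) +
    geom_density()
}

plotlist <- lapply(fill_variables, func_plot)

# grobs = takes the whole list of plots; ncol sets the number of grid columns
grid.arrange(grobs = plotlist, ncol = 2)
```

With around 15 variables you would probably want a larger ncol, or to split the list across several pages.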

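【Comments】:

You are everything that is good in the world. Thank you so much! :)
Quick follow-up. I'm having trouble passing the column names into 'fill_variables'. Right now I am extracting column names using the following code: flight_cat % sapply(is.factor) %>% which() flight_cat_names % select(flights_cat) %>% colnames Do you know how I can pass that vector of column names to "vars"?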
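
One possible way to handle the follow-up above, sketched under a few assumptions: the categorical columns are stored as factors (as in `dat` from the question), the name `cat_names` is made up for this example, and the `.data` pronoun is used in place of `vars()` so that plain strings can be passed straight through.

```r
library(tidyverse)
library(nycflights13)

dat <- flights %>%
  select(carrier, origin, distance) %>%
  mutate(origin = as.factor(origin),
         carrier = as.factor(carrier))

# Names of all factor columns in dat, as a character vector
cat_names <- dat %>%
  select(where(is.factor)) %>%
  colnames()

# .data[[name]] looks the column up by its name (a string)
func_plot <- function(fill_variable) {
  dat %>%
    ggplot(aes(x = distance, fill = .data[[fill_variable]])) +
    geom_density() +
    labs(fill = fill_variable)
}

# set_names() names the list elements after the columns
plotlist <- cat_names %>%
  set_names() %>%
  map(func_plot)

plotlist$carrier
```

If you would rather keep the `!!fill_variable` version of the function, `rlang::syms(cat_names)` turns the character vector into a list of symbols that can be passed to lapply in the same way.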

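【Solution 2】:

Alternative solution

As @djc wrote in the comments: "I'm having trouble passing the column names into 'fill_variables'. Right now I am extracting column names using the following code..."

You can separate the categorical variables from the continuous ones by column type, e.g. cat_vars <- flights[, sapply(flights, is.character)] for the categorical variables and cont_vars <- flights[, !sapply(flights, is.character)] for the continuous variables, and then pass them to the wrapper function given by mgiormenti.

The full code is as follows: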

library(tidyverse)
library(nycflights13)

dat <- flights %>%
  select(carrier, origin, distance) %>%
  mutate(origin = as.factor(origin),
         carrier = as.factor(carrier))

# dat stores its categorical columns as factors, so split the column names with is.factor
cat_vars <- names(dat)[sapply(dat, is.factor)]
cont_vars <- names(dat)[!sapply(dat, is.factor)]

# Density of distance, filled by the categorical column named in cat_var
func_plot_cat <- function(cat_var) {
  dat %>%
    ggplot(aes(x = distance, fill = !!sym(cat_var))) +
    geom_density()
}

# Density of the continuous column named in cont_var
func_plot_cont <- function(cont_var) {
  dat %>%
    ggplot(aes(x = !!sym(cont_var))) +
    geom_density()
}

plotlist_cat_vars <- lapply(cat_vars, func_plot_cat)
plotlist_cont_vars <- lapply(cont_vars, func_plot_cont)
print(plotlist_cat_vars)
print(plotlist_cont_vars)

【Comments】: