【Question title】: Bar charts connected by lines / How to connect two graphs arranged with grid.arrange in R / ggplot2
【Posted】: 2019-03-22 16:44:48
【Question】:

On the Facebook Research site I found these nice bar charts, which are connected by lines to indicate changes in rank:

https://research.fb.com/do-jobs-run-in-families/

I would like to create them with ggplot2. The bar-chart part was easy:

library(ggplot2)
library(ggpubr)
state1 <- data.frame(state=c(rep("ALABAMA",3), rep("CALIFORNIA",3)), 
                 value=c(61,94,27,10,30,77), 
                 type=rep(c("state","local","fed"),2),
                 cumSum=c(rep(182,3), rep(117,3)))
state2 <- data.frame(state=c(rep("ALABAMA",3), rep("CALIFORNIA",3)), 
                 value=c(10,30,7,61,94,27), 
                 type=rep(c("state","local","fed"),2),
                 cumSum=c(rep(117,3), rep(182,3)))
fill <- c("#40b8d0", "#b2d183", "#F9756D")

p1 <- ggplot(data = state1) +
  geom_bar(aes(x = reorder(state, value), y = value, fill = type), stat="identity") +
  theme_bw() + 
  scale_fill_manual(values=fill) + 
  labs(x="", y="Total budget in 1M$") +
  theme(legend.position="none", 
        legend.direction="horizontal", 
        legend.title = element_blank(),
        axis.line = element_line(size=1, colour = "black"),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.border = element_blank(), panel.background = element_blank()) +
  coord_flip() 

p2 <- ggplot(data = state2) +
  geom_bar(aes(x = reorder(state, value), y = value, fill = type), stat="identity") +
  theme_bw() + 
  scale_fill_manual(values=fill) + labs(x="", y="Total budget in 1M$") +
  theme(legend.position="none", 
        legend.direction="horizontal", 
        legend.title = element_blank(),
        axis.line = element_line(size=1, colour = "black"),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.border = element_blank(), 
        panel.background = element_blank()) +
  scale_x_discrete(position = "top") + 
  scale_y_reverse() +
  coord_flip()

p3 <- ggarrange(p1, p2, common.legend = TRUE, legend = "bottom")

But I could not come up with a solution for the line part. When I add lines, for example to the left-hand plot, with

p3 + geom_segment(aes(x = rep(1:2, each=3), xend = rep(1:10, each=3), 
                   y = cumSum[order(cumSum)], yend=cumSum[order(cumSum)]+10), size = 1.2)

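the problem is that the line cannot cross over into the right-hand plot. It looks like this (image not preserved):

Basically, I want to connect the "California" bar on the left with the "California" bar on the right.

I think that to do this I somehow have to get access to the level above the individual plots. I looked into viewports and was able to overlay the two bar charts with a plot made of geom_segment, but then I could not work out the right layout for the lines: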

subplot <- ggplot(data = state1) + 
  geom_segment(aes(x = rep(1:2, each=3), xend = rep(1:2, each=3), 
                   y = cumSum[order(cumSum)], yend =cumSum[order(cumSum)]+10), 
               size = 1.2)

library(grid)  # viewport() comes from the grid package

vp <- viewport(width = 1, height = 1, x = 1, y = unit(0.7, "lines"), 
               just = c("right", "bottom"))
print(p3)
print(subplot, vp = vp)

Any help or pointers would be greatly appreciated.

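【Comments on the question】:

  • alluvial might be a useful package for drawing the lines (the remaining challenge is working out how to draw bar charts on top of an alluvial plot)
  • Cool question! Consider uploading your plot as well to attract more attention.
  • If you can compute the relative x/y centre positions of the bars, you could use something like grid.lines(x = unit(c(.475, .525), "npc"), y = unit(c(.7, .4), "npc")), but that seems very hacky...
  • Could you elaborate on how you want the lines to connect? cumSum is not defined in your code.

Tags: r ggplot2 bar-chart data-visualization r-grid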
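
The grid.lines idea from the comments could be sketched roughly as follows. This is only an illustration under stated assumptions: the npc coordinates are the placeholder values from the comment, not measured bar positions, and the null PDF device is used only so the sketch runs without a display. In practice the arranged plot (p3 in the question) would be printed before the line is drawn.

```r
library(grid)

pdf(NULL)        # null device, so nothing is written to disk
grid.newpage()   # in practice: print(p3) here instead of a blank page

# Placeholder coordinates (assumed, taken from the comment): a line from
# (0.475, 0.7) to (0.525, 0.4) in normalised page coordinates
connector <- linesGrob(
  x = unit(c(0.475, 0.525), "npc"),
  y = unit(c(0.7, 0.4), "npc"),
  gp = gpar(lwd = 2)
)
grid.draw(connector)  # drawn on the whole page, so it can span both panels

invisible(dev.off())
```

Because the line lives on the page rather than inside either panel, it is not clipped at a panel border. The cost is that the coordinates have to be worked out by hand and change with the figure size.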


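【Solution 1】:

This is a really interesting problem. I approximated it with the patchwork library, which lets you add ggplots together and gives you an easy way to control their layout. I much prefer it to anything based on grid.arrange, and in some cases it works better than cowplot.

I expanded the dataset just to get more values into the two data frames.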

library(tidyverse)
library(patchwork)

set.seed(1017)

state1 <- tibble(  # tibble() replaces the deprecated data_frame()
  state = rep(state.name[1:5], each = 3),
  value = floor(runif(15, 1, 100)),
  type = rep(c("state", "local", "fed"), times = 5)
)

state2 <- tibble(  # tibble() replaces the deprecated data_frame()
  state = rep(state.name[1:5], each = 3),
  value = floor(runif(15, 1, 100)),
  type = rep(c("state", "local", "fed"), times = 5)
)

Then I made a data frame that assigns each state a rank based on its total value within its original data frame (state1 or state2).

ranks <- bind_rows(
  state1 %>% mutate(position = 1),
  state2 %>% mutate(position = 2)
)  %>%
  group_by(position, state) %>%
  summarise(state_total = sum(value)) %>%
  mutate(rank = dense_rank(state_total)) %>%
  ungroup()
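
To make the ranking step concrete, here is a small base R sketch of the same idea on toy data. The data values and the helper name dense_rank_base are hypothetical and exist only for illustration; the answer itself uses dplyr's dense_rank().

```r
# Toy data (hypothetical): three states, two segments each, two positions
toy <- data.frame(
  position = rep(1:2, each = 6),
  state    = rep(rep(c("A", "B", "C"), each = 2), times = 2),
  value    = c(10, 5, 20, 1, 3, 4,
               2, 2, 9, 9, 30, 1)
)

# Total per state within each position (left plot = 1, right plot = 2)
totals <- aggregate(value ~ position + state, data = toy, FUN = sum)

# Dense rank: the smallest total gets 1, ties share a rank, no gaps
dense_rank_base <- function(x) match(x, sort(unique(x)))
totals$rank <- ave(totals$value, totals$position, FUN = dense_rank_base)

totals <- totals[order(totals$position, totals$state), ]
totals
```

A state whose rank differs between position 1 and position 2 is the one whose connecting line will slope in the bump chart.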

I made a quick theme to keep things minimal and drop the axis marks:

theme_min <- function(...) theme_minimal(...) +
  theme(panel.grid = element_blank(), legend.position = "none", axis.title = element_blank())

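The bump chart (the middle part) is based on the ranks data frame and has no labels. Using factors instead of numeric variables for position and rank gave me a bit more control over spacing, and lets the ranks line up with discrete 1 to 5 values in a way that matches the state names in the bar charts.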

p_ranks <- ggplot(ranks, aes(x = as.factor(position), y = as.factor(rank), group = state)) +
  geom_path() +
  scale_x_discrete(breaks = NULL, expand = expansion(add = 0.1)) +  # expansion() supersedes the deprecated expand_scale()
  scale_y_discrete(breaks = NULL) +
  theme_min()
p_ranks

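For the left bar chart, I order the states by their total value and make the values negative so the bars point to the left, then give it the same minimal theme: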

p_left <- state1 %>%
  mutate(state = as.factor(state) %>% fct_reorder(value, sum)) %>%
  arrange(state) %>%
  mutate(value = value * -1) %>%
  ggplot(aes(x = state, y = value, fill = type)) +
    geom_col(position = "stack") +
    coord_flip() +
    scale_y_continuous(breaks = NULL) +
    theme_min() +
    scale_fill_brewer()
p_left
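
The ordering step is what makes the bar positions agree with the ranks. As a minimal sketch on toy data (hypothetical values), base R's reorder() with FUN = sum plays the same role as fct_reorder(value, sum) above:

```r
# Toy data (hypothetical): totals are A = 15, B = 21, C = 7
toy <- data.frame(
  state = rep(c("A", "B", "C"), each = 2),
  value = c(10, 5, 20, 1, 3, 4)
)

# Factor levels sorted by each state's total, smallest first
ordered_state <- reorder(toy$state, toy$value, FUN = sum)
levels(ordered_state)
```

After coord_flip(), the first level is drawn at the bottom, so the state with the smallest total ends up at the bottom and lines up with rank 1 in the bump chart.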

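The right bar chart is pretty much the same, except that the values stay positive and I moved the x-axis to the top (which becomes the right-hand side once the coordinates are flipped):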

p_right <- state2 %>%
  mutate(state = as.factor(state) %>% fct_reorder(value, sum)) %>%
  arrange(state) %>%
  ggplot(aes(x = state, y = value, fill = type)) +
    geom_col(position = "stack") +
    coord_flip() +
    scale_x_discrete(position = "top") +
    scale_y_continuous(breaks = NULL) +
    theme_min() +
    scale_fill_brewer()

Then, because I have loaded patchwork, I can add the plots together and specify the layout.

p_left + p_ranks + p_right +
  plot_layout(nrow = 1)

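You may want to adjust spacing and margins some more, for example through the expand_scale call in the bump chart (expansion in current ggplot2). I have not tried this with axis marks along the y-axes (i.e. the bottoms after flipping), but I have a feeling things might get thrown out of alignment if you don't add a dummy axis to the ranks plot. There is plenty still to tinker with, but it's a cool visualization project you posed!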
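
One way to experiment with the spacing is the widths argument of plot_layout(). The sketch below is an illustration on toy plots, not part of the original answer: the data, the plot names and the 4:1:4 ratio are all assumptions to be tuned by eye.

```r
library(ggplot2)
library(patchwork)

# Toy data (hypothetical), only to make the sketch self-contained
toy <- data.frame(state = c("A", "B", "C"), value = c(3, 1, 2), rank = 1:3)

bars   <- ggplot(toy, aes(x = state, y = value)) + geom_col() + coord_flip()
middle <- ggplot(toy, aes(x = 1, y = rank)) + geom_point()

# widths sets the relative width of each column; here the middle
# panel is a quarter as wide as each bar chart
combined <- bars + middle + bars +
  plot_layout(nrow = 1, widths = c(4, 1, 4))
```

With the plots from the answer, the same plot_layout() call would be added to p_left + p_ranks + p_right.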


【Solution 2】:

Here is a pure ggplot2 solution, which combines the underlying data frames into one and plots everything in a single plot:

Data manipulation:

    library(dplyr)    
    bar.width <- 0.9
    
    # combine the two data sources
    df <- rbind(state1 %>% mutate(source = "state1"),
                state2 %>% mutate(source = "state2")) %>%
    
      # calculate each state's rank within each data source
      group_by(source, state) %>%
      mutate(state.sum = sum(value)) %>%
      ungroup() %>%
      group_by(source) %>%
      mutate(source.rank = as.integer(factor(state.sum))) %>%
      ungroup() %>%
    
      # calculate the dimensions for each bar
      group_by(source, state) %>%
      arrange(type) %>% 
      mutate(xmin = lag(cumsum(value), default = 0),
             xmax = cumsum(value),
             ymin = source.rank - bar.width / 2,
             ymax = source.rank + bar.width / 2) %>% 
      ungroup() %>%
    
      # shift each data source's coordinates away from point of origin,
      # in order to create space for plotting lines
      mutate(x = ifelse(source == "state1", -max(xmax) / 2, max(xmax) / 2)) %>%
      mutate(xmin = ifelse(source == "state1", x - xmin, x + xmin),
             xmax = ifelse(source == "state1", x - xmax, x + xmax)) %>%
    
      # calculate label position for each data source
      group_by(source) %>%
      mutate(label.x = max(abs(xmax))) %>%
      ungroup() %>%
      mutate(label.x = ifelse(source == "state1", -label.x, label.x),
             hjust = ifelse(source == "state1", 1.1, -0.1))
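
The bar-dimension step can be followed by hand on a single state. This base R sketch uses the three ALABAMA values from the question's state1 in the order listed there (the pipeline above first sorts by type, so the actual stacking order differs), and it mirrors the offset logic for the left-hand side only.

```r
value <- c(61, 94, 27)        # three stacked segments of one bar

xmax <- cumsum(value)         # right edge of each segment
xmin <- c(0, head(xmax, -1))  # base R stand-in for lag(cumsum(value), default = 0)

# Push the bar away from the origin to leave room for the connecting
# lines, then grow it towards the left (the "state1" branch above)
offset     <- max(xmax) / 2
left_xmin  <- -offset - xmin
left_xmax  <- -offset - xmax

data.frame(xmin, xmax, left_xmin, left_xmax)
```

The right-hand side uses the same xmin and xmax with offset + xmin and offset + xmax, so the two sets of bars are mirror images with a gap of max(xmax) between their inner edges.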
    

Plot:

    ggplot(df, 
           aes(x = x, y = source.rank,
               xmin = xmin, xmax = xmax, 
               ymin = ymin, ymax = ymax,
               fill = type)) +
      geom_rect() +
      geom_line(aes(group = state)) +
      geom_text(aes(x = label.x, label = state, hjust = hjust),
                check_overlap = TRUE) +
    
      # allow some space for the labels; this may be changed
      # depending on plot dimensions
      scale_x_continuous(expand = c(0.2, 0)) +
      scale_fill_manual(values = fill) +
    
      theme_void() +
      theme(legend.position = "top")
    

Data source (same as @camille's):

    set.seed(1017)
    
    state1 <- tibble(  # tibble() replaces the deprecated data_frame()
      state = rep(state.name[1:5], each = 3),
      value = floor(runif(15, 1, 100)),
      type = rep(c("state", "local", "fed"), times = 5)
    )
    
    state2 <- tibble(  # tibble() replaces the deprecated data_frame()
      state = rep(state.name[1:5], each = 3),
      value = floor(runif(15, 1, 100)),
      type = rep(c("state", "local", "fed"), times = 5)
    )
    
