ggplot2 R: Percent stacked barchart with multiple variables

【Question title】: ggplot2 R: Percent stacked barchart with multiple variables
【Posted】: 2021-06-09 17:05:34
【Question】:

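R version 4.0.5 (2021-03-31) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19042)

I want to create a percent stacked bar chart that shows 2 groups (regional, international) and the means of 4 different numeric variables (ground low intensity, ground high intensity, standing low intensity, standing high intensity). These variables give the duration of each period in seconds.

My data is: dataset

The figure below is an example of what I want to produce: Time-motion analysis description relative to total fight time, considering modalities and positions of actions. Coswig, V. S., Gentil, P., Bueno, J. C., Follmer, B., Marques, V. A., & Del Vecchio, F. B. (2018). Physical fitness predicts technical-tactical and time-motion profile in simulated Judo and Brazilian Jiu-Jitsu matches. PeerJ, 6, e4851.

I have read many guides and watched many YouTube tutorials, but most of them use 2 categorical variables and 1 numeric variable, so they do not apply to my case.

Any help or guidance would be much appreciated.

Thank you in advance.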

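【Comments】:

First, give us the data using dput(). Your data will need to be restructured to create the plot. A picture of the data is of no use.
It is much easier to help with a reproducible example. Posting images is discouraged.

Tags: r ggplot2 data-visualization bar-chart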

【Solution 1】:

You will find many friends here if you provide a reproducible example and show what you did and where things went wrong.

Data

library(tidyverse)   # attaches tibble, tidyr, dplyr and ggplot2

ds <- tribble(
    ~GROUP, ~GLI, ~GHI, ~SLI, ~SHI, ~GT, ~ST, ~EFFORT, ~PAUSE, ~HI, ~LI,
    "REG",   158,   48,   26,    4,  205,  30,     235,     10,  51, 184,
    "INT",   217,   62,   20,    1,  279,  21,     300,     11,  63, 237
)

{ggplot} works best with long data. Here tidyr is your friend, specifically pivot_longer():

ds <- ds %>%
    pivot_longer(
        cols = GLI:SHI,             # which columns to take
        names_to = "intensity",     # new column for the names, i.e. the intensities
        values_to = "duration"      # new column for the values you want to plot
    ) %>%
    # calculate each duration's share of the total within its group
    group_by(GROUP) %>%
    mutate(share = duration / sum(duration))

This gives you a tibble like this:

# A tibble: 8 x 10
# Groups:   GROUP [2]
  GROUP    GT    ST EFFORT PAUSE    HI    LI intensity duration   share
  <chr> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <chr>        <dbl>   <dbl>
1 REG     205    30    235    10    51   184 GLI            158 0.669  
2 REG     205    30    235    10    51   184 GHI             48 0.203  
3 REG     205    30    235    10    51   184 SLI             26 0.110  
4 REG     205    30    235    10    51   184 SHI              4 0.0169 
5 INT     279    21    300    11    63   237 GLI            217 0.723  
6 INT     279    21    300    11    63   237 GHI             62 0.207  
7 INT     279    21    300    11    63   237 SLI             20 0.0667 
8 INT     279    21    300    11    63   237 SHI              1 0.00333
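
As a quick sanity check on the reshaping step, the shares within each group should add up to 1. The sketch below is self-contained and rebuilds only the four plotted columns. The names ds_wide, ds_long and share_check are illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Only the four columns that end up in the plot (values copied from the answer)
ds_wide <- tribble(
  ~GROUP, ~GLI, ~GHI, ~SLI, ~SHI,
  "REG",   158,   48,   26,    4,
  "INT",   217,   62,   20,    1
)

# Same reshaping as above: one row per GROUP x intensity, plus the share
ds_long <- ds_wide %>%
  pivot_longer(cols = GLI:SHI, names_to = "intensity", values_to = "duration") %>%
  group_by(GROUP) %>%
  mutate(share = duration / sum(duration)) %>%
  ungroup()

# One row per group: the shares should sum to 1
share_check <- ds_long %>%
  group_by(GROUP) %>%
  summarise(total_duration = sum(duration), total_share = sum(share))

print(share_check)
```

If total_share is not 1 for some group, the usual suspects are a missing group_by() or missing values in duration.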

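The last three columns give you the category, the duration and its share of the total; the shares are computed per group using the GROUP variable. You can then plot it with ggplot.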

ds %>%
    ggplot() + 
    geom_col(aes(x = GROUP, y = share, fill = intensity), position = position_stack())  + 
    scale_y_continuous(labels=scales::percent)
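
A possible alternative, not used in the answer above: ggplot2 can rescale each stack to 100% itself via position = "fill", so the raw durations can be mapped directly without a precomputed share column. A minimal self-contained sketch, where ds_wide, ds_long and p_fill are illustrative names:

```r
library(ggplot2)
library(tidyr)
library(tibble)

ds_wide <- tribble(
  ~GROUP, ~GLI, ~GHI, ~SLI, ~SHI,
  "REG",   158,   48,   26,    4,
  "INT",   217,   62,   20,    1
)

ds_long <- pivot_longer(ds_wide, cols = GLI:SHI,
                        names_to = "intensity", values_to = "duration")

# position = "fill" stacks the bars and normalises every stack to height 1
p_fill <- ggplot(ds_long) +
  geom_col(aes(x = GROUP, y = duration, fill = intensity), position = "fill") +
  scale_y_continuous(labels = scales::percent)

print(p_fill)
```

Computing share explicitly, as in the answer, is still useful when the percentages are needed elsewhere, for example as text labels.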

After that you can "beautify" the plot by choosing the theme, colours, legend and so on that you want. Hope this gets you started!
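
One way the "beautifying" step might look, as a sketch only. The long legend labels assume that GLI, GHI, SLI and SHI stand for the four variables named in the question. The palette, theme and axis titles are arbitrary choices, and p_pretty is an illustrative name.

```r
library(ggplot2)
library(dplyr)
library(tidyr)
library(tibble)

ds_long <- tribble(
  ~GROUP, ~GLI, ~GHI, ~SLI, ~SHI,
  "REG",   158,   48,   26,    4,
  "INT",   217,   62,   20,    1
) %>%
  pivot_longer(cols = GLI:SHI, names_to = "intensity", values_to = "duration") %>%
  group_by(GROUP) %>%
  mutate(share = duration / sum(duration)) %>%
  ungroup()

# Assumed meaning of the abbreviations (taken from the question text)
intensity_labels <- c(
  GLI = "Ground, low intensity",
  GHI = "Ground, high intensity",
  SLI = "Standing, low intensity",
  SHI = "Standing, high intensity"
)

p_pretty <- ggplot(ds_long, aes(x = GROUP, y = share, fill = intensity)) +
  geom_col(position = position_stack()) +
  # percentage label centred inside each segment
  geom_text(aes(label = scales::percent(share, accuracy = 1)),
            position = position_stack(vjust = 0.5), size = 3) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_brewer(palette = "Set2", labels = intensity_labels) +
  labs(x = "Group", y = "Share of total time", fill = "Position and intensity") +
  theme_minimal()

print(p_pretty)
```

Very thin segments (such as SHI here) leave little room for a label, so the labels for those may need to be dropped or moved.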

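【Comments】:

Thank you so much @Ray, this is great! Also, I apologise for posting an image instead of code. My code was really messy because I spent so much time experimenting with it that I got lost in the process, so I thought it better to show an example of what I was after rather than my incredibly messy code! I tried to replicate the exact code on my laptop and, at the last block of code (the one that plots it in ggplot), I got the following message: Error in FUN(X[[i]], ...) : object 'intensity' not found.
Check the spelling, and the double quotes in "intensity". This happens to me regularly. With some of the tidyverse functions I always get confused about when to use the bare name and when to make it a string. Without the double quotes, R expects it to be an existing R object. Next, make sure that `intensity` is inside the aes() mapping.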
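
To illustrate the string-versus-bare-name point from the comment above, here is a small sketch. It assumes one plausible cause of the error, namely plotting a data frame that has no intensity column. The names ds_wide, ds_long, p_ok, p_wrong and err are illustrative.

```r
library(ggplot2)
library(tidyr)
library(tibble)

ds_wide <- tribble(
  ~GROUP, ~GLI, ~GHI, ~SLI, ~SHI,
  "REG",   158,   48,   26,    4,
  "INT",   217,   62,   20,    1
)

# pivot_longer(): the new columns do not exist yet, so their names are strings
ds_long <- pivot_longer(ds_wide, cols = GLI:SHI,
                        names_to = "intensity", values_to = "duration")

# Quick check before plotting: is the column really there?
has_intensity <- "intensity" %in% names(ds_long)

# aes(): existing columns are referenced by bare (unquoted) names
p_ok <- ggplot(ds_long) +
  geom_col(aes(x = GROUP, y = duration, fill = intensity), position = "fill")

# The un-pivoted data has no `intensity` column, so building this plot fails.
# The exact wording of the error depends on the ggplot2 version.
p_wrong <- ggplot(ds_wide) +
  geom_col(aes(x = GROUP, y = GLI, fill = intensity))
err <- tryCatch({ggplot_build(p_wrong); NULL},
                error = function(e) conditionMessage(e))
```

Running names() on the data frame passed to ggplot() is usually the fastest way to find out which case applies.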
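After a cup of coffee this morning, I found the error! The last block of code, ds %>% ggplot() + geom_col(aes(x = GROUP, y = share, fill = intensity), position = position_stack()) + scale_y_continuous(labels=scales::percent), should start with s, not ds! Thanks again @Ray, this is brilliant! I will work on improving my questions the next time I need help!
Excellent. Glad to see you found the culprit. I'm with you... coffee solves a lot of problems! All the best with your research and the journey ahead!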