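Reordering factors for a stacked barplot in ggplot2: answers

【Question title】: Reordering factors for a stacked barplot in ggplot2
【Posted】: 2017-05-19 14:44:04
【Question description】:

Biologist and ggplot2 beginner here. I have a relatively large dataset of DNA sequences (millions of short DNA fragments) that I first need to filter for quality per sequence. I'd like to use a ggplot2 stacked barplot to illustrate how many of my reads get filtered out.

I've figured out that ggplot likes data in long format, and I successfully reformatted mine with the melt function from reshape2.

This is what a subset of the data currently looks like: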

library sample  filter  value
LIB0    0011a   F1  1272707
LIB0    0018a   F1  1505554
LIB0    0048a   F1  1394718
LIB0    0095a   F1  2239035
LIB0    0011a   F2  250000
LIB0    0018a   F2  10000
LIB0    0048a   F2  10000
LIB0    0095a   F2  10000
LIB0    0011a   P   2118559
LIB0    0018a   P   2490068
LIB0    0048a   P   2371131
LIB0    0095a   P   3446715
LIB1    0007b   F1  19377
LIB1    0010b   F1  79115
LIB1    0011b   F1  2680
LIB1    0007b   F2  10000
LIB1    0010b   F2  10000
LIB1    0011b   F2  10000
LIB1    0007b   P   290891
LIB1    0010b   P   1255638
LIB1    0011b   P   4538

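library and sample are my ID variables (the same sample can be in multiple libraries). 'F1' and 'F2' give the number of reads filtered out at that step, and 'P' is the number of sequence reads remaining after filtering.

I've worked out how to make a basic stacked barplot, but now I'm stuck: I can't figure out how to reorder the factor on the x axis so that the bars in each panel are sorted in descending order by the sum of F1, F2 and P. As it stands, I think they are sorted alphabetically by sample name within each library.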

library(ggplot2)

# Tab-separated file with the columns library, sample, filter, value
testdata <- read.csv('testdata.csv', header = TRUE, sep = '\t')

ggplot(testdata, aes(x=sample, y=value, fill=filter)) + 
  geom_bar(stat='identity') +
  facet_wrap(~library, scales = 'free')

After some googling I found the aggregate function, which gives me the total for each sample in each library:

aggregate(value ~ library+sample, testdata, sum)

  library sample   value
1    LIB1  0007b  320268
2    LIB1  0010b 1344753
3    LIB0  0011a 3641266
4    LIB1  0011b   17218
5    LIB0  0018a 4005622
6    LIB0  0048a 3775849
7    LIB0  0095a 5695750

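While this does give me the totals, I now have no idea how to use them to reorder the factor, especially since I need to take two factors into account (library and sample).

So I guess my question boils down to: how do I order the samples in my plot by the sum of F1, F2 and P within each library?

Thanks a lot for any pointers you can give me!

【Discussion】:

Does this from SO help?

Tags: r ggplot2

【Solution 1】:

You're almost there. You need to change the factor levels of testdata$sample based on the aggregated data (I assume that no sample name appears in both LIB1 and LIB0):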

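# Total reads (F1 + F2 + P) per library/sample combination
df <- aggregate(value ~ library+sample, testdata, sum)

# Set the factor levels of sample in descending order of those totals
testdata$sample <- factor(testdata$sample, levels = df$sample[order(-df$value)])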

ggplot(testdata, aes(x=sample, y=value, fill=filter)) + 
    geom_bar(stat='identity') +
    facet_wrap(~library, scales = 'free')
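
To check the resulting order without plotting, the reordering step can be run on the table from the question. This is a minimal sketch in base R: the data frame is typed in from the question's subset rather than read from testdata.csv, and the name totals is only an illustrative stand-in for the answer's df.

```r
# The question's data subset, rebuilt inline so the sketch runs on its own.
testdata <- data.frame(
  library = rep(c("LIB0", "LIB1"), times = c(12, 9)),
  sample  = c(rep(c("0011a", "0018a", "0048a", "0095a"), times = 3),
              rep(c("0007b", "0010b", "0011b"), times = 3)),
  filter  = c(rep(c("F1", "F2", "P"), each = 4),
              rep(c("F1", "F2", "P"), each = 3)),
  value   = c(1272707, 1505554, 1394718, 2239035,
              250000, 10000, 10000, 10000,
              2118559, 2490068, 2371131, 3446715,
              19377, 79115, 2680,
              10000, 10000, 10000,
              290891, 1255638, 4538),
  stringsAsFactors = FALSE
)

# Total reads per library/sample combination (same call as in the question).
totals <- aggregate(value ~ library + sample, testdata, sum)

# Levels in descending order of the totals, across both libraries at once.
testdata$sample <- factor(testdata$sample,
                          levels = totals$sample[order(-totals$value)])

levels(testdata$sample)
# "0095a" "0018a" "0048a" "0011a" "0010b" "0007b" "0011b"
```

The order is global rather than per library, but because the facets use scales = 'free', each panel only shows the levels that occur in its own library, so the bars within every panel still come out in descending order.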

【Discussion】:

Perfect, that did it! Thanks for your help, much appreciated!
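
One caveat about the assumption in the answer: the question mentions that the same sample can occur in several libraries. In that case a single set of levels for sample cannot hold a different order per library, and recent R versions should also reject the repeated name as a duplicated factor level. A possible workaround, sketched below under stated assumptions, is to order a combined library/sample key and strip the library prefix from the axis labels. The data here is made up for illustration, and the column name key and the "___" separator are arbitrary choices, not part of the original answer.

```r
# Hypothetical data (not from the question): sample "0011" and "0018"
# both occur in LIB0 and LIB1, with a different ranking in each library.
testdata <- data.frame(
  library = rep(c("LIB0", "LIB1"), each = 4),
  sample  = rep(c("0011", "0018"), times = 4),
  filter  = rep(rep(c("F1", "P"), each = 2), times = 2),
  value   = c(10, 40, 20, 50, 30, 5, 70, 10),
  stringsAsFactors = FALSE
)

# Totals per library/sample, plus a key that is unique per combination.
totals <- aggregate(value ~ library + sample, testdata, sum)
totals$key <- paste(totals$library, totals$sample, sep = "___")

# Order the combined key by descending total.
testdata$key <- factor(paste(testdata$library, testdata$sample, sep = "___"),
                       levels = totals$key[order(-totals$value)])

levels(testdata$key)
# "LIB1___0011" "LIB0___0018" "LIB0___0011" "LIB1___0018"

# Plot only if ggplot2 is installed; the label function removes the prefix.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(testdata, aes(x = key, y = value, fill = filter)) +
    geom_bar(stat = "identity") +
    facet_wrap(~library, scales = "free") +
    scale_x_discrete(labels = function(x) sub("^.*___", "", x)) +
    xlab("sample")
  print(p)
}
```

With this made-up data, LIB0 shows 0018 before 0011 while LIB1 shows 0011 before 0018, which a single factor on sample alone could not express.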