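【Question title】:I want to display the top 10 upregulated and downregulated genes in a volcano plot
【Posted】:2020-08-28 03:44:13
【Question description】:

My data frame New.df.7vsNO has a column named Genes, and I would like to show the top 10 upregulated and downregulated genes in my data. I am not sure how to write the code so that the plot is filtered down to those genes. I would also like to label the data points on the plot; I thought label = Genes would display at least some of the gene names.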

ggplot(New.df.7vsNO, aes(x = log2FC, y = logpv, col = diffexpressed, label = Genes)) +
  geom_point() +
  theme_minimal() +
  scale_color_manual(values = c("blue", "red", "black")) +
  geom_vline(xintercept = c(-1.6, 1.6), col = "red") +
  geom_hline(yintercept = -log10(0.05), col = "red")

structure(list(log2FC = c(2.5576, -1.7629, 4.5593, -1.6414, 4.7747, 
1.9217, 2.5951, -2.4236, 4.2056, -2.8089, -2.1215, -1.7551, 7.6618, 
1.9732, 1.768, -1.7532, 2.1137, -7.4119, -5.0595, -1.6435), logpv = c(6.23062267392386, 
2.4454139371159, 6.87289520163519, 2.41294040382783, 9.84466396253494, 
3.31880400398931, 5.49214412830417, 5.38090666937326, 10.3914739664228, 
7.39254497678533, 4.19928292171762, 2.43023996241365, 3.67370511218151, 
3.17656489822122, 2.45950785169463, 2.70542356079838, 3.13990167030148, 
3.04151256697968, 14.8041003475908, 2.43438827509794), diffexpressed = c("UP", 
"DOWN", "UP", "DOWN", "UP", "UP", "UP", "DOWN", "UP", "DOWN", 
"DOWN", "DOWN", "UP", "UP", "UP", "DOWN", "UP", "DOWN", "DOWN", 
"DOWN"), Genes = c("Ngfr", "Axin2", "Igsf5", "Dlat", "Scnn1g", 
"Ckmt1", "Tmprss2", "Pparg", "Sema4f", "Hk2", "Pxmp4", "Scn4a", 
"Slc13a2", "Timp1", "Uhrf1", "Cnn1", "Ube2c", "Rhbg", "Tmem79", 
"Cyp51")), row.names = c(NA, 20L), class = "data.frame")

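【Comments】:

  • To help us help you, could you make your problem reproducible by sharing a sample of your data? Just type e.g. dput(head(NAME_OF_DATASET, 20)) into the console, which gives you the first 20 rows of your df, then copy the output starting with structure(.... and paste it into your post.
  • BTW: ... you have to add a geom_text to draw the labels.
  • Hi Stefan, let me know if this works. Thanks.
  • Yes. Works. (;

Tags: r ggplot2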


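【Solution 1】:

You can build a data frame of the top genes with, for example, dplyr::top_n. Instead of the top 10, I use the top 3 for illustration. Also, since I don't know much about genes, I chose logpv as the weighting variable.

This data frame can then be used in a second geom_point layer, for which I chose a larger point size.

For the labels I chose ggrepel::geom_text_repel, which does its best to keep labels from overlapping: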

# ggrepel is called via ggrepel:: below, so it must be installed
# but does not need to be attached.
library(ggplot2)
library(dplyr)

# Top 3 rows per group (UP / DOWN), ranked by logpv
topdown <- New.df.7vsNO %>% 
  group_by(diffexpressed) %>% 
  dplyr::top_n(3, wt = logpv)

ggplot(New.df.7vsNO, aes(x = log2FC, y = logpv, col = diffexpressed, label = Genes)) + 
  geom_point() + 
  # label = Genes is set in the global aes(), so every point gets a label
  ggrepel::geom_text_repel(show.legend = FALSE) + 
  #geom_text(vjust = -.1, show.legend = FALSE) + 
  # redraw the top genes with larger points
  geom_point(data = topdown, size = 3, show.legend = FALSE) + 
  theme_minimal() + 
  scale_color_manual(values = c("blue", "red", "black")) + 
  geom_vline(xintercept=c(-1.6, 1.6), col="red") +
  geom_hline(yintercept=-log10(0.05), col="red") 
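
The code above labels every point, while the question asks for labels on the top genes only. The following is a minimal sketch of one way to do that, not the answerer's own method. It assumes that "top" means the largest absolute log2FC within each diffexpressed group (the answer ranks by logpv instead), and it uses dplyr::slice_max, which supersedes top_n from dplyr 1.0.0 onward. The names df, n_top and top_genes are illustrative. The data are the 20 rows from the question; on the full data set n_top could be set to 10.

```r
library(ggplot2)
library(dplyr)

# The 20 sample rows posted in the question
df <- data.frame(
  log2FC = c(2.5576, -1.7629, 4.5593, -1.6414, 4.7747, 1.9217, 2.5951,
             -2.4236, 4.2056, -2.8089, -2.1215, -1.7551, 7.6618, 1.9732,
             1.768, -1.7532, 2.1137, -7.4119, -5.0595, -1.6435),
  logpv = c(6.23062267392386, 2.4454139371159, 6.87289520163519,
            2.41294040382783, 9.84466396253494, 3.31880400398931,
            5.49214412830417, 5.38090666937326, 10.3914739664228,
            7.39254497678533, 4.19928292171762, 2.43023996241365,
            3.67370511218151, 3.17656489822122, 2.45950785169463,
            2.70542356079838, 3.13990167030148, 3.04151256697968,
            14.8041003475908, 2.43438827509794),
  diffexpressed = c("UP", "DOWN", "UP", "DOWN", "UP", "UP", "UP", "DOWN",
                    "UP", "DOWN", "DOWN", "DOWN", "UP", "UP", "UP", "DOWN",
                    "UP", "DOWN", "DOWN", "DOWN"),
  Genes = c("Ngfr", "Axin2", "Igsf5", "Dlat", "Scnn1g", "Ckmt1", "Tmprss2",
            "Pparg", "Sema4f", "Hk2", "Pxmp4", "Scn4a", "Slc13a2", "Timp1",
            "Uhrf1", "Cnn1", "Ube2c", "Rhbg", "Tmem79", "Cyp51")
)

# Number of genes to label per direction (3 here because the sample is small)
n_top <- 3

# Assumption: rank by absolute fold change within the UP and DOWN groups
top_genes <- df %>%
  group_by(diffexpressed) %>%
  slice_max(order_by = abs(log2FC), n = n_top, with_ties = FALSE) %>%
  ungroup()

p <- ggplot(df, aes(x = log2FC, y = logpv, colour = diffexpressed)) +
  geom_point() +
  # larger points for the selected genes only
  geom_point(data = top_genes, size = 3, show.legend = FALSE) +
  # labels are mapped only in this layer and only for the selected genes
  ggrepel::geom_text_repel(data = top_genes, aes(label = Genes),
                           show.legend = FALSE) +
  theme_minimal() +
  # named values tie each colour to a group regardless of level order
  scale_colour_manual(values = c(DOWN = "blue", UP = "red")) +
  geom_vline(xintercept = c(-1.6, 1.6), colour = "red") +
  geom_hline(yintercept = -log10(0.05), colour = "red")

print(p)
```

Passing data = top_genes to the label layer is what restricts the labels; the base geom_point layer still draws all genes. To rank by significance as in the answer, order_by = logpv could be used instead of abs(log2FC).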

【Comments】:
