【Question title】: Compare two genomic ranges (R)
【Posted】: 2021-03-16 14:12:30
【Question】:

I have two GRanges objects:

g1<-GRanges(c("chr1:0-14","chr1:15-29"), score=c(20.2,10.4));g1

GRanges object with 2 ranges and 1 metadata column:
   seqnames    ranges strand |     score
      <Rle> <IRanges>  <Rle> | <numeric>
[1]     chr1      0-14      * |      20.2
[2]     chr1     15-29      * |      10.4

g2<-GRanges(c("chr1:0-9","chr1:10-19","chr1:20-29"), state=c('E1','E2','E1'));g2

GRanges object with 3 ranges and 1 metadata column:
   seqnames    ranges strand |       state
      <Rle> <IRanges>  <Rle> | <character>
[1]     chr1       0-9      * |          E1
[2]     chr1     10-19      * |          E2
[3]     chr1     20-29      * |          E1

I want to make them comparable. First I combined them, then I used disjoin:

g3<-(c(g1,g2)); g3 

GRanges object with 5 ranges and 2 metadata columns:
    seqnames    ranges strand |     score       state
       <Rle> <IRanges>  <Rle> | <numeric> <character>
 [1]     chr1      0-14      * |      20.2        <NA>
 [2]     chr1     15-29      * |      10.4        <NA>
 [3]     chr1       0-9      * |      <NA>          E1
 [4]     chr1     10-19      * |      <NA>          E2
 [5]     chr1     20-29      * |      <NA>          E1

disjoin(g3)

GRanges object with 4 ranges and 0 metadata columns:
   seqnames    ranges strand
      <Rle> <IRanges>  <Rle>
[1]     chr1       0-9      *
[2]     chr1     10-14      *
[3]     chr1     15-19      *
[4]     chr1     20-29      *

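So disjoin splits the ranges the way I want, but unfortunately it does not keep the metadata. Is there a way to keep the metadata and get a GRanges object like this?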

GRanges object with 4 ranges and 2 metadata columns:
   seqnames    ranges strand |     score       state
      <Rle> <IRanges>  <Rle> | <numeric> <character>
[1]     chr1       0-9      * |      20.2          E1
[2]     chr1     10-14      * |      20.2          E2
[3]     chr1     15-19      * |      10.4          E2
[4]     chr1     20-29      * |      10.4          E1

Thanks

【Comments】:

    Tags: r compare genomicranges


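    【Answer 1】:

    I think you will find help here: https://support.bioconductor.org/p/82551/ Note, however, that it does not apply exactly to your case, because a range in the output can map to several ranges in the input.

    【Comments】: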
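
The "several input ranges per output range" point can be checked directly. The following is only a minimal sketch on the data from the question, assuming GenomicRanges is installed; the names `d` and `n_sources` are made up for illustration. It asks disjoin for a reverse map and counts how many input ranges stand behind each disjoint piece.

```r
library(GenomicRanges)

g1 <- GRanges(c("chr1:0-14", "chr1:15-29"), score = c(20.2, 10.4))
g2 <- GRanges(c("chr1:0-9", "chr1:10-19", "chr1:20-29"),
              state = c("E1", "E2", "E1"))
g3 <- c(g1, g2)

# with.revmap = TRUE adds an IntegerList column that maps every disjoint
# piece back to the rows of g3 that cover it.
d <- disjoin(g3, with.revmap = TRUE)

# Number of input ranges behind each output range (illustrative name).
n_sources <- elementNROWS(d$revmap)
n_sources
```

Any value above 1 means the metadata for that piece has to be collected from more than one input row, which is why a one-to-one lookup is not enough here.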

      【Answer 2】:

      Yes, with.revmap=TRUE is definitely the solution:

      library(GenomicRanges)

      g1 <- GRanges(c("chr1:0-14", "chr1:15-29"), score = c(20.2, 10.4))
      g2 <- GRanges(c("chr1:0-9", "chr1:10-19", "chr1:20-29"),
                    state = c("E1", "E2", "E1"))
      g3 <- c(g1, g2)                         # combine both GRanges
      g4 <- disjoin(g3, with.revmap = TRUE)   # disjoin, keeping a map back to g3
      l1 <- g4$revmap                         # rows of g3 covering each piece
      score <- extractList(mcols(g3)$score, l1)
      state <- extractList(mcols(g3)$state, l1)
      # Drop the NAs from each list element. Named drop_na rather than na.omit
      # so that stats::na.omit is not masked. Assumes exactly one non-NA value
      # per piece, which holds for this data.
      drop_na <- function(l) sapply(l, function(x) x[!is.na(x)])
      mcols(g4)$score <- drop_na(score)
      mcols(g4)$state <- drop_na(state)
      g4
      
      GRanges object with 4 ranges and 3 metadata columns:
         seqnames    ranges strand |        revmap     score       state
            <Rle> <IRanges>  <Rle> | <IntegerList> <numeric> <character>
      [1]     chr1       0-9      * |           1,3      20.2          E1
      [2]     chr1     10-14      * |           1,4      20.2          E2
      [3]     chr1     15-19      * |           2,4      10.4          E2
      [4]     chr1     20-29      * |           2,5      10.4          E1
      

      Now I can easily compare state against score, for example with a boxplot. Thanks, Bastian

      【Comments】:
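
For comparison, here is a hedged alternative sketch that is not from the thread: it disjoins the bare ranges and then looks up each piece in g1 and g2 separately with findOverlaps, so no NA stripping is needed. It assumes that the ranges within g1 do not overlap each other, and likewise within g2 (true for the example data), so `select = "first"` picks the only match. The names `pieces`, `i1`, `i2` and `df` are illustrative.

```r
library(GenomicRanges)

g1 <- GRanges(c("chr1:0-14", "chr1:15-29"), score = c(20.2, 10.4))
g2 <- GRanges(c("chr1:0-9", "chr1:10-19", "chr1:20-29"),
              state = c("E1", "E2", "E1"))

# Disjoin the union of both range sets; granges() strips the metadata first.
pieces <- disjoin(c(granges(g1), granges(g2)))

# For each piece, index of the g1 / g2 range covering it (NA if none).
i1 <- findOverlaps(pieces, g1, select = "first")
i2 <- findOverlaps(pieces, g2, select = "first")

# Copy the metadata over by index.
pieces$score <- g1$score[i1]
pieces$state <- g2$state[i2]
pieces

# Compare score across states, e.g. with a boxplot.
df <- as.data.frame(mcols(pieces))
boxplot(score ~ state, data = df)
```

With only four pieces the boxplot is not very informative, but the same call should carry over to real data where each state covers many ranges.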
