【Question title】: How to get all features in a range from a GFF3 file in Perl?
【Posted】: 2010-09-08 14:33:53
【Question】:

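I want to write a Perl function that takes a GFF3 file name and a range (e.g. 100000 .. 2000000) and returns a reference to an array containing the names/accessions of all genes found in that range.

I suppose using BioPerl makes sense, but I have very little experience with it. I could write a script that parses GFF3 myself, but if using BioPerl (or another package) is not too complicated, I would rather reuse their code.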

    Tags: perl parsing bioperl


    【Solution 1】:
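    use strict;
    use warnings;
    use Bio::Tools::GFF;
    
    # The GFF3 file name is taken from the command line.
    my $gff_file    = shift @ARGV or die "Usage: $0 <file.gff3>\n";
    my $range_start = 100000;
    my $range_end   = 200000;
    
    my @features_in_range = ( );
    
    my $gffio = Bio::Tools::GFF->new(-file => $gff_file, -gff_version => 3);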
    
    while (my $feature = $gffio->next_feature()) {
    
        ## What about features that are not contained within the coordinate range but
        ## do overlap it?  Such features won't be caught by this check.            
        if (
            ($feature->start() >= $range_start)
            &&
            ($feature->end()   <= $range_end)
           ) {
    
            push @features_in_range, $feature;
    
        }
    
    }
    
    $gffio->close();
    

    Disclaimer: this is a naive implementation. I just banged it out and it is untested. I don't even guarantee that it compiles.
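
As the comment in the code above notes, the containment check skips features that straddle a boundary of the range. Below is a minimal sketch of the overlap variant, wrapped in a function that returns an array reference as the question asks. It is written in plain Perl rather than BioPerl so that it runs without extra modules. The name `features_in_range` and the choice to report the `Name` attribute (falling back to `ID`) are assumptions made for illustration. The sketch also ignores the sequence ID column, so it only suits files that describe a single sequence.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns a reference to an array
# of feature names whose coordinates overlap [$range_start, $range_end].
sub features_in_range {
    my ( $gff_file, $range_start, $range_end ) = @_;

    open my $fh, '<', $gff_file or die "Cannot open $gff_file: $!";

    my @names;
    while ( my $line = <$fh> ) {
        chomp $line;
        last if $line =~ /^##FASTA/;               # sequence data follows
        next if $line =~ /^#/ or $line !~ /\S/;    # comments and blank lines

        # GFF3 has nine tab-separated columns; 4 and 5 are start and end.
        my @col = split /\t/, $line;
        next unless @col == 9;
        my ( $start, $end, $attributes ) = @col[ 3, 4, 8 ];

        # Two closed intervals overlap when each starts before the other ends.
        next unless $start <= $range_end and $end >= $range_start;

        # Attributes are semicolon-separated key=value pairs.
        my %attr;
        for my $pair ( split /;/, $attributes ) {
            my ( $key, $value ) = split /=/, $pair, 2;
            $attr{$key} = $value if defined $value;
        }

        # Assumption: prefer Name, fall back to ID, skip unnamed features.
        my $name = $attr{Name} // $attr{ID};
        push @names, $name if defined $name;
    }
    close $fh;

    return \@names;
}
```

A call such as `my $names = features_in_range( $gff_file, 100000, 200000 );` would then give the names of every feature touching the range, not only those fully inside it.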


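      【Solution 2】:

      You really do want to use BioPerl for this, probably the Bio::Tools::GFF module.

      You really should ask on the BioPerl mailing list. It is very friendly and the subscribers are knowledgeable; they will surely be able to help you. Once you get an answer (if you don't get one here first), I suggest answering your own question here with it so that we can all benefit!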


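        【Solution 3】:

        The following function takes a hash of targets and ranges and returns a function that iterates over all ranges overlapping any of the targets. Targets should be a reference to an array of array references: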

        my $targets =    
        [
          [
            $start,
            $end,
          ],
          ...,
        ]
        

        Ranges should be a reference to an array of hashes (the binary search assumes they are sorted by position):

        my $ranges =
        [
          {
            seqname   => $seqname,
            source    => $source,
            feature   => $feature,
            start     => $start,
            end       => $end,
            score     => $score,
            strand    => $strand,
            frame     => $frame,
            attribute => $attribute,
          },
          ...,
        ]
        

        Of course, you can pass just a single target.

        my $brs_iterator
        = binary_range_search( targets => $targets, ranges => $ranges );
        
        while ( my $gff_line = $brs_iterator->() ) {
           # do stuff
        }
        
        use Carp qw(croak);
        
        sub binary_range_search {
            my %options = @_;
        
            my $targets = $options{targets} || croak 'Need a targets parameter';
            my $ranges  = $options{ranges}  || croak 'Need a ranges parameter';
        
            my @iterators = ();
        
        TARGET:
            for my $range (@$targets) {
        
                # Reset the search bounds for every target.
                my ( $low, $high ) = ( 0, $#{$ranges} );
        
            RANGE_CHECK:
                while ( $low <= $high ) {
        
                    my $try = int( ( $low + $high ) / 2 );
        
                    $low = $try + 1, next RANGE_CHECK
                        if $ranges->[$try]{end} < $range->[0];
                    $high = $try - 1, next RANGE_CHECK
                        if $ranges->[$try]{start} > $range->[1];
        
                    my ( $down, $up ) = ($try) x 2;
                    my %seen = ();
        
                    my $brs_iterator = sub {
        
                        # Check the index bounds first so that we never read
                        # (or autovivify) elements outside the array.
                        if (    $up < $#{$ranges}
                            and $ranges->[ $up + 1 ]{end} >= $range->[0]
                            and $ranges->[ $up + 1 ]{start} <= $range->[1]
                            and !exists $seen{ $up + 1 } )
                        {
                            $seen{ $up + 1 } = undef;
                            return $ranges->[ ++$up ];
                        }
                        elsif ( $down > 0
                            and $ranges->[ $down - 1 ]{end} >= $range->[0]
                            and $ranges->[ $down - 1 ]{start} <= $range->[1]
                            and !exists $seen{ $down - 1 } )
                        {
                            $seen{ $down - 1 } = undef;
                            return $ranges->[ --$down ];
                        }
                        elsif ( !exists $seen{$try} ) {
                            $seen{$try} = undef;
                            return $ranges->[$try];
                        }
                        else {
                            return;
                        }
                    };
                    push @iterators, $brs_iterator;
                    next TARGET;
                }
            }
        
        # In scalar context return master iterator that iterates over the list of range iterators.
        # In list context returns a list of range iterators.
            return wantarray
                ? @iterators
                : sub {
                while (@iterators) {
                    if ( my $range = $iterators[0]->() ) {
                        return $range;
                    }
                    shift @iterators;
                }
                return;
                };
        }
        
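
The `$ranges` structure described above could be built from a GFF file along the following lines. This is only a sketch: the helper name `gff_to_ranges` is hypothetical, and sorting by start coordinate (then end) is an assumption made to suit the binary search in `binary_range_search`.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the lines of a GFF file into the array-of-hashes
# layout described above, sorted by start coordinate for the binary search.
sub gff_to_ranges {
    my ($gff_file) = @_;

    # Hash keys in GFF column order, matching the structure shown earlier.
    my @keys = qw(seqname source feature start end score strand frame attribute);

    open my $fh, '<', $gff_file or die "Cannot open $gff_file: $!";

    my @ranges;
    while ( my $line = <$fh> ) {
        chomp $line;
        last if $line =~ /^##FASTA/;               # sequence data follows
        next if $line =~ /^#/ or $line !~ /\S/;    # comments and blank lines

        my @col = split /\t/, $line;
        next unless @col == 9;                     # skip malformed lines

        my %record;
        @record{@keys} = @col;                     # hash slice: one key per column
        push @ranges, \%record;
    }
    close $fh;

    return [
        sort { $a->{start} <=> $b->{start} or $a->{end} <=> $b->{end} } @ranges
    ];
}
```

The result could then be passed as `ranges => gff_to_ranges($gff_file)`, with a single target given as `targets => [ [ $range_start, $range_end ] ]`.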
