【Question title】: Extract gene ID + functional annotation from .gff
【Posted】: 2014-11-22 16:13:51
【Question】:

I have a .gff file that looks like this:

Niben044Scf00000988 .   contig  1   120868  .   .   .   ID=Niben044Scf00000988;Name=Niben044Scf00000988
Niben044Scf00000988 maker   gene    6221    8457    .   -   .   ID=NbS00000988g0019;AltID=maker-Niben044Scf00000988-augustus-gene-0.18;Name=NbS00000988g0019;PredictionNote=maker-augustus
Niben044Scf00000988 maker   mRNA    6221    8457    .   -   .   ID=NbS00000988g0019.1;Parent=NbS00000988g0019;AltID=maker-Niben044Scf00000988-augustus-gene-0.18-mRNA-1;Name=NbS00000988g0019.1;_AED=0.07;_QI=0|1|1|1|1|1|3|43|341;_eAED=0.07;blast_hits=TAIR:AT3G28470.1:I57.93:L145:E8e-43,SWP:MYB38_MAIZE:I46.92:L130:E1e-29,GB:CAN75378.1:I45.28:L360:E1e-77,ITAG:Solyc03g113530.2.1:I74.03:L362:E4e-155;func_annotation="MYB transcription factor [Solanum lycopersicum]"

....

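I need the gene IDs together with their functional annotations, and I want to use the result in R.

gffread seems to be able to extract only sequences.

The output should look like this: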

Gene-ID \t functionannotation

NbS00000988g0019 \t MYB transcription factor [Solanum lycopersicum]

Is there a tool or a small script in BioPerl for this?

【Question discussion】:

    Tags: perl annotations bioperl


    【Solution 1】:

    You can use Bio::FeatureIO for this. Here is an example using your data:

    use strict;
    use warnings;
    use Bio::FeatureIO;
    
    # read infile "my.gff" (GFF3)
    my $in  = Bio::FeatureIO->new(-file => "my.gff" , -format => 'GFF');
    
    # write to outfile "out.txt"
    open(my $fh, '>', 'out.txt') or die "Cannot open out.txt: $!";
    print $fh "Gene-ID\tfunctionannotation\n";
    
    while ( my $feature = $in->next_feature() ) {
            # the func_annotation attribute is read back from the feature's annotation collection
            my ($func) = $feature->annotation()->get_Annotations('func_annotation');
            # note: seq_id is GFF column 1 (the scaffold name), not the gene ID
            print $fh $feature->seq_id . "\t" . $func->value . "\n" if $func;
    }
    close($fh);
    

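    File out.txt (note that the first column holds the scaffold name from GFF column 1, i.e. seq_id, not the gene ID asked for in the question):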

    Gene-ID functionannotation
    Niben044Scf00000988 MYB transcription factor [Solanum lycopersicum]
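As an alternative that needs no BioPerl at all, here is a minimal core-Perl sketch that prints the gene ID instead of the scaffold name. It assumes a tab-delimited GFF3 file, that func_annotation sits on the mRNA lines (as in the sample data), and that the gene ID is the mRNA's Parent attribute. The subroutine names parse_attributes, gene_and_function and run are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/env perl
# Minimal sketch (core Perl only): print the gene ID and func_annotation
# for every annotated mRNA line of a GFF3 file.
# Assumptions: tab-delimited GFF3; gene ID = the mRNA's Parent attribute;
# the attribute key is literally "func_annotation", as in the sample data.
use strict;
use warnings;

# Parse GFF3 column 9 ("key=value;key=value") into a hash reference.
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        next unless length $pair;
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $value;
        $value =~ s/^"(.*)"$/$1/;                       # drop surrounding quotes
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # undo GFF3 percent-encoding
        $attr{$key} = $value;
    }
    return \%attr;
}

# Return (gene_id, annotation) for an annotated mRNA line, else an empty list.
sub gene_and_function {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ || $line !~ /\S/;          # skip comments and blank lines
    my @col = split /\t/, $line;
    return unless @col == 9 && $col[2] eq 'mRNA';
    my $attr = parse_attributes($col[8]);
    return unless defined $attr->{func_annotation};
    my $gene = $attr->{Parent} // $attr->{ID};         # fall back to the mRNA ID
    return ($gene, $attr->{func_annotation});
}

# Print a header plus one tab-separated row per annotated mRNA.
sub run {
    my @files = @_;
    print "Gene-ID\tfunctionannotation\n";
    for my $file (@files) {
        open my $in, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$in>) {
            my ($gene, $func) = gene_and_function($line);
            print "$gene\t$func\n" if defined $gene;
        }
        close $in;
    }
}

if (@ARGV) { run(@ARGV) }
else       { warn "usage: $0 annotation.gff > genes_functions.tsv\n" }
```

Usage, with a hypothetical script name: perl gff2func.pl my.gff > genes_functions.tsv. The resulting tab-separated file can then be loaded in R with read.delim("genes_functions.tsv"). Splitting column 9 on ";" is safe for valid GFF3, where literal semicolons inside values must be percent-encoded.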
    
