【Posted】: 2016-05-29 08:31:38
【Question】:
I am trying to find the shortest and the longest sequence in a file that contains multiple GenBank-like entries. Example file:
LOCUS NM_182854 2912 bp mRNA linear PRI 20-APR-2016
DEFINITION Homo sapiens mRNA.
ACCESSION NM_182854
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
Catarrhini; Hominidae; Homo.
ORIGIN
1 gggcgatcag aagcaggtca cacagcctgt ttcctgtttt caaacgggga acttagaaag
61 tggcagcccc tcggcttgtc gccggagctg agaaccaaga gctcgaaggg gccatatgac
//
LOCUS NM_001323410 6992 bp mRNA linear PRI 20-APR-2016
DEFINITION Homo sapiens mRNA.
ACCESSION NM_001323410
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
Catarrhini; Hominidae; Homo.
ORIGIN
1 actacttccg gcttccccgc cccgccccgt ccccgggcgt ctccattttg gtctcaggtg
61 tggactcggc aagaaccagc gcaagaggga agcagagtta tagctacccc ggc
//
I want to print the accession number and the organism for the shortest and the longest sequence.
My code so far:
#!/usr/bin/perl
use strict;
use warnings;
print "enter file path\n";
while (my $line = <>){
    chomp $line;
    my @record = ($line);
    foreach my $file(@record){
        open(IN, "$file") or die "\n error opening file \n;/\n";
        $/="//";
        while (my $line = <IN>){
            my @gb_seq = split ("ORIGIN", $line);
            my $definition = $gb_seq[0];
            my $sequence = $gb_seq[1];
            $definition =~ m/ORGANISM[\s\t]+(.+)[\n\s\t]+/;
            my $organism = $1;
            if ($definition =~ m/ACCESSION[\s\t]+(\D\D_\d\d\d\d\d\d(\d*))[\n\s\t]+/){
                my $accession = $1;
                $sequence =~ s/\d//g;
                $sequence =~ s/[\n\s\t]//g;
                my $size = length($sequence);
                my @sorted_keys = sort { $a <=> $b } keys my %size;
                my $shortest = $sorted_keys[0];
                my $longest = $sorted_keys[-1];
                print "this is the shortest: $accession $organism size: $shortest\n";
                print "this is the longest: $accession $organism size: $longest\n";
            }
        }
    }
}
exit;
I thought of storing the lengths in a hash to find the shortest and the longest, but something is wrong there. I get these errors:
Use of uninitialized value $organism in concatenation (.) or string at test.pl line 39, <IN> chunk 1
Use of uninitialized value $shortest in concatenation (.) or string at test.pl line 39, <IN> chunk 1.
Use of uninitialized value $longest in concatenation (.) or string at test.pl line 40, <IN> chunk 1.
Which part should I change? Thanks.
【Comments】:
-
I don't see ORGANISM in the data. Perhaps you meant ORIGIN?
-
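Your main problem is that you declare a new, empty hash %size inside the sort statement, which has nothing to do with the $size scalar above it. You need to declare something like $biggest_sequence and $smallest_sequence above the while ($line) loop and, for each sequence, check whether it should replace the old $biggest_sequence or $smallest_sequence.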
-
Yes, sorry, I cut the header because it was too long and left out the organism part.
-
OK, I'll work on it, thanks.
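The approach suggested in the comments (keep a running shortest and longest record, declared outside the loop, instead of sorting the keys of a freshly declared empty hash) could look roughly like the sketch below. It is only a sketch under assumptions: records are taken to end with a line containing just `//`, ACCESSION and ORGANISM are taken to start a header line (optionally indented), and the function name `shortest_and_longest` and its hash keys are made up for illustration and are not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse GenBank-like text and return the shortest and
# the longest record as hash refs with keys accession, organism and size.
sub shortest_and_longest {
    my ($text) = @_;
    my ($shortest, $longest);    # declared once, outside the record loop

    # Assumption: every record ends with a line containing only "//".
    for my $record (split m{^//\s*$}m, $text) {
        # Split the header from the sequence block at the ORIGIN line.
        my ($header, $sequence) = split /^ORIGIN.*$/m, $record, 2;
        next unless defined $sequence;    # skips empty trailing chunks

        # Skip the record if it has no accession line.
        my ($accession) = $header =~ /^\s*ACCESSION\s+(\S+)/m or next;
        my ($organism)  = $header =~ /^\s*ORGANISM\s+(.+?)\s*$/m;
        $organism = 'unknown' unless defined $organism;

        # Keep only the bases: drop position numbers and all whitespace.
        $sequence =~ s/[\d\s]//g;
        my %entry = (
            accession => $accession,
            organism  => $organism,
            size      => length $sequence,
        );

        # Replace the current extremes only when this record beats them.
        $shortest = {%entry}
            if !defined $shortest || $entry{size} < $shortest->{size};
        $longest = {%entry}
            if !defined $longest || $entry{size} > $longest->{size};
    }
    return ($shortest, $longest);
}

# Runs only when a file name is passed on the command line.
if (@ARGV) {
    local $/;          # slurp mode: read the whole file at once
    my $text = <>;
    my ($shortest, $longest) = shortest_and_longest($text);
    die "no records found\n" unless defined $shortest;
    printf "this is the shortest: %s %s size: %d\n",
        @{$shortest}{qw(accession organism size)};
    printf "this is the longest: %s %s size: %d\n",
        @{$longest}{qw(accession organism size)};
}
```

Because each record's accession and organism are stored together with its size, the two print statements report the record that actually holds each extreme, rather than whichever record was parsed last. Falling back to 'unknown' when no ORGANISM line is found also avoids the uninitialized-value warning for `$organism`.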
Tags: perl