【Posted】: 2017-04-12 20:05:34
【Question】:
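I want to insert information from a FASTA file into a table in a MySQL database. I use the Ensembl_id column as the primary key.
Some of my Ensembl_id values are not unique, so I tried to handle this with the exists operator. However, only 5 rows were inserted into the table, and only 1 of them has an Ensembl_id value that is duplicated in the file.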
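The deduplication idea relies on what `exists` actually tests: whether a key is present in a hash, not whether its value is true. A minimal sketch of that distinction, using hypothetical gene IDs rather than real FASTA data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical IDs; the second one is stored with a false value on purpose.
my %gene = ('ENSMUSG01.1' => 1, 'ENSMUSG02.1' => 0);

# exists asks "is this key in the hash?", not "is its value true?"
my $has_01  = exists $gene{'ENSMUSG01.1'} ? 1 : 0;   # key present
my $has_02  = exists $gene{'ENSMUSG02.1'} ? 1 : 0;   # present, although the value is 0
my $true_02 = $gene{'ENSMUSG02.1'}        ? 1 : 0;   # a plain truth test says "no"
my $has_03  = exists $gene{'ENSMUSG03.1'} ? 1 : 0;   # key never stored

print "$has_01 $has_02 $true_02 $has_03\n";
```

So once `$gene{$Ensembl_id}` has been assigned anything, `exists` reports it as seen, which is what a "skip duplicates" check needs.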
#!/usr/bin/perl -w
#usage script.pl <username> <password> <database_name> <mouse_genes> <mouse_transcripts>
use DBI;
use Data::Dumper;
my $user = shift @ARGV or die $!;
my $password = shift @ARGV or die $!;
my $database = shift @ARGV or die $!;
my $mouse_genes = shift @ARGV or die $!;
my $mouse_transcripts = shift @ARGV or die $!;
my $dbh = DBI->connect( "dbi:mysql:$database:localhost", "$user", "$password",
    { RaiseError => 1 } );
my %gene;
$/ = "\n>";
open( FILE, "gzip -d -c /data.dash/class2016/student/Mus_musculus.GRCm38.cdna.all.fa.gz |" )
    or die $!;
LOOP:
while ( <FILE> ) {
    my $line = $_;
    chomp $line;
    if ( $line =~ /[a-z]/ ) {
        my @array = split( "\t", $line );
        if ( m/gene:(\w+\d+\.\w+)/ ) {
            my $Ensembl_id = $1;
            if ( !exists $gene{$Ensembl_id} ) {
                $gene{$Ensembl_id} = 1;
            }
            else {
                next;
            }
            if ( m/gene_biotype:(\w+)/ ) {
                my $gene_biotype = $1;
                if ( m/gene_symbol:(\w+\D\d+)/ ) {
                    my $gene_symbol = $1;
                    if ( m/description:(\w+\s+\w+\s+\w+\s+)/ ) {
                        my $gene_description = $1;
                        if ( m/MGI:(\d+)/ ) {
                            my $MGI_accession = $1;
                            my $sth = $dbh->prepare(
                                qq{insert into $mouse_genes (Ensembl_id,gene_biotype,gene_symbol,gene_description,MGI_accession) values ("$Ensembl_id","$gene_biotype","$gene_symbol","$gene_description","$MGI_accession")}
                            );
                            $sth->execute();
                            $sth->finish();
                            next LOOP;
                        }
                    }
                }
            }
        }
    }
}
close FILE;
$dbh->disconnect();
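In the script, each field is extracted by its own nested `if`, so a record whose header fails *any* of the patterns is dropped without an error. For instance, `gene_symbol:(\w+\D\d+)` only matches symbols that end in a digit, and `description:(\w+\s+\w+\s+\w+\s+)` needs three words separated and followed by whitespace, so a description containing a comma fails. The gene ID is also stored in `%gene` before those checks run, so if the first transcript of a gene fails a later pattern, every later transcript of that gene is then skipped as a "duplicate". This may account for some of the missing rows. The sketch below runs the question's two strict patterns against a made-up header (placeholder IDs and accession, written in the general layout of an Ensembl cDNA FASTA header) and compares them with a more permissive extraction; the permissive patterns are an assumption about the header layout, not a documented Ensembl grammar.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up header in the general layout of an Ensembl cDNA FASTA record;
# the IDs and the MGI accession number are placeholders.
my $header = 'ENSMUST00000000001.1 cdna chromosome:GRCm38:1:100:200:1 '
           . 'gene:ENSMUSG00000000001.1 gene_biotype:protein_coding '
           . 'transcript_biotype:protein_coding gene_symbol:Actb '
           . 'description:actin, beta [Source:MGI Symbol;Acc:MGI:1234567]';

# Patterns copied from the question: 1 if they match, 0 if not.
my $strict_symbol = $header =~ /gene_symbol:(\w+\D\d+)/             ? 1 : 0;
my $strict_desc   = $header =~ /description:(\w+\s+\w+\s+\w+\s+)/ ? 1 : 0;

# More permissive extraction: one whitespace-free token per simple field ...
my %field;
for my $key (qw(gene gene_biotype gene_symbol)) {
    $field{$key} = $1 if $header =~ /(?:^|\s)\Q$key\E:(\S+)/;
}
# ... free text up to the "[Source:" tag (or end of line) for the description ...
$field{description} = $1 if $header =~ /description:(.*?)\s*(?:\[Source:|$)/;
# ... and the numeric part of the MGI accession.
$field{mgi} = $1 if $header =~ /Acc:MGI:(\d+)/;

print "strict symbol: $strict_symbol, strict description: $strict_desc\n";
print "$_ => $field{$_}\n" for sort keys %field;
```

Both strict patterns fail on this header, while the permissive version recovers every field. Whichever patterns are used, marking the gene as seen only after all fields have been extracted avoids skipping its later transcripts.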
How can I use the exists operator to move on to the next line of the file when the primary key $Ensembl_id is a duplicate?
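`next` jumps to the next iteration of the innermost enclosing *loop*, and `if` blocks are not loops, so a `next` inside the nested `if`s of the `while` loop already moves on to the next record (with `$/ = "\n>"`, each "line" read is a whole FASTA record). The sketch below shows that pattern on a small in-memory file with made-up records, two of which share a gene ID; the `RECORD` label is optional here and only makes the target loop explicit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Small in-memory stand-in for the FASTA file (made-up records;
# the second and third share the gene ID G2.1).
my $fasta = ">T1 gene:G1.1 gene_symbol:Abc1\nACGT\n"
          . ">T2 gene:G2.1 gene_symbol:Def2\nACGT\n"
          . ">T3 gene:G2.1 gene_symbol:Def2\nACGT\n";

open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";

my %gene;    # gene IDs already handled
my @kept;    # gene IDs that would be inserted
{
    local $/ = "\n>";    # read one FASTA record per iteration
    RECORD: while (my $record = <$fh>) {
        chomp $record;   # removes the trailing "\n>" separator if present
        if ($record =~ /gene:(\S+)/) {
            my $id = $1;
            # leaves the if blocks and starts the next while iteration
            next RECORD if exists $gene{$id};
            $gene{$id} = 1;
            push @kept, $id;
        }
    }
}
close $fh;

print "@kept\n";
```

Using `local $/` inside a block restores the normal line separator afterwards, instead of changing it for the rest of the program as the plain assignment in the question does.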
【Discussion】:
Tags: perl hash exists perl-data-structures