I think your attempts at efficiency are actually slowing things down.
my %listA;
# Read first file (name in $NameA)
{
open my $fileA, '<', "$NameA" or die $!;
while (<$fileA>)
{
chomp;
$listA{$_}++;
}
}
# Read second file (name in $NameB)
{
open my $fileB, '<', "$NameB" or die $!;
while (<$fileB>)
{
chomp;
if ($listA{$_})
{
print "Line appears in $NameB once and $listA{$_} times in $NameA: $_\n";
}
}
}
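If you also want to read the second file into a hash, this works too:
my %listB;
# Read second file (name in $NameB)
{
open my $fileB, '<', "$NameB" or die $!;
while (<$fileB>)
{
chomp;
$listB{$_}++;
}
}
foreach my $key (sort keys %listA)
{
if ($listB{$key})
{
print "$NameA: $listA{$key}; $NameB: $listB{$key}; $key\n";
}
}
Now, if a particular line appears in both files, it will be listed. Note that even though I present the keys in sorted order, I use hash lookups, because that will be faster than shuffling through two sorted arrays. Of course, you'd be hard pressed to measure any difference with 4-line files. And for big files, the I/O time for reading the files and printing the results is likely to dominate the lookup time.
Reorganize the output as needed.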
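For comparison, here is a rough sketch of the "shuffling through two sorted arrays" approach that the hash lookup avoids. It is only an illustration: the arrays are hard-coded stand-ins for the sorted, de-duplicated lines of the two files, and common_sorted is a hypothetical helper name, not part of the code above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: walk two sorted lists of unique strings in step and
# return the strings present in both (a merge-style intersection).
sub common_sorted {
    my ($x, $y) = @_;            # array refs, each sorted with plain 'sort'
    my ($i, $j) = (0, 0);
    my @common;
    while ($i < @$x && $j < @$y) {
        my $cmp = $x->[$i] cmp $y->[$j];
        if    ($cmp < 0) { $i++ }                                # advance the smaller side
        elsif ($cmp > 0) { $j++ }
        else             { push @common, $x->[$i]; $i++; $j++ }  # match: advance both
    }
    return @common;
}

# Stand-ins for the sorted, de-duplicated lines of fileA and fileB.
my @sortedA = sort qw(hello hi tired sleepy);
my @sortedB = sort qw(hi tired sleepy hello);

print "$_\n" for common_sorted(\@sortedA, \@sortedB);

# The hash-lookup equivalent: no need to sort the second list at all.
my %inB = map { $_ => 1 } @sortedB;
print "$_\n" for grep { $inB{$_} } @sortedA;
```

Both loops print the same four words here. The merge version needs both lists sorted before it starts, whereas the hash version only sorts the keys it reports; with inputs this small, neither difference is measurable.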
Untested code! Now tested code: see below.
Converted to tested code
Data: fileA
hello
hi
tired
sleepy
Data: fileB
hi
tired
sleepy
hello
Program: ppp.pl
#!/usr/bin/env perl
use strict;
use warnings;
my $NameA = "fileA";
my $NameB = "fileB";
my %listA;
# Read first file (name in $NameA)
{
open my $fileA, '<', "$NameA" or die "Failed to open $NameA: $!\n";
while (<$fileA>)
{
chomp;
$listA{$_}++;
}
}
# Read second file (name in $NameB)
{
open my $fileB, '<', "$NameB" or die "Failed to open $NameB: $!\n";
while (<$fileB>)
{
chomp;
if ($listA{$_})
{
print "Line appears in $NameB once and $listA{$_} times in $NameA: $_\n";
}
}
}
Output
$ perl ppp.pl
Line appears in fileB once and 1 times in fileA: hi
Line appears in fileB once and 1 times in fileA: tired
Line appears in fileB once and 1 times in fileA: sleepy
Line appears in fileB once and 1 times in fileA: hello
$
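Note that the lines are listed in fileB order, as they should be, since the loop reads fileB and checks each line in turn.
Code: qqq.pl
This is the second snippet turned into a complete working program.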
#!/usr/bin/env perl
use strict;
use warnings;
my $NameA = "fileA";
my $NameB = "fileB";
my %listA;
# Read first file (name in $NameA)
{
open my $fileA, '<', "$NameA" or die "Failed to open $NameA: $!\n";
while (<$fileA>)
{
chomp;
$listA{$_}++;
}
}
my %listB;
# Read second file (name in $NameB)
{
open my $fileB, '<', "$NameB" or die "Failed to open $NameB: $!\n";
while (<$fileB>)
{
chomp;
$listB{$_}++;
}
}
foreach my $key (sort keys %listA)
{
if ($listB{$key})
{
print "$NameA: $listA{$key}; $NameB: $listB{$key}; $key\n";
}
}
Output:
$ perl qqq.pl
fileA: 1; fileB: 1; hello
fileA: 1; fileB: 1; hi
fileA: 1; fileB: 1; sleepy
fileA: 1; fileB: 1; tired
$
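Note that the keys are listed in sorted order, not in the order of either fileA or fileB.
Small miracles do happen occasionally! Apart from adding the 5 lines of preamble (shebang, 2 x use, 2 x my), the code for both program fragments was correct as first written. (Oh, I did improve the error messages on failure to open a file, so they at least identify which file couldn't be opened. And ikegami edited my code (thanks!) to add chomp calls consistently, and to add newlines to the print operations, which now need explicit newlines.)
I wouldn't claim this is great Perl code; it certainly wouldn't win a (code) golf competition. It does seem to work, though.
Analysis of the code in the question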
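One reason the consistent chomp calls matter: if a file's last line has no trailing newline, the same word is read as "hello" from one file and "hello\n" from the other, and the hash lookup misses it. The sketch below assumes two such hypothetical files and uses in-memory filehandles as stand-ins; lines_from is an illustrative helper name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical data: fileA ends with a newline, fileB's last line does not.
my $dataA = "hi\nhello\n";
my $dataB = "hi\nhello";

# Build a hash of the lines in a string, optionally chomping each line.
sub lines_from {
    my ($data, $do_chomp) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory data: $!\n";
    my %seen;
    while (my $line = <$fh>) {
        chomp $line if $do_chomp;
        $seen{$line}++;
    }
    return \%seen;
}

for my $do_chomp (0, 1) {
    my $seenA  = lines_from($dataA, $do_chomp);
    my $seenB  = lines_from($dataB, $do_chomp);
    my $common = grep { $seenB->{$_} } keys %$seenA;   # grep in scalar context counts matches
    my $label  = $do_chomp ? 'with chomp' : 'without chomp';
    print "$label: $common common line(s)\n";
}
```

Without chomp only "hi\n" matches; with chomp both lines match.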
open BASE_CONFIG_FILE, "< script/base.txt" or die;
my %base_config;
while (my $line=<BASE_CONFIG_FILE>) {
(my $word1,my $word2) = split /\n/, $line;
$base_config{$word1} = $word1;
}
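The split is weird... You have a line that ends with a newline, and you split it on the newline, so $word1 gets the rest of the line and $word2 ends up undefined (split discards trailing empty fields, so it returns only one field). You then store $word1 (not $word2, as I first thought at a glance) as the value in the base config. So each entry has an identical key and value. Unusual. Not actually wrong, but... unusual. The second loop is essentially the same (we should all be shot for not using a single sub to do the reading for us).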
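A minimal sketch of such a "single sub to do the reading", assuming all that is wanted is a count of each distinct line. read_counts is a hypothetical name; the demo reads from in-memory filehandles (open on a scalar reference) so it runs without fileA and fileB existing, but the sub works unchanged on a handle opened on a real file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: read every line from an open filehandle and return a
# hash reference mapping each line (newline removed) to its number of occurrences.
sub read_counts {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;          # strip the newline instead of split /\n/
        $count{$line}++;
    }
    return \%count;
}

# In-memory stand-ins for fileA and fileB.
my $dataA = "hello\nhi\ntired\nsleepy\n";
my $dataB = "hi\ntired\nsleepy\nhello\n";
open my $fhA, '<', \$dataA or die "Cannot open in-memory fileA: $!\n";
open my $fhB, '<', \$dataB or die "Cannot open in-memory fileB: $!\n";

my $listA = read_counts($fhA);
my $listB = read_counts($fhB);

for my $key (sort keys %$listA) {
    print "fileA: $listA->{$key}; fileB: $listB->{$key}; $key\n" if $listB->{$key};
}
```

With real files, open the handle as in the programs above (open my $fh, '<', $NameA or die ...) and pass it to read_counts.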
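You don't use use strict; and use warnings;. Note that the very first thing I did to your code was add them. I've only been programming in Perl for about 20 years, and I know I don't know enough to risk running code without them. Your sorted arrays, %common, $count, $num, $key and $value are not my'd. It probably doesn't do much harm this time, but... it's a bad sign. Always, but always, use use strict; use warnings; until you know enough about Perl that you don't need to ask any questions (and don't expect that to happen soon).
When I ran it, with: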
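To see concretely what use strict buys, this small sketch compiles a snippet that assigns to an undeclared variable inside a string eval. Under strict the compile fails and the error lands in $@; without strict the same code silently creates a package variable. The variable name $undeclared is purely illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Under 'use strict', an undeclared variable is a compile-time error.
# A string eval lets us capture that error in $@ instead of aborting the script.
my $ok_strict = eval 'use strict; $undeclared = 1; 1';
print "with strict: ", ($ok_strict ? "compiled\n" : "rejected: $@");

# String eval inherits strict from the enclosing scope, so switch it off explicitly;
# the same assignment then silently creates the global $main::undeclared.
my $ok_lax = eval 'no strict; $undeclared = 1; 1';
print "without strict: ", ($ok_lax ? "compiled\n" : "rejected: $@");
```

A typo in a variable name behaves the same way: strict reports it at compile time, while without strict it quietly becomes a new, empty global.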
my %common={}; # line 32 - I added diagnostic printing
my $count=0;
Perl told me:
Reference found where even-sized list expected at rrr.pl line 32, <CONFIG_FILE> line 4.
Oops: those {} should be an empty list (). See why you run with warnings enabled!
Then, at
50 while(my($key,$value)=each(%common))
51 {
52 print "key: ".$key."\n";
53 print "value: ".$value."\n";
54 }
Perl told me:
key: HASH(0x100827720)
Use of uninitialized value $value in concatenation (.) or string at rrr.pl line 53, <CONFIG_FILE> line 4.
That is the first entry in %common, and it throws the loop off.
Fixed code: rrr.pl
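A minimal sketch reproducing the %common={} problem in isolation. Assigning {} to a hash assigns a one-element list containing a hash reference, so the hash gets a single key (the stringified reference, like HASH(0x...)) whose value is undef, which is the stray entry the each loop printed. The warning is switched off inside one block only so the demo runs quietly; don't do that in real code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %wrong;
{
    # Reproduce the bug deliberately; 'misc' is the warnings category of
    # "Reference found where even-sized list expected".
    no warnings 'misc';
    %wrong = {};              # one-element list: a single hash reference
}
my %right = ();               # what was intended: an empty hash

for my $key (keys %wrong) {
    my $shown = defined $wrong{$key} ? $wrong{$key} : 'undef';
    print "wrong: key '$key' => $shown\n";   # the key looks like HASH(0x...)
}
printf "wrong has %d key(s); right has %d\n", scalar(keys %wrong), scalar(keys %right);
```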
#!/usr/bin/env perl
use strict;
use warnings;
#open base config file and load them into the base_config hash
open BASE_CONFIG_FILE, "< fileA" or die;
my %base_config;
while (my $line=<BASE_CONFIG_FILE>) {
(my $word1,my $word2) = split /\n/, $line;
$base_config{$word1} = $word1;
print "w1 = <<$word1>>; w2 = <<$word2>>\n";
}
{ print "First file:\n"; foreach my $key (sort keys %base_config) { print "$key => $base_config{$key}\n"; } }
#sort BASE_CONFIG_FILE
my @sorted_base_config = sort keys %base_config;
#open config file and load them into the config hash
open CONFIG_FILE, "< fileB" or die;
my %config;
while (my $line=<CONFIG_FILE>) {
(my $word1,my $word2) = split /\n/, $line;
$config{$word1} = $word1;
print "w1 = <<$word1>>; w2 = <<$word2>>\n";
}
#sort CONFIG_FILE
my @sorted_config = sort keys %config;
{ print "Second file:\n"; foreach my $key (sort keys %config) { print "$key => $config{$key}\n"; } }
my %common=();
my $count=0;
while(my($key,$value)=each(%config))
{
print "Loop: $key = $value\n";
my $num=keys(%base_config);
$num--;#to get the correct index
#print "$num\n";
while($num>=0)
{
#check if all the strings in BASE_CONFIG_FILE can be found in CONFIG_FILE
$common{$value}=$value if exists $base_config{$key};
#print "yes!\n" if exists $base_config{$key};
$num--;
}
}
print "count: $count\n";
while(my($key,$value)=each(%common))
{
print "key: $key -- value: $value\n";
}
my $num=keys(%common);
print "common lines: $num\n";
Output:
$ perl rrr.pl
w1 = <<hello>>; w2 = <<>>
w1 = <<hi>>; w2 = <<>>
w1 = <<tired>>; w2 = <<>>
w1 = <<sleepy>>; w2 = <<>>
First file:
hello => hello
hi => hi
sleepy => sleepy
tired => tired
w1 = <<hi>>; w2 = <<>>
w1 = <<tired>>; w2 = <<>>
w1 = <<sleepy>>; w2 = <<>>
w1 = <<hello>>; w2 = <<>>
Second file:
hello => hello
hi => hi
sleepy => sleepy
tired => tired
Loop: hi = hi
Loop: hello = hello
Loop: tired = tired
Loop: sleepy = sleepy
count: 0
key: hi -- value: hi
key: tired -- value: tired
key: hello -- value: hello
key: sleepy -- value: sleepy
common lines: 4
$
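In the fixed rrr.pl, the inner while ($num >= 0) loop repeats the same exists test once for every key in %base_config, so it only adds work, and $count is never incremented (hence count: 0). Here is a minimal sketch of the same intersection without the inner loop, using hard-coded stand-in hashes shaped like %base_config and %config (key equal to value):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Stand-ins shaped like %base_config and %config in rrr.pl (key == value).
my %base_config = map { $_ => $_ } qw(hello hi tired sleepy);
my %config      = map { $_ => $_ } qw(hi tired sleepy hello);

# One exists test per key of %config; no inner loop needed.
my %common = map { $_ => $_ } grep { exists $base_config{$_} } keys %config;

for my $key (sort keys %common) {
    print "key: $key -- value: $common{$key}\n";
}
my $num = keys %common;       # keys in scalar context gives the count
print "common lines: $num\n";
```

Sorting the keys before printing also makes the output order repeatable, unlike the each loop, whose order depends on Perl's hash ordering.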