Searching for a substring match from a hash in Perl: answers

【Question title】: Searching for a substring match from a hash in Perl
【Posted】: 2011-07-14 21:56:32
【Question】:

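I have a file containing substrings that I need to match against given strings. The given strings come from another file that holds the actual data; they are one column of a CSV file. If a given string contains any of these substrings, it should be marked TRUE. What is the best way to do this in Perl?

This is what I have so far. It still seems to have some problems: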

#!/usr/bin/perl

use warnings;
use strict;

if ($#ARGV+1 != 1) {
 print "usage: $0 inputfilename\n";
 exit;
}

our $inputfile = $ARGV[0];
our $outputfile = "$inputfile" . '.ads';
our $ad_file = "C:/test/easylist.txt";  
our %ads_list_hash = ();

our $lines = 0;

# Create a list of substrings in the easylist.txt file
 open ADS, "$ad_file" or die "can't open $ad_file";
 while(<ADS>) {
        chomp;
        $ads_list_hash{$lines} = $_;
        $lines ++;
 }  

 for(my $count = 0; $count < $lines; $count++) {
            print "$ads_list_hash{$count}\n";
       }
 open IN,"$inputfile" or die "can't open $inputfile";       
 while(<IN>) {      
       chomp;       
       my @hhfile = split /,/;       
       for(my $count = 0; $count < $lines; $count++) {
            print "$hhfile[10]\t$ads_list_hash{$count}\n";

            if($hhfile[9] =~ /$ads_list_hash{$count}/) {
                print "TRUE !\n";
                last;
            }
       }
 }

 close IN;

【Question comments】:

@Ed I have posted the code I wrote, but it still has a lot of errors.

Tags: regex perl hash data-mapping

【Solution 1】:

See Text::CSV (a comma-separated values manipulator), for example:

use 5.010;
use Text::CSV;
use Data::Dumper;
my @rows;
my %match;
my @substrings = qw/Hello Stack overflow/;
my $csv = Text::CSV->new ( { binary => 1 } )  # should set binary attribute.
                 or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
while ( my $row = $csv->getline( $fh ) ) {
        if($row->[0] ~~ @substrings){ # 1st field; ~~ matches when it equals an element of @substrings
            say "match " ;
            $match{$row->[0]} = 1;
        }
 }
$csv->eof or $csv->error_diag();
close $fh;
print Dumper(\%match);
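
The question asks whether a field contains any of the substrings, whereas `~~` against an array matches only when the field equals one of the elements. A minimal sketch of a containment check using the core `index` function follows; the helper name `contains_any` and the sample fields are hypothetical and not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains at least one of @needles
# as a literal substring (no regex metacharacters are interpreted).
sub contains_any {
    my ($text, @needles) = @_;
    for my $needle (@needles) {
        return 1 if index($text, $needle) >= 0;
    }
    return 0;
}

my @substrings = qw/Hello Stack overflow/;   # same sample list as above

# Invented sample fields, standing in for $row->[0] from the CSV.
for my $field ('Hello world', 'Stack Overflow', 'nothing here') {
    printf "%-15s %s\n", $field, contains_any($field, @substrings) ? 'TRUE' : 'FALSE';
}
```

Because `index` compares literal text, characters such as `.` or `*` in the substring file need no escaping with this approach.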

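【Comments】:

【Solution 2】:

You can use selectcol_arrayref, or fetchrow_* and a loop, to get an array of the words to search for. Then build a regex pattern by joining that array with '\b)|(?:\b' and wrapping the result in '(?:\b' and '\b)' (or something better suited to your needs).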
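
The joining step described above might look like the following sketch. The word list is hard-coded here as a stand-in for whatever `selectcol_arrayref` would return, and the `quotemeta` call is an added assumption so that the words are treated as literal text.

```perl
use strict;
use warnings;

# Hypothetical word list; in the answer's setup this would come from the database.
my @words = qw(banner popup tracker);

# Wrap each word in (?:\b ... \b) and join the groups with '|'.
my $pattern = '(?:\b' . join('\b)|(?:\b', map { quotemeta($_) } @words) . '\b)';
my $regex   = qr/$pattern/;

# Invented sample strings: the second has "banners", which \b rejects.
for my $text ('site.com/banner/1.gif', 'site.com/banners/1.gif') {
    print "$text\t", ($text =~ $regex ? 'TRUE' : 'FALSE'), "\n";
}
```

The `\b` anchors restrict matches to whole words; for plain substring matching they could be dropped from the join and wrapper strings.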

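【Comments】:

【Solution 3】:

Here is some cleaned-up code that does the same thing as the code you posted, except that it doesn't print $hhfile[10] along with each ad pattern before testing it; if you need that output, you'll have to loop over all the patterns and test each one individually, in basically the same way you're already doing. (Although, even in that case, it would be better if your loop were for my $count (0 .. $lines - 1) rather than the C-style for (...;...;...).)

Instead of testing each pattern individually, I've used Regexp::Assemble, which builds a single pattern equivalent to testing all of the individual substrings at once. The smart match operator (~~) in Nikhil Jain's answer does basically the same thing when used as shown in his answer, but it requires Perl 5.10 or later, while Regexp::Assemble will still work for you on 5.8 or (heaven forbid!) 5.6.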

#!/usr/bin/env perl

use warnings;
use strict;

use Regexp::Assemble;

die "usage: $0 inputfilename\n" unless @ARGV == 1;

my $inputfile     = $ARGV[0];
my $outputfile    = $inputfile . '.ads';
my $ad_file       = "C:/test/easylist.txt";
my @ad_list;

# Create a list of substrings in the easylist.txt file
open my $ads_fh, '<', $ad_file or die "can't open $ad_file: $!";
while (<$ads_fh>) {
    chomp;
    push @ad_list, $_;
}

for (@ad_list) {
    print "$_\n";       # Or just "print;" - the $_ will be assumed
}      

my $ra = Regexp::Assemble->new;
$ra->add(@ad_list);

open my $in_fh, '<', $inputfile or die "can't open $inputfile: $!";
while (<$in_fh>) {
    my @hhfile = split /,/;
    print "TRUE !\n" if $ra->match($hhfile[9]);
}

(According to perl -c, the code is syntactically valid, but it has not otherwise been tested.)

【Comments】:
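
If the per-pattern output mentioned at the start of this answer is needed, the individual test loop could be sketched as follows. The patterns and CSV lines are in-memory samples invented for illustration (the real script reads them from files), and the helper name `first_matching_pattern` is hypothetical.

```perl
use strict;
use warnings;

# Invented sample data standing in for easylist.txt and the input CSV.
my @ad_list = ('doubleclick', 'adserver', '/banners/');
my @lines   = (
    'a,b,c,d,e,f,g,h,i,http://adserver.example/x.js,k',
    'a,b,c,d,e,f,g,h,i,http://example.org/index.html,k',
);

# Hypothetical helper: index of the first pattern matching $text, or -1.
# Each entry is used as a regex, as in the question's original code.
sub first_matching_pattern {
    my ($text, $patterns) = @_;
    for my $i (0 .. $#{$patterns}) {       # range loop instead of a C-style for
        return $i if $text =~ /$patterns->[$i]/;
    }
    return -1;
}

for my $line (@lines) {
    my @hhfile = split /,/, $line;
    my $hit = first_matching_pattern($hhfile[9], \@ad_list);
    print "$hhfile[9]\t", ($hit >= 0 ? "TRUE ($ad_list[$hit])" : 'FALSE'), "\n";
}
```

Returning the index rather than a plain boolean keeps the matching pattern available for logging, at the cost of one regex test per pattern per line.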