Put .txt files into a hash and compare with an array of words using Perl [closed]

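【Question title】: put .txt files into a hash and compare with an array of words using perl [closed]
【Posted】: 2010-11-16 11:56:34
【Question description】:

I have a folder of .txt files that I want to store in a hash. I then want to compare the files against an array of specific words, and also count how many times each of those words occurs.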

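【Question comments】:

What are some concrete examples of the file contents and of the hash you compare against? What have you tried? SO is not a "please do my homework for me" site, and this question looks a lot like that.
What exactly is your question? :)
@Øyvind Skaar - purely out of curiosity, and nothing to do with Perl: if you don't mind, how is your name pronounced correctly?
@DVK With an Ø ;) I don't know whether anything in English sounds exactly the same.. this guy answers.yahoo.com/question/index?qid=20100101105722AAsWC6Y suggests the vowel in "bird" and "hurt"
@DVK, @Øyvind: this might help: IPA for Swedish and Norwegian - it looks similar to the German ö.

Tags: arrays perl text hash count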

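【Solution 1】:

Note that I use \p{Alpha}, because that is technically what defines a word. You can adjust the regex to allow digits, to require an alpha at the start, or whatever else you might need.

Also note that for a file with one word per line, the regex is superfluous and you should leave it out. Just chomp the line and store $_.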

use 5.010; # for say
use strict;
use warnings;
use Data::Dumper; # for the dump at the end

my ( %hash );

sub load_words { 
    @hash{ @_ } = ( 0 ) x @_; return; 
}

sub count_words {
    $hash{$_}++ foreach grep { exists $hash{$_} } @_;
}


my $word_regex
    = qr{ (                # start a capture
            \p{Alpha}+     # any sequence of one or more alpha characters
            (?:            # begin grouping of
                ['-]         # allow hyphenated words and contractions
                \p{Alpha}+   # which must be followed by an alpha
            )*             # any number of times
            (?: (?<=s)')?  # case for plural possessives (ht: tchrist)
          )                # end capture
        }x;

# load @ARGV to do <> processing
@ARGV = qw( list of files I take words from );
while ( <> ) {
    load_words( m/$word_regex/g );
}
@ARGV = qw( list of files where I count words );
while ( <> ) { 
    count_words( m/$word_regex/g );
}

# take a look at the hash
say Data::Dumper->Dump( [ \%hash ], [ '*hash' ] );
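
For the one-word-per-line case mentioned above, the loading step can skip the regex entirely. The following is only a minimal sketch of that idea: the helper name `load_word_list` and the file layout are assumptions for illustration, not part of the answer's code.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file that holds exactly one word per line
# and return a hash reference of word => 0 counters.
sub load_word_list {
    my ($path) = @_;
    my %count;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    while ( my $word = <$fh> ) {
        chomp $word;                  # strip the trailing newline; no regex needed
        next unless length $word;     # ignore blank lines
        $count{$word} = 0;
    }
    close $fh;

    return \%count;
}
```

A call such as `my $count = load_word_list('words.txt');` (the file name is a placeholder) would give a hash ready for the `$count->{$word}++` step.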

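【Comments】:

See this answer for another word-based approach that looks at some edge cases.
@tchrist: good point about plural possessives. :D
I am really, really glad to see people starting to avoid writing [a-z] in their patterns. That is, like, so 1960s! ☹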

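【Solution 2】:

I won't write the code for you, but you could do something like this:

Loop over all the files (see glob())
Loop over all the words in each file (maybe with a regex or split()?)
Check each word against a hash of the wanted words. If it is there, increment a "counter" hash like this: $hash{ $word }++. Or you could store all the words in the hash and then pick out the ones you want afterwards..

Or... there are many ways to do this..

If your files are huge, you will have to do it some other way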
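
The steps listed above can be sketched roughly as follows. This is one possible reading of the outline, not the answerer's code: the function name `count_wanted_words`, the `texts` directory, and the word list are all made up for illustration, and words are lowercased on the assumption that matching should be case-insensitive.

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences of the wanted words across every
# .txt file in $dir. Returns a hash reference mapping word => count.
sub count_wanted_words {
    my ( $dir, @wanted ) = @_;

    # Start every wanted word at 0 so absent words are still reported.
    my %count = map { ( lc($_) => 0 ) } @wanted;

    # Note: glob splits its pattern on whitespace, so this assumes
    # a directory name without spaces.
    for my $path ( glob "$dir/*.txt" ) {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        while ( my $line = <$fh> ) {
            # Split on runs of non-word characters and compare in lowercase.
            for my $word ( split /\W+/, lc $line ) {
                $count{$word}++ if exists $count{$word};
            }
        }
        close $fh;
    }
    return \%count;
}

# Hypothetical usage: 'texts' and the word list are placeholders.
my $count = count_wanted_words( 'texts', qw( apple banana cherry ) );
printf "%-10s %d\n", $_, $count->{$_} for sort keys %$count;
```

Because each file is read line by line, only one line is held in memory at a time, which is one way to cope if the files turn out to be larger than expected.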

【Comments】:

My files are small, so this should work... thanks

【Solution 3】:

So I got it done using an array of the specific words I wanted to find... HAPPY DAYS :-)

#!/usr/bin/perl
#use strict;
use warnings;
my @words;

my @triggers=(" [kK]ill"," [Aa]ssault", " [rR]ap[ie]"," [dD]rug");
my %hash;

sub count_words {
    print "\n";
}

my $word_regex
    = qr{ (                # start a capture
            \p{Alpha}+     # any sequence of one or more alpha characters
            (?:            # begin grouping of
                ['-]         # allow hyphenated words and contractions
                \p{Alpha}+   # which must be followed by an alpha
            )*             # any number of times
          )                # end capture
        }x;

my @files;
my $dirname = "/home/directory";
opendir(DIR,$dirname) or die "can't opendir $dirname: $!";
while (defined($file = readdir(DIR))) {
     next unless -f "$dirname/$file";   # skip ".", ".." and subdirectories
     push @files, "$dirname/$file";     # a "/" is needed between directory and file name
}
closedir(DIR);
my @interestingfiles;

foreach $file (@files){

    open FILE, ("<$file") or die "No file";

    foreach $line (<FILE>){
        foreach $trigger (@triggers){
           if($line =~ /$trigger/){   # no /g: it would make the next trigger resume from the previous match position
              push @interestingfiles, "$file\n";
           }
        }
    } 
   close FILE;
}
print @interestingfiles;

【Comments】:

Why did you comment out use strict;? You should always use it. Fix the problems it reveals.
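
As a rough illustration of what the comment asks for, here is one way the file scan might look with use strict enabled. It is a sketch under assumptions rather than the poster's code: the function name `files_matching` is invented, lexical handles replace the bareword ones, each file is reported once, and the patterns in the usage note use `\b` with `/i` instead of the leading-space triggers.

```perl
use strict;
use warnings;

# Hypothetical helper: return the paths of the regular files in $dir
# whose contents match at least one of the trigger patterns.
sub files_matching {
    my ( $dir, @triggers ) = @_;
    my @hits;

    opendir my $dh, $dir or die "can't opendir $dir: $!";
    # Build full paths and keep regular files only (drops ".", "..", subdirs).
    my @files = grep { -f } map { "$dir/$_" } readdir $dh;
    closedir $dh;

    CANDIDATE: for my $file ( sort @files ) {
        open my $fh, '<', $file or die "can't open $file: $!";
        while ( my $line = <$fh> ) {
            for my $trigger (@triggers) {
                if ( $line =~ /$trigger/ ) {
                    push @hits, $file;
                    close $fh;
                    next CANDIDATE;    # report each file only once
                }
            }
        }
        close $fh;
    }
    return @hits;
}
```

Usage might then be `print "$_\n" for files_matching( '/home/directory', qr/\bkill/i, qr/\bassault/i );`, where the directory is the placeholder from the script above. Every variable is declared with `my`, so strict has nothing to complain about.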