Perl - File iteration and looking for specific matching data

【Question title】: Perl - File Iteration and looking for specific matching data
【Posted】: 2014-03-24 21:47:17
【Question】:

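I want to iterate over a file in Perl and, when a specific word is found, store the other lines that match a particular pattern. The ldap.txt file is quite large, several gigabytes.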

user.txt

test1  
game

ldap.txt

dn: uid=test1,ou=people,dc=admin,dc=local  
blah  
blah  
maillocaladdress: test1@example.com  
maillocaladdress: test.team@example.com  
maillocaladdress: test11@example.com  
some date  
some more data  
data  
dn: uid=game,ou=people,dc=admin,dc=local   
blah  
blah  
maillocaladdress: game@example.com   
maillocaladdress: game.test@example.com  
maillocaladdress: game-test@example.com  
some date  
some more data  
data

and so on...

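Open user.txt, iterate over each user, and check every line of ldap.txt for a matching dn: line. If it matches, store the values of all the maillocaladdress lines in a variable. I assume this would be a hash key/value pair, but here there is more than one value per key.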

For example:

test1 matches dn: uid=test1,ou=people,dc=admin,dc=local

Store the following values for each user.

test1@example.com  
test.team@example.com  
test11@example.com
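
The "more than one value per key" case described above is commonly modelled in Perl as a hash of array references. A minimal sketch, using made-up in-memory data instead of the real ldap.txt (the variable name %emails is only illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample values; real data would be read from ldap.txt.
my %emails;
push @{ $emails{test1} }, 'test1@example.com';
push @{ $emails{test1} }, 'test.team@example.com';
push @{ $emails{game} },  'game@example.com';

# Each key maps to an array reference holding every address for that user.
for my $uid (sort keys %emails) {
    print "$uid: ", join(', ', @{ $emails{$uid} }), "\n";
}
```

The push autovivifies the array reference the first time a uid is seen, so no explicit initialisation is needed.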

Code

#! /usr/bin/perl

use strict;
use warnings;

my $ldiffile = shift;
my %emails;

open my $US, '<', 'users2.txt'
                  or die "Could not Open the file users2.txt: $!";

open my $FH, '<', $ldiffile
                 or die "Could not Open the file $ldiffile: $!";

chomp(my @users = <$US>);
#print "@users \n";

foreach my $uid (@users) {
print "$uid \n";
#       while ( chomp(my $line = <$FH>) ) {
        while (my $line = <$FH>) {
        chomp ($line);
                if ( $line =~ /dn: uid=$uid,ou=People,dc=admin,dc=local/i ) {
                print "$line \n";
                        if ( $line =~ /mailLocalAddress: ([\w\.\-\_\@]+)/ ) {
                                print "<<<< $line >>>> \n";
                                push ( @{$emails{$uid}}, $1 );
                        }
                }
        }
}

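【Comments】:

How big is your user.txt file? - Never mind, I see you are already loading it into memory.
By the way, what is your actual question? What isn't working?
Several gigabytes may take a long time to process line by line.
Cross-posted on PerlMonks.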

Tags: regex perl

【Solution 1】:

Hash the list of users. Then iterate over the second file, remembering the user you are currently parsing ($user). If you see an email address, store it.

#!/usr/bin/perl
use warnings;
use strict;

my %users;
open my $USER, '<', 'user.txt' or die $!;
while (<$USER>) {
    s/\s*$//;               #/ Sample input contains trailing whitespace.
    undef $users{$_};
}

my $user = q();
open my $LDAP, '<', 'ldap.txt' or die $!;
while (<$LDAP>) {
    s/\s*$//;
    $user = $1 if /dn: uid=(.*?),ou=people,dc=admin,dc=local/;
    push @{ $users{$user} }, $1 if exists $users{$user} 
                                and /maillocaladdress: (.*)/;
}

for my $user (keys %users) {
    print "$user\n\t";
    print join "\n\t", @{ $users{$user} || [] };   # tolerate users with no addresses
    print "\n";
}

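【Comments】:

I suspect this line is going to be too permissive: $user = $1 if /dn: uid=(.*?),ou=people,dc=admin,dc=local/;. There are likely to be uids that do not match the other requirements. For those entries $user is not updated, so many maillocaladdress values would be claimed for a uid they are not actually tied to.
@Miller: I just used the expression from the question (without the /i). It can be changed according to the data, e.g. to dn: uid=(.*?),\S+=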
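
To illustrate the loosened pattern suggested in the comment above, here is a small sketch. The second dn line is a made-up entry in a different branch, added only to show that the pattern still picks up its uid, so the current user changes at every dn line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    'dn: uid=test1,ou=people,dc=admin,dc=local',
    'dn: uid=other,ou=staff,dc=elsewhere,dc=local',   # hypothetical entry
    'maillocaladdress: test1@example.com',            # not a dn line
);

my @uids;
for my $line (@lines) {
    # Capture the uid from any dn line, whatever follows the first comma.
    push @uids, $1 if $line =~ /dn: uid=(.*?),\S+=/;
}

print "@uids\n";   # prints "test1 other"
```

Because the current user is reset on every dn line, addresses that follow an unrelated entry are no longer attributed to the previous uid; they are simply skipped when that uid is not in the user list.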

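【Solution 2】:

There are a few flaws in your program. You are trying to iterate over the file once for each entry in @users, but the file is only actually read for the first user: after that pass the filehandle is at end-of-file, so the inner loop yields nothing for the remaining users.

What you should do instead is iterate over the file once, pull out the user ids, and match them against your list of accepted users. The following should do what you want: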
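
The single-pass behaviour can be reproduced with a small sketch. It uses an in-memory filehandle with made-up contents in place of the real LDIF file, and shows that a second read loop sees nothing unless the handle is rewound with seek:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical two-line "file" held in memory.
my $data = "line one\nline two\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Count the lines that a read loop sees from the current position.
sub count_lines {
    my ($handle) = @_;
    my $n = 0;
    while (my $line = <$handle>) { $n++ }
    return $n;
}

my $first  = count_lines($fh);   # reads everything
my $second = count_lines($fh);   # handle is already at end-of-file

seek $fh, 0, 0 or die "seek failed: $!";
my $third  = count_lines($fh);   # rewound, so the lines are seen again

print "$first $second $third\n";   # prints "2 0 2"
```

Rewinding once per user would work, but on a file of several gigabytes it means one full read per user, which is why a single pass over the file is the better fit here.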

#!/usr/bin/perl

use strict;
use warnings;
use autodie;

open my $US, '<', 'users2.txt';
chomp(my @users = <$US>);
s/\s+$// for @users;    # sample user.txt has trailing whitespace
close $US;
my %isuser = map {$_ => 1} @users;

my %emails;

my $userid = '';
while (<>) {
    chomp;
    if (/^dn: uid=([^,]*)/) {
        $userid = $1;
        $userid = '' if !/,ou=People,dc=admin,dc=local/i;   # /i: sample data uses lowercase "people"

    } elsif ($isuser{$userid}) {
        if (/mailLocalAddress: ([\w.@-]+)/i) {   # hyphen last so it is literal, not a range
            print "$userid - <<<< $_ >>>> \n";
            push @{$emails{$userid}}, $1;
        }
    }
}

Also, your regex for testing mailLocalAddress contains capital letters whereas your sample data does not, so put an /i flag on the regex.
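
The effect of the /i flag can be checked against one of the sample lines from the question. This is only a sketch; the variable names are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'maillocaladdress: game-test@example.com';

# Case-sensitive: the mixed-case pattern does not match the lowercase data.
my ($case_sensitive)   = $line =~ /mailLocalAddress: ([\w.@-]+)/;

# Case-insensitive: the same pattern with /i matches.
my ($case_insensitive) = $line =~ /mailLocalAddress: ([\w.@-]+)/i;

print "without /i: ", (defined $case_sensitive ? $case_sensitive : 'no match'), "\n";
print "with /i:    ", (defined $case_insensitive ? $case_insensitive : 'no match'), "\n";
```

The hyphen is placed last in the character class so that it is taken literally instead of forming a range between its neighbours.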

【Comments】: