Pattern matching and processing between two files: answers

【Question title】: Pattern matching and processing between two files
【Posted】: 2015-08-01 02:59:08
【Question description】:

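I'm trying to find the different ways to do pattern matching and processing across two files. I'm new to Perl and could use some help.

I have two files. Colors.txt looks like this: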

Joe likes the color green.
Sam likes the color blue.

Pencils.txt looks like this:

Pencil one is blue.
Pencil two is green.

I need to parse both files and print the following:

Sam's pencil is number one because he likes the color blue.
Joe's pencil is number two because he likes the color green.

Can someone guide me on how to handle this efficiently?

【Question discussion】:

Tags: regex perl

【Solution 1】:

#!/usr/bin/perl
use strict;
use warnings;

my $filecol = $ARGV[0]; # path to colors.txt
my $filepen = $ARGV[1]; # path to pencils.txt

die("USAGE: ./colpen.pl colors.txt pencils.txt\n") unless @ARGV == 2 && -f $filecol && -f $filepen;

my $colors; # hashref mapping each color to its pencil number word (e.g. $colors->{green} = "two")
my $l = ""; # buffer for the line currently being read

open(PEN, "<", $filepen) or die("Cannot open \"$filepen\" : $!"); # three-argument open keeps odd characters in the path from being read as a mode
while(defined($l=<PEN>)){
    if( $l =~ /(\w+)\s+is\s+(\w+)/i ){ # matches every line that has "... Foo is Bar" while putting "Foo" and "Bar" in $1 and $2 respectively

        $colors->{$2} = $1;

    }
}
close(PEN);

open(COL, "<", $filecol) or die("Cannot open \"$filecol\" : $!"); # three-argument open, as above
while(defined($l=<COL>)){
    if( $l =~ /^\s*(\w+)\s.*\scolor\s+(\w+)/i ){ # match in a flexible way a first word, then the next word after "color", and put them in $1 and $2

        if( defined $colors->{$2} ){ # a missing color is undef, so test with defined rather than comparing to ""
            print( sname($1)." pencil is number $colors->{$2} because he likes the color $2.\n");
        }else{
            print("I have no idea which pencil is ".sname($1).".\n");
        }

    }
}
close(COL);

exit;

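sub sname {
    # Possessive form of a name: "Joe" -> "Joe's", "James" -> "James'".
    # Copy the argument first: $_[0] aliases the caller's $1, which a successful match below would reset.
    my ($name) = @_;

    return("${name}'") if $name =~ /s$/i;
    return("${name}'s");
}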

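【Discussion】:

Thank you very much for the explanation! nebj00la
You're welcome. If you like, you can accept it as the answer and upvote ;) eheh
One question: what if two people have the same favorite color and there is only one pencil of that color? How would one handle that? An extra subroutine?
The real question is: how do you want it to react? You can always delete a key after reading it, using delete $colors->{$2}, but I'm not sure that would do what you want, since I don't know what you want :)
Thanks for the reply. For example, colors.txt would say "John likes the color blue" and another line would say "Jill likes the color blue". Pencils.txt would say "Pencil 30 is blue". I want something that handles the "pencil 30 is John's, pencil 30 is Jill's" case. I need a function, which from what I've seen would probably use an array or a hash, that says: 1. Match all colors to people first. 2. Find the duplicates. E.g.: Pencil 1 is Joe's because he likes orange. \n Pencil 2 is Jack's because he likes red. John and Jill both like blue, so pencil 30 could belong to either.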
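For the shared-color case raised in the last comment, one option is to collect every person per color in a hash of arrays before walking the pencils. The following is only a sketch under assumptions: the sample lines, the exact wording of the output, and the helper name pencil_report are made up for illustration and do not come from either answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input, modeled on the example in the comment above.
my @pencil_lines = (
    "Pencil 1 is orange.",
    "Pencil 2 is red.",
    "Pencil 30 is blue.",
);
my @color_lines = (
    "Joe likes the color orange.",
    "Jack likes the color red.",
    "John likes the color blue.",
    "Jill likes the color blue.",
);

# pencil_report (hypothetical name): returns one line per pencil,
# naming every person who likes that pencil's color.
sub pencil_report {
    my ($pencils, $colors) = @_;

    # Step 1: color => list of people who like it (hash of arrays).
    my %people_by_color;
    for my $line (@$colors) {
        my ($name, $color) = $line =~ /^(\S+) likes the color (\S+)\.$/
            or next;    # skip lines that do not fit the pattern
        push @{ $people_by_color{$color} }, $name;
    }

    # Step 2: walk the pencils and report 0, 1, or several candidates.
    my @report;
    for my $line (@$pencils) {
        my ($number, $color) = $line =~ /^Pencil (\S+) is (\S+)\.$/
            or next;
        my @people = @{ $people_by_color{$color} || [] };

        if (@people == 0) {
            push @report, "Nobody likes $color, so pencil $number is unclaimed.";
        }
        elsif (@people == 1) {
            push @report, "Pencil $number is $people[0]'s because $people[0] likes $color.";
        }
        else {
            my $who = join(" and ", @people);
            push @report, "$who like $color, so pencil $number could belong to any of them.";
        }
    }
    return @report;
}

print "$_\n" for pencil_report(\@pencil_lines, \@color_lines);
```

Storing a list per color, rather than a single name, is what lets the duplicate case be detected at all; what to print for it is a policy choice that only the asker can settle.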

【Solution 2】:

Build a lookup table from one of the files (either one), then look up the needed value while processing the other file.

my %pencil_by_color;

while (<$pencils_fh>) {
   my ($name, $color) = /^Pencil (\S+) is (\S+)\.$/
      or die("Syntax");

   !$pencil_by_color{$color}
      or die("Duplicate");

   $pencil_by_color{$color} = $name;
}

while (<$colors_fh>) {
   my ($name, $color) = /^(\S+) likes the color (\S+)\.$/
      or die("Syntax");

   my $pencil = $pencil_by_color{$color}
      or die("Missing");

   print("${name}'s pencil is number ${pencil} because he likes the color ${color}.\n");
}
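The snippet above assumes $pencils_fh and $colors_fh are already open. Here is a minimal, self-contained sketch of how that could be wired up, using in-memory filehandles with the sample data from the question so that it runs as-is. The helper name match_pencils and the variables $pencils_txt and $colors_txt are hypothetical; with real files you would open 'Pencils.txt' and 'Colors.txt' instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample contents from the question, held in strings (assumption for the demo).
my $pencils_txt = "Pencil one is blue.\nPencil two is green.\n";
my $colors_txt  = "Joe likes the color green.\nSam likes the color blue.\n";

# Opening a reference to a scalar yields an in-memory filehandle.
open(my $pencils_fh, '<', \$pencils_txt) or die("Cannot open pencils: $!");
open(my $colors_fh,  '<', \$colors_txt)  or die("Cannot open colors: $!");

# match_pencils (hypothetical name): same lookup-table logic as above,
# but returning the output lines instead of printing them.
sub match_pencils {
    my ($pfh, $cfh) = @_;

    # Pass 1: build color => pencil number from the pencils file.
    my %pencil_by_color;
    while (my $line = <$pfh>) {
        my ($number, $color) = $line =~ /^Pencil (\S+) is (\S+)\.$/
            or die("Syntax");
        !$pencil_by_color{$color}
            or die("Duplicate");
        $pencil_by_color{$color} = $number;
    }

    # Pass 2: look up each person's color in the table.
    my @out;
    while (my $line = <$cfh>) {
        my ($name, $color) = $line =~ /^(\S+) likes the color (\S+)\.$/
            or die("Syntax");
        my $pencil = $pencil_by_color{$color}
            or die("Missing");
        push @out, "${name}'s pencil is number ${pencil} because he likes the color ${color}.";
    }
    return @out;
}

my @lines = match_pencils($pencils_fh, $colors_fh);
print "$_\n" for @lines;
```

The output follows the order of the colors file, so Joe's line comes before Sam's here; sort @lines or read the files the other way round if a different order is needed.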

【Discussion】:

Thank you very much for the explanation! nebj00la