How to show matching and mismatching records of two text files in the command prompt using Perl? (Answer)

[Question title]: How to show matching and mismatching records of two text files in the command prompt using Perl?
[Posted]: 2015-01-21 03:41:51
[Question]:

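I am working with two text files, sampleA.txt and sampleB.txt. Each file has two fields per record. I need to compare the first record (first line) of sampleA.txt with the first line of sampleB.txt, and I want to display both the matching records and the mismatching records in the command prompt. I need to do this in Perl.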

With the script below I get some output, but it is wrong. I need to print both the matches and the mismatches. How can I do that?

sampleA.txt:

1|X

2|A

4|Z

5|A

sampleB.txt:

2|A

2|X

3|B

4|C

The output I get:

2|A

2|X

4|C

The output I want:

Matching output:

2|A

Mismatching output:

1|X

4|Z

5|A

3|B

4|C

Perl script:

#!/usr/bin/perl
use strict;
use warnings;

open(FILE1,'C:\Users\sathiya.kumar\Desktop\sampleA.txt') || die $!;
open(FILE2,'C:\Users\sathiya.kumar\Desktop\sampleB.txt') || die $!;

my $interline;
while (my $line= <FILE1>) {
    my @fields = split('\|',$line);
    parser($fields[0]);
}

sub parser {
    my $mergeid = shift;
    while (defined $interline || ($interline= <FILE2>)) {
        my @fields = split('\|',$interline);
        my $key  = $fields[0];
        if ($key lt $mergeid) {
                # Skip non-matching records
                $interline = undef;
                next;
            } elsif ($key gt $mergeid) {
                # wait for next key
                last;
            } else {
                print $interline;
                $interline = undef;
           }
      }
}
close(FILE1);
close(FILE2);

Please let me know if you need more information.

[Comments]:

stackoverflow.com/questions/4891898/…

Tags: perl

[Solution 1]:

Your expected output is missing 2|X, which is also a mismatch:

use strict; 
use warnings; 
use 5.016;
use Data::Dumper;

#Create a set from the entries in sampleA.txt:

my $fname = 'sampleA.txt';

open my $A_INFILE, '<', $fname
    or die "Couldn't open $fname: $!";

my %a;

while (my $line = <$A_INFILE>) {
    chomp $line;
    $a{$line} = undef;
}

close $A_INFILE;
say Dumper(\%a);

#Create a set from the entries in sampleB.txt:

$fname = 'sampleB.txt';

open my $B_INFILE, '<', $fname
    or die "Couldn't open $fname: $!";

my %b;

while (my $line = <$B_INFILE>) {
    chomp $line;
    $b{$line} = undef;
}

close $B_INFILE;
say Dumper(\%b);

#Divide the entries in both files into matches and mismatches:

my (@matches, @mismatches);

for my $a_val (keys %a) {
    if (exists $b{$a_val}) {
        push @matches, $a_val;
    }
    else {
        push @mismatches, $a_val;
    }
}

for my $b_val (keys %b) {
    if (not exists $a{$b_val}) {
        push @mismatches, $b_val;
    }
}

say Dumper(\@matches);
say Dumper(\@mismatches);

--output:--
$VAR1 = {
          '5|A' => undef,
          '4|Z' => undef,
          '1|X' => undef,
          '2|A' => undef
        };

$VAR1 = {
          '2|X' => undef,
          '3|B' => undef,
          '4|C' => undef,
          '2|A' => undef
        };

$VAR1 = [
          '2|A'
        ];

$VAR1 = [
          '5|A',
          '4|Z',
          '1|X',
          '2|X',
          '3|B',
          '4|C'
        ];
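The Dumper output above comes back in hash order, which changes from run to run. If the two labelled sections from the question are wanted in file order, the same set-lookup idea can be sketched as follows. The sub names `partition_records` and `read_records`, and passing the two file names on the command line, are assumptions made for this illustration and are not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): split the records of two
# lists into matches and mismatches, keeping first-seen order.
sub partition_records {
    my ($a_records, $b_records) = @_;

    # Lookup set for the second file, like %b in the answer.
    my %in_b = map { $_ => 1 } @$b_records;

    my (@matches, @mismatches, %seen);
    for my $rec (@$a_records) {
        next if $seen{$rec}++;              # skip duplicate lines
        if ($in_b{$rec}) { push @matches,    $rec }
        else             { push @mismatches, $rec }
    }
    for my $rec (@$b_records) {
        next if $seen{$rec}++;              # already handled via the first list
        push @mismatches, $rec;
    }
    return (\@matches, \@mismatches);
}

# Hypothetical helper: read a file into a list of chomped, non-empty lines.
sub read_records {
    my ($fname) = @_;
    open my $fh, '<', $fname or die "Couldn't open $fname: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return [ grep { length } @lines ];
}

# Runs only when two file names are given; with no arguments the
# script just defines the subs above.
if (@ARGV == 2) {
    my ($matches, $mismatches) =
        partition_records(map { read_records($_) } @ARGV);
    print "Matching output:\n";
    print "$_\n" for @$matches;
    print "Mismatching output:\n";
    print "$_\n" for @$mismatches;
}
```

Saved under a name of your choice, for example compare.pl, it would be run as `perl compare.pl sampleA.txt sampleB.txt`. With the sample files from the question it should list 2|A as the only match, followed by 1|X, 4|Z, 5|A, 2|X, 3|B and 4|C as mismatches.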

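If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash. This is pretty much useful only to find out whether Perl's internal hashing algorithm is performing poorly on your data set. For example, you stick 10,000 things in a hash, but evaluating %HASH in scalar context reveals "1/16", which means only one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn't supposed to happen. If a tied hash is evaluated in scalar context, the SCALAR method is called (with a fallback to FIRSTKEY).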

http://perldoc.perl.org/perldata.html
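The practical use of that passage is the emptiness test, which a small sketch can illustrate. The helper name `is_empty_hash` is hypothetical. The sketch relies only on truthiness, because the exact value returned differs between Perl versions: releases from 5.26 onward return the key count rather than the bucket ratio described in the quote.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the referenced hash has no keys.
sub is_empty_hash {
    my ($href) = @_;
    return !%$href;    # negation evaluates the hash as a boolean
}

my %records;
print "empty before insert\n" if is_empty_hash(\%records);

$records{'2|A'} = undef;    # a key whose value is undef still counts
print "not empty after insert\n" unless is_empty_hash(\%records);
```

Both messages are printed. This shows that the answer's style of storing `undef` as the value still produces a non-empty set.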

[Comments]:

Yes! I missed 2|X. Thank you for the answer.