Compare lines with a pattern in two files: answers

【Question title】: compare lines with a pattern in two files
【Posted】: 2019-02-19 21:53:22
【Question description】:

An example:

Suppose my data set looks like this. LOG 1 (x.log) contains

INFO @1102266 PHResourceLayer_Z4: mti_clk_chk:################ start of test ################ ; T=1102266
INFO @1102334 PHResourceLayer_Z4: mti_clk_chk:Checking the period of MTI, MTI10 clk from SV; T=1102334

LOG 2 (y.log) contains

UVM_INFO @1092507 reporter Z4_COREA: mti_clk_chk: ################ start of test ################ ; T=1092507
UVM_INFO @1092563 reporter Z4_COREA: mti_clk_chk: Checking the period of MTI, MTI10 clk from SV; T=1092563

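Then for the first line I have to compare

################ start of test ################ ; T=1102266

with

################ start of test ################ ; T=1092507

Because the values of T differ, the output file should contain these details and report that they do not match.

Similarly, for the second line, we have to match

Checking the period of MTI, MTI10 clk from SV; T=1102334

against

Checking the period of MTI, MTI10 clk from SV; T=1092563

Here the values of T do not match either, so this line should also go to the output file.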

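I have to compare, line by line, the details in two log files for the lines that contain a specific keyword, mti_clk_chk. So far I can extract the lines with the required keyword from both files into a third file. Now I want to compare the text that follows the keyword's colon (:) in that output. I have to print the lines of the first data set that do not match the second data set, and the number of lines in the second data set that are not present in the first. The data obtained after parsing the two log files is given below. Please help me compare the details on each line between the two data sets.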

use strict;
use warnings;

# Read both logs into memory
open(my $in1, '<', 'x.log') or die "Cannot open x.log: $!";
my @array = <$in1>;
close($in1);

open(my $in2, '<', 'y.log') or die "Cannot open y.log: $!";
my @array1 = <$in2>;
close($in2);

# Append the lines containing the keyword from both logs to file.txt
open(my $out, '>>', 'file.txt') or die "Cannot open file.txt: $!";

my @array2 = grep { /mti_clk_chk:/ } @array;
print $out @array2;
print $out "\n \n \n";

@array2 = grep { /mti_clk_chk:/ } @array1;
print $out @array2;
close($out);

Sample data in file.txt after parsing the two input logs (x.log and y.log)

INFO @576892 mti_clk_chk: run_stimulus called; T=576892
INFO @1102266 PHResourceLayer_Z4: mti_clk_chk:################ start of test ################ ; T=1102266
INFO @1102334 PHResourceLayer_Z4: mti_clk_chk:Checking the period of MTI, MTI10 clk from SV; T=1102334
INFO @1102372 mti_clk_chk: Checking period of MTI CLk; T=1102372
INFO @1102377 mti_clk_chk: Period value of MTI Clock: 3.125000 ns; T=1102377
INFO @1102377 mti_clk_chk: MTI Clock is being generated correctly ; T=1102377
INFO @1102377 mti_clk_chk: Checking period of MTI10 CLk; T=1102377
INFO @1102418 mti_clk_chk: Period value of MTI10 Clock: 31.250000 ns; T=1102418
INFO @1102418 mti_clk_chk: MTI10 Clock is being generated correctly ; T=1102418
INFO @1102717 PHResourceLayer_Z4: mti_clk_chk: All clock period Checking done; T=1102717
INFO @1148661 mti_clk_chk: C-Code exit execution. code=<aa>; T=1148661
INFO @1148661 mti_clk_chk: ************************ SV END******************** ; T=1148661

UVM_INFO @0 reporter testbench.top_level_module.\mti_clk_chk::main : MTI_CLK_CHK_STIM Started .....; T=0
UVM_INFO @0 reporter testbench.top_level_module.\mti_clk_chk::main : run_stimulus called; T=0
UVM_INFO @1092507 reporter Z4_COREA: mti_clk_chk: ################ start of test ################ ; T=1092507
UVM_INFO @1092563 reporter Z4_COREA: mti_clk_chk: Checking the period of MTI, MTI10 clk from SV; T=1092563
UVM_INFO @1092598 reporter testbench.top_level_module.\mti_clk_chk::main : Checking period of MTI CLk; T=1092598
UVM_INFO @1092605 /proj/rru2_verif/usr/Tilak/SV_UVM/testbench/data_ipdss/v_ms_mti_stim_vip/testbench/classes_v/mti_clk_chk.sv(147) uvm_test_top.default_env.default_sequencer100@@mti_clk_chk mti_clk_chk:INFO: Period value of MTI Clock: 3.125000 ns; T=1092605
UVM_INFO @1092605 reporter testbench.top_level_module.\mti_clk_chk::main : MTI Clock is being generated correctly ; T=1092605
UVM_INFO @1092605 reporter testbench.top_level_module.\mti_clk_chk::main : Checking period of MTI10 CLk; T=1092605
UVM_INFO @1092655 /proj/rru2_verif/usr/Tilak/SV_UVM/testbench/data_ipdss/v_ms_mti_stim_vip/testbench/classes_v/mti_clk_chk.sv(165) uvm_test_top.default_env.default_sequencer100@@mti_clk_chk mti_clk_chk:INFO: Period value of MTI10 Clock: 31.250000 ns; T=1092655
UVM_INFO @1092655 reporter testbench.top_level_module.\mti_clk_chk::main : MTI10 Clock is being generated correctly ; T=1092655
UVM_INFO @1092850 reporter Z4_COREA: mti_clk_chk: All clock period Checking done; T=1092850
UVM_INFO @1092886 /proj/rru2_verif/usr/Tilak/SV_UVM/testbench/data_ipdss/v_ms_mti_stim_vip/testbench/classes_v/mti_clk_chk.sv(186) uvm_test_top.default_env.default_sequencer100@@mti_clk_chk mti_clk_chk:INFO: ************************ SV END******************** ; T=1092886

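【Question comments】:

I edited your question to make the input and output more readable. Please check that I did not inadvertently remove any relevant information.

Tags: perl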

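【Solution 1】:

If I understand correctly what you want to do with the input data:

Read the lines from file 1
- filter the lines that contain the keyword mti_clk_chk:
- store everything after the keyword for comparison
Do the same for file 2
Print the lines from file 1 whose comparison string is not found in file 2
Vice versa for file 2

Proposed solution for your problem: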

#!/usr/bin/perl
use warnings;
use strict;
use autodie;
use feature qw(say);

die "usage: $0 <log1> <log2>\n"
    if @ARGV < 2;
my($log1, $log2) = @ARGV;

# log file extractor function
sub extractor($) {
    my($file) = @_;
    my %lines;
    my @order;

    # Parse log file contents
    open(my $fh, '<', $file);
    while (<$fh>) {
        chomp;
        if (my($key) = /mti_clk_chk:\s*(.+)$/) {
            die "duplicate log line '$_' detected at ${file}:$.!\n"
                if exists $lines{$key};
            $lines{$key} = $_;
            push(@order, $key);
        }
    }
    close($fh);

    return((\%lines, \@order));
}

# parse log files
my($lines_log1, $order_log1) = extractor($log1);
my($lines_log2, $order_log2) = extractor($log2);

# lines in log1 but not in log2
say foreach (
    map  { $lines_log1->{$_} }
    grep { ! exists $lines_log2->{$_} }
    @{ $order_log1 }
);

# separator in output
say "";

# lines in log2 but not in log1
say foreach (
    map  { $lines_log2->{$_} }
    grep { ! exists $lines_log1->{$_} }
    @{ $order_log2 }
);

exit 0;

Tested with the two lines you gave as an example. I added some junk at the beginning and end to make sure it does not show up in the output.

$ cat dummy1.txt
test1
INFO @1102266 PHResourceLayer_Z4: mti_clk_chk:################ start of test ################ ; T=1102266
INFO @1102334 PHResourceLayer_Z4: mti_clk_chk:Checking the period of MTI, MTI10 clk from SV; T=1102334
test1

$ cat dummy2.txt
test2
UVM_INFO @1092507 reporter Z4_COREA: mti_clk_chk: ################ start of test ################ ; T=1092507
UVM_INFO @1092563 reporter Z4_COREA: mti_clk_chk: Checking the period of MTI, MTI10 clk from SV; T=1092563
test2

$ perl dummy.pl dummy1.txt dummy2.txt
INFO @1102266 PHResourceLayer_Z4: mti_clk_chk:################ start of test ################ ; T=1102266
INFO @1102334 PHResourceLayer_Z4: mti_clk_chk:Checking the period of MTI, MTI10 clk from SV; T=1102334

UVM_INFO @1092507 reporter Z4_COREA: mti_clk_chk: ################ start of test ################ ; T=1092507
UVM_INFO @1092563 reporter Z4_COREA: mti_clk_chk: Checking the period of MTI, MTI10 clk from SV; T=1092563
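
Every line shows up in the output above because the comparison string includes the trailing T=... value, which differs between the two logs. The following minimal sketch isolates the regex from the script and applies it to the first line of each sample file. The helper name compare_key is made up for this illustration and is not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as in the script: capture everything after "mti_clk_chk:",
# skipping optional whitespace. compare_key is a hypothetical helper name.
sub compare_key {
    my ($line) = @_;
    my ($key) = $line =~ /mti_clk_chk:\s*(.+)$/;
    return $key;    # undef if the keyword is absent
}

# First matching line of each sample log from the question
my $line1 = 'INFO @1102266 PHResourceLayer_Z4: mti_clk_chk:################ start of test ################ ; T=1102266';
my $line2 = 'UVM_INFO @1092507 reporter Z4_COREA: mti_clk_chk: ################ start of test ################ ; T=1092507';

my $key1 = compare_key($line1);
my $key2 = compare_key($line2);

print "key1: $key1\n";
print "key2: $key2\n";
print "equal: ", ($key1 eq $key2 ? "yes" : "no"), "\n";
```

The optional space after the colon is absorbed by \s*, so the two keys differ only in the T value, and the final line printed is "equal: no".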

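【Comments】:

Thanks a lot @StefanBecker, your code works fine, but I need to print the data to a separate output file, and it should be the same data we are comparing (the part after the keyword) rather than the complete line. Secondly, for file 1 it should print the data that does not match file 2, but when comparing file 2, I only need the number of instances (line count) where the data does not match.
(a) Just use perl script.pl log1.txt log2.txt >output.txt, (b) your example indicates that the output file should contain the whole lines, not the comparison strings, (c) again, your example indicates printing the lines from log1 not found in log2, and then the lines from log2 not found in log1. An answer can only be based on the input, problem description and expected output presented in the question.
If it satisfies the original question, please upvote and accept the answer. What you describe in your comment sounds like something you can figure out yourself based on the code in my answer, i.e. IMHO it does not warrant another question.
perl script.pl log1.txt log2.txt >output.txt does not work properly. Please help me print the comparison strings, rather than the whole lines, to a separate output file.
Can you be more specific about "what does not work"? Is there an error message? >output.txt redirects the standard output of the Perl script into a file, i.e. everything the script prints goes into that file. This should work for every command-line tool on any operating system. Use the file names from your question, perl script.pl x.log y.log >file.txt, with the Perl code stored in script.pl.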
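
The follow-up request in the comments (print only the comparison strings from log 1 that are missing in log 2, and report only a count for the other direction) can be sketched as below. This is an illustration under stated assumptions, not the answerer's code: the helpers extract_keys and compare_logs are hypothetical names, duplicates are tolerated instead of aborting, and the sample lines from the question are embedded so the sketch runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the comparison strings (text after "mti_clk_chk:") in input order.
# Hypothetical helper; uses the same regex as the answer.
sub extract_keys {
    my (@lines) = @_;
    my @keys;
    for my $line (@lines) {
        chomp(my $copy = $line);
        push @keys, $1 if $copy =~ /mti_clk_chk:\s*(.+)$/;
    }
    return @keys;
}

# Hypothetical helper: returns the log1 comparison strings missing from log2,
# plus the number of log2 comparison strings missing from log1.
sub compare_logs {
    my ($lines1, $lines2) = @_;
    my @keys1 = extract_keys(@$lines1);
    my @keys2 = extract_keys(@$lines2);
    my %seen1 = map { $_ => 1 } @keys1;
    my %seen2 = map { $_ => 1 } @keys2;

    my @only_in_1   = grep { !$seen2{$_} } @keys1;
    my $only_2_count = grep { !$seen1{$_} } @keys2;    # scalar context: a count
    return (\@only_in_1, $only_2_count);
}

# Sample lines taken from the question
my @log1 = (
    'INFO @1102266 PHResourceLayer_Z4: mti_clk_chk:################ start of test ################ ; T=1102266',
    'INFO @1102334 PHResourceLayer_Z4: mti_clk_chk:Checking the period of MTI, MTI10 clk from SV; T=1102334',
);
my @log2 = (
    'UVM_INFO @1092507 reporter Z4_COREA: mti_clk_chk: ################ start of test ################ ; T=1092507',
    'UVM_INFO @1092563 reporter Z4_COREA: mti_clk_chk: Checking the period of MTI, MTI10 clk from SV; T=1092563',
);

my ($missing, $count) = compare_logs(\@log1, \@log2);
print "$_\n" for @$missing;
print "lines in log2 not found in log1: $count\n";
```

To use it with real files, the two arrays could be filled by reading x.log and y.log, and the output sent to a file with the >output.txt redirection mentioned above.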