Why am I only getting the last row's value after the while loop? Answers

[Question title]: Why after the while loop I am only getting last row value?
[Posted]: 2020-09-30 14:19:20
[Question]:

These are the files I am reading:

#Log1
Time    Src_id  Des_id  Address
0   34  56  x9870
2   36  58  x9872
4   38  60  x9874
6   40  62  x9876
8   42  64  x9878

#Log2
Time    Src_id  Des_id  Address
1   35  57  x9871
3   37  59  x9873
5   39  61  x9875
7   41  63  x9877
9   43  65  x9879

This is the code I wrote. I read each file line by line and then split each line:

#!/usr/bin/perl
use warnings;
use strict;

my $log1_file = "log1.log";
my $log2_file = "log2.log";

open(IN1, "<$log1_file") or die "Could not open file $log1_file: $!";
open(IN2, "<$log2_file") or die "Could not open file $log2_file: $!";

my $i_d1;
my $i_d2;
my @fields1;
my @fields2;
while (my $line = <IN1>) {
    @fields1 = split " ", $line;
}
while (my $line = <IN2>) {
    @fields2 = split " ", $line;
}

print "@fields1\n";
print "@fields2\n";

close IN1;
close IN2;

The output I get:

8 42 64 x9878
9 43 65 x9879

Desired output:

Time    Src_id  Des_id  Address
0   34  56  x9870
2   36  58  x9872
4   38  60  x9874
6   40  62  x9876
8   42  64  x9878
9 43 65 x9879
Time    Src_id  Des_id  Address
1   35  57  x9871
3   37  59  x9873
5   39  61  x9875
7   41  63  x9877
9   43  65  x9879

If I use push(@fields1, split " ", $line); I get output like this:

Time Src_id Des_id Address 0 34 56 x9870 B 36 58 x9872 D 38 60 x9874 F 40 62 x9876 H 42 64 x9878

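It should print the whole array, so why does it only print the last row? After this, I also need to compare the "Time" part of the logs and print them in order, but I don't know how to go through both arrays at the same time in a while loop. Please suggest a standard way that doesn't use any modules, since I need to run it on other servers.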

[Question comments]:

Maybe cat log1.log log2.log | sort -n
Here is a one-liner (single-quoted so that a Unix shell does not expand $a and $b): $ perl -e 'print sort { $a <=> $b } grep /^\d/, <>' log1.log log2.log
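The one-liner above can also be written out as a small script, which may be easier to extend. This is a sketch that makes the same assumption as the one-liner: data lines are exactly the lines that start with a digit. `merge_sorted` is a hypothetical helper. The in-memory strings stand in for log1.log and log2.log; with real files you would open them by name.

```perl
use strict;
use warnings;

# Hypothetical helper: gather the data lines (those starting with a digit)
# from every filehandle and return them sorted numerically by the leading Time value.
sub merge_sorted {
    my @fhs = @_;
    my @lines;
    for my $fh (@fhs) {
        push @lines, grep { /^\d/ } <$fh>;
    }
    # Extract the leading number explicitly, so "use warnings" does not
    # complain about comparing whole lines as numbers.
    return sort { ($a =~ /^(\d+)/)[0] <=> ($b =~ /^(\d+)/)[0] } @lines;
}

my $log1 = "#Log1\nTime    Src_id  Des_id  Address\n0   34  56  x9870\n2   36  58  x9872\n";
my $log2 = "#Log2\nTime    Src_id  Des_id  Address\n1   35  57  x9871\n3   37  59  x9873\n";
open my $fh1, '<', \$log1 or die "Cannot open log1: $!";
open my $fh2, '<', \$log2 or die "Cannot open log2: $!";
my @sorted = merge_sorted($fh1, $fh2);
print @sorted;
```

This prints the four data lines in Time order 0, 1, 2, 3. Because the comparison is numeric, a Time of 10 sorts after 9.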

Tags: loops perl

[Solution 1]:

The following code demonstrates how to read and print the log files (the OP did not explain why he splits the lines into fields):

use strict;
use warnings;
use feature 'say';

my $fname1  = 'log1.log';
my $fname2  = 'log2.log';
my $div     = "\t";

my $file1   = read_file($fname1);
my $file2   = read_file($fname2);

print_file($file1,$div);
print_file($file2,$div);

sub read_file {
    my $fname = shift;
    
    my @data;
    
    open my $fh, '<', $fname
        or die "Couldn't read $fname: $!";
        
    while( <$fh> ) {
        chomp;
        next if /^#Log/;
        push @data, [split];
    }
        
    close $fh;
    
    return \@data;
}

sub print_file {
    my $data = shift;
    my $div  = shift;
    
    say join($div,@{$_}) for @{$data};
}

Output

Time    Src_id  Des_id  Address
0       34      56      x9870
2       36      58      x9872
4       38      60      x9874
6       40      62      x9876
8       42      64      x9878
Time    Src_id  Des_id  Address
1       35      57      x9871
3       37      59      x9873
5       39      61      x9875
7       41      63      x9877
9       43      65      x9879

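Assuming the OP wants to merge the two files into one, with the lines sorted on the Time field:

Read the files into the %data hash, keyed on the Time field
Print the header (@fields)
Print the hash values sorted by the Time key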

use strict;
use warnings;
use feature 'say';

my(@fields,%data);

my $fname1  = 'log1.log';
my $fname2  = 'log2.log';

read_data($fname1);
read_data($fname2);

say join("\t",@fields);
say join("\t",@{$data{$_}}) for sort { $a <=> $b } keys %data;

sub read_data {
    my $fname = shift;
    
    open my $fh, '<', $fname
        or die "Couldn't open $fname: $!";
        
    while( <$fh> ) {
        next if /^#Log/;
        if( /^Time/ ) {
            @fields = split;
        } else {
            my @line = split;
            $data{$line[0]} = \@line;
        }
    }
        
    close $fh;
}

Output

Time    Src_id  Des_id  Address
0       34      56      x9870
1       35      57      x9871
2       36      58      x9872
3       37      59      x9873
4       38      60      x9874
5       39      61      x9875
6       40      62      x9876
7       41      63      x9877
8       42      64      x9878
9       43      65      x9879

[Comments]:

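The second code assumes that the values in Time are unique (because hash keys are unique) and that they never go above 9 (because, for example, an alphabetical sort would put 10 before 9). If either of these is not true, this code is broken. Also, assigning to global variables inside a subroutine is bad practice.
@TLP - You should try using format and write in a subroutine with local variables; doing it without global variables would be a good exercise. Even inside packages some variables are global, and C/C++ has the static variable qualifier for this purpose. Perhaps you will be lucky enough to learn what global variables are for and what advantages they offer.
You could just return the values. You are too keen on reusing the sub, which is the wrong design choice. And you are still overwriting hash values that are not unique. And there is no reason to get toxic and personal: address the problem rather than
@PolarBear @TLP, could you explain the lines $data{$line[0]} = \@line; and say join("\t",@{$data{$_}}) for sort { $a <=> $b } keys %data;? Also, if two entries have the same time, only one of them is shown. Thanks.
@HG -- In your original post you should have stated that Time can contain duplicates (otherwise it is assumed that duplicates are not allowed). $data{$line[0]} = \@line -- %data is a hash and @line is an array; this stores a reference to the array @line in the hash %data, under the key taken from the first element of the array, $line[0]. sort { $a <=> $b } keys %data -- sorts the keys of the hash %data numerically. @{$data{$_}} -- takes the element of the hash %data whose key is in the special variable $_ (the value set by the for loop) and dereferences it as an array (the values stored in %data are references to arrays).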
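These comments raise two points: duplicate Time values, and assigning to globals from a sub. One way to handle both is to have the reader return its results and to store a list of rows per Time key (a hash of arrays). This is a sketch, not the answerer's code. `read_rows` and `merge_logs` are hypothetical names, and the sample strings are made up to include a duplicate Time of 2.

```perl
use strict;
use warnings;

# Hypothetical reader: returns the header fields and the data rows
# (as array references) instead of assigning to global variables.
sub read_rows {
    my ($fh) = @_;
    my (@header, @rows);
    while (my $line = <$fh>) {
        next if $line =~ /^#Log/ or $line !~ /\S/;   # skip "#LogN" markers and blank lines
        my @f = split ' ', $line;
        if ($f[0] eq 'Time') { @header = @f }
        else                 { push @rows, \@f }
    }
    return (\@header, \@rows);
}

# Merge several logs into a hash of arrays, Time => [ row, row, ... ],
# so rows that share a Time value are all kept.
sub merge_logs {
    my @fhs = @_;
    my ($header, %by_time);
    for my $fh (@fhs) {
        my ($h, $rows) = read_rows($fh);
        $header = $h unless $header && @$header;       # keep the first non-empty header
        push @{ $by_time{ $_->[0] } }, $_ for @$rows;  # autovivifies each per-Time array
    }
    return ($header, \%by_time);
}

my $log1 = "#Log1\nTime Src_id Des_id Address\n0 34 56 x9870\n2 36 58 x9872\n";
my $log2 = "#Log2\nTime Src_id Des_id Address\n2 35 57 x9871\n";   # duplicate Time 2
open my $fh1, '<', \$log1 or die "Cannot open log1: $!";
open my $fh2, '<', \$log2 or die "Cannot open log2: $!";

my ($header, $by_time) = merge_logs($fh1, $fh2);
print join("\t", @$header), "\n";
for my $time (sort { $a <=> $b } keys %$by_time) {
    print join("\t", @$_), "\n" for @{ $by_time->{$time} };   # every row with this Time
}
```

Both rows with Time 2 are printed, in the order the files were read, because `push` appends rather than overwriting.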

[Solution 2]:

Because @fields1 and @fields2 are overwritten on every iteration of the loop. You need something like this:

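while (my $line = <IN1>) {
    my @tmp = split(" ", $line);   # a new @tmp is created on every iteration
    push(@fields1, \@tmp);         # store a reference to this row
}
foreach my $item (@fields1) {      # "my" is required under "use strict"
    print("@{$item}\n");           # dereference the row and print its fields
}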

Then @fields1 contains references to the split arrays.

The final @fields1 looks like this:

@fields1 = (
  <ref> ----> ["Time", "Src_id", "Des_id", "Address"]
  <ref> ----> ["0", "34", "56", "x9870"]
  <ref> ----> ["2", "36", "58", "x9872"]
  ...
)

The print loop will print:

Time Src_id Des_id Address
0 34 56 x9870
2 36 58 x9872
4 38 60 x9874
6 40 62 x9876
8 42 64 x9878
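Because every element of @fields1 is a reference, you reach individual values by dereferencing. Here is a small sketch that uses a hand-built @fields1 with the same shape as the one the loop produces. The rows are copied from the sample log, and nothing is read from a file.

```perl
use strict;
use warnings;

# Same shape as the @fields1 built by the loop: one array reference per line
my @fields1 = (
    [ 'Time', 'Src_id', 'Des_id', 'Address' ],
    [ '0', '34', '56', 'x9870' ],
    [ '2', '36', '58', 'x9872' ],
);

my $first_time = $fields1[1][0];                               # row 1, column 0: "0"
my @row2       = @{ $fields1[2] };                             # copy of the whole third row
my @times      = map { $_->[0] } @fields1[ 1 .. $#fields1 ];   # Time column, header skipped

print "$first_time\n";   # 0
print "@row2\n";         # 2 36 58 x9872
print "@times\n";        # 0 2
```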

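I think it would also be better to do chomp($line).

But I would simply do push(@fields1, $line) and split each array element at the comparison stage.

To compare the contents of the two files, I would personally use two while loops to read them into two arrays, as you did, and then do the comparison in a for or foreach loop.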
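Here is a minimal sketch of that approach. It keeps the raw (chomped) lines in two arrays and splits them only when comparing. `data_lines` and `compare_times` are hypothetical names. The sketch assumes, as in the sample logs, that row i of one file should be compared with row i of the other. The in-memory strings stand in for log1.log and log2.log.

```perl
use strict;
use warnings;

# Hypothetical helper: return the chomped data lines of one log,
# skipping "#LogN" markers, the Time header and blank lines.
sub data_lines {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#Log/ or $line =~ /^Time/ or $line !~ /\S/;
        push @lines, $line;
    }
    return @lines;
}

# Compare the Time column of two lists of raw lines, row by row.
sub compare_times {
    my ($l1, $l2) = @_;                       # array references
    my $n = @$l1 < @$l2 ? @$l1 : @$l2;        # only rows present in both lists
    my @report;
    for my $i (0 .. $n - 1) {
        my ($t1) = split ' ', $l1->[$i];      # split only now, at comparison time
        my ($t2) = split ' ', $l2->[$i];
        my $cmp = $t1 < $t2 ? '<' : $t1 > $t2 ? '>' : '==';
        push @report, "row $i: $t1 $cmp $t2";
    }
    return @report;
}

my $log1 = "#Log1\nTime    Src_id  Des_id  Address\n0   34  56  x9870\n2   36  58  x9872\n";
my $log2 = "#Log2\nTime    Src_id  Des_id  Address\n1   35  57  x9871\n3   37  59  x9873\n";
open my $fh1, '<', \$log1 or die "Cannot open log1: $!";
open my $fh2, '<', \$log2 or die "Cannot open log2: $!";
my @lines1 = data_lines($fh1);
my @lines2 = data_lines($fh2);
print "$_\n" for compare_times(\@lines1, \@lines2);
```

With this sample input it prints "row 0: 0 < 1" and "row 1: 2 < 3".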

[Comments]:

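If I use push, all the data comes out on a single line instead of as it appears in the input file. To compare the contents I used two while loops, with the second loop inside the first, but that doesn't work for me. Could you elaborate on the while-loop part?
@HG Elaborate on the while?
You don't need chomp when splitting on whitespace.
@TLP OK. That's just a habit of mine.
@Light, hey, everything is coming out on the same line.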
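The nested-loop attempt described in these comments fails because the inner while reads IN2 to end-of-file during the first pass of the outer loop. Every later pass then finds nothing left to read. An alternative is a single loop that takes one line from each handle per iteration. This minimal sketch assumes that lines at the same position in both logs belong together. `walk_in_step` is a hypothetical name, and the sample strings are made up.

```perl
use strict;
use warnings;

# Read one line from each handle per iteration and stop as soon as either
# handle is exhausted. Returns [time1, time2] pairs for lines whose first
# field is a number (markers, headers and blank lines are skipped).
sub walk_in_step {
    my ($fh1, $fh2) = @_;
    my @pairs;
    while (defined(my $l1 = <$fh1>) and defined(my $l2 = <$fh2>)) {
        my @f1 = split ' ', $l1;
        my @f2 = split ' ', $l2;
        next unless @f1 and @f2 and $f1[0] =~ /^\d+$/ and $f2[0] =~ /^\d+$/;
        push @pairs, [ $f1[0], $f2[0] ];   # the Time values from the same line position
    }
    return @pairs;
}

my $log1 = "Time Src_id Des_id Address\n0 34 56 x9870\n2 36 58 x9872\n";
my $log2 = "Time Src_id Des_id Address\n1 35 57 x9871\n3 37 59 x9873\n";
open my $fh1, '<', \$log1 or die "Cannot open log1: $!";
open my $fh2, '<', \$log2 or die "Cannot open log2: $!";
for my $pair (walk_in_step($fh1, $fh2)) {
    my ($t1, $t2) = @$pair;
    print "$t1 vs $t2: ", ($t1 <= $t2 ? "log1 first" : "log2 first"), "\n";
}
```

If one file is longer, its extra lines are simply not visited. Handling that case would need an extra loop after this one.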

[Solution 3]:

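You can merge the log files side by side with paste and read the merged result one line at a time. This is more elegant and saves RAM. Here is an example of how you might compare time1 and time2, writing STDOUT and STDERR to separate files. If time1 < time2 and time1 < 4, the example prints all input fields to STDOUT; otherwise it prints a warning to STDERR: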

cat > log1.log <<EOF
Time    Src_id  Des_id  Address
0   34  56  x9870
2   36  58  x9872
4   38  60  x9874
6   40  62  x9876
8   42  64  x9878
EOF


cat > log2.log <<EOF
Time    Src_id  Des_id  Address
1   35  57  x9871
3   37  59  x9873
5   39  61  x9875
7   41  63  x9877
9   43  65  x9879
EOF


# Paste files side by side, skip header, read data lines together, compare and print:

paste log1.log log2.log | \
    tail -n +2 | \
    perl -lane '
BEGIN {
    for $file_num (1, 2)  { push @col_names, map { "$_$file_num" } qw( time src_id des_id address ) }
}
my %val;
@val{ @col_names } = @F;
if ( $val{time1} < $val{time2} and $val{time1} < 4) {
    print join "\t", @val{ @col_names};
} else {
    warn "not found: @val{ qw( time1 time2 ) }";
}
' 1>out.tsv 2>out.log

Output:

% cat out.tsv
0       34      56      x9870   1       35      57      x9871
2       36      58      x9872   3       37      59      x9873
% cat out.log
not found: 4 5 at -e line 10, <> line 3.
not found: 6 7 at -e line 10, <> line 4.
not found: 8 9 at -e line 10, <> line 5.

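The Perl one-liner uses these command-line flags:
-e : tells Perl to look for the code inline instead of in a file.
-n : loops over the input one line at a time, assigning each line to $_ by default.
-l : strips the input line separator ("\n" on *NIX by default) before executing the inline code, and appends it when printing.
-a : splits $_ into the array @F on whitespace, or on the regex given with the -F option.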
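To make these switches concrete, here is a rough, hand-written approximation of the loop that perl -lane builds around the inline code. `run_lane` is a hypothetical helper. Unlike the real switches, it uses an explicit filehandle, a lexical @F and a callback instead of <>, $_ and the global @F, so treat it as an illustration rather than an exact equivalent.

```perl
use strict;
use warnings;

# Rough approximation of what `perl -lane 'CODE'` wraps around CODE.
# Hypothetical helper: reads from an explicit filehandle and passes the
# fields to a callback, where the real switches use <>, $_ and a global @F.
sub run_lane {
    my ($fh, $code) = @_;
    local $\ = "\n";                  # -l: print appends a newline (output record separator)
    while (my $line = <$fh>) {        # -n: loop over the input one line at a time
        chomp $line;                  # -l: strip the input line separator
        my @F = split ' ', $line;     # -a: autosplit on whitespace
        $code->(@F);                  # the inline code runs once per line
    }
}

my $pasted = "0   34  56  x9870\t1   35  57  x9871\n4   38  60  x9874\t5   39  61  x9875\n";
open my $fh, '<', \$pasted or die "Cannot open: $!";
# Same condition as the answer's one-liner: time1 < time2 and time1 < 4
run_lane($fh, sub {
    my @F = @_;
    print join "\t", @F if $F[0] < $F[4] and $F[0] < 4;
});
```

Only the first pasted row passes the condition here, so a single tab-separated line is printed.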

See also:
perldoc perlrun: how to execute the Perl interpreter: command line switches

[Comments]:

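Is there really any point in changing the delimiter in the log files? Or in showing how you create the log files in the answer?
@TLP Thanks for suggesting that I keep the log file delimiter unchanged. I updated the answer. Indeed, there is usually no need to change the log files, and this case is no exception. About your second point: I had to create the log files somehow, as sample input, to show a minimal reproducible example. I wish the OP had included something like this (the actual commands). The commands I show make things easier for others who want to answer the question: they can copy and paste them into a shell in one go, without needing an editor.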