Skipping a line in an array, Perl

【Question title】: Skipping a line in an array, Perl
【Posted】: 2012-06-17 17:57:23
【Question】:

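I'm fairly new to Perl and I'm having some difficulty with a project I've come across. The goal of the project is to compare two CSV files: one contains $name, $model, $version and the other contains $name2, $disk, $storage. In the end, the RESULT file should contain the matching lines with the information put together, like this: $name, $model, $version, $disk, $storage.

I've managed to do this, but my problem is that the program breaks when one of the elements is missing. When it encounters a line in the file with a missing element, it stops at that line. How can I fix this? Any suggestions on how to make it skip that line and carry on?

Here is my code: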

open( TESTING, '>testing.csv' ); # Names will be printed to this during testing. Only names ending in .net should appear
open( MISSING, '>Missing.csv' ); # Lines with missing name fields will appear here.

#open (FILE,'C:\Users\hp-laptop\Desktop\file.txt');
#my (@array) =<FILE>;
my @hostname;    #stores names

#close FILE;
#***** TESTING TO SEE IF ANY OF THE LISTED ITEMS BEGIN WITH A COMMA AND DO NOT HAVE A NAME.
#***** THESE OBJECTS ARE PLACED INTO THE MISSING ARRAY AND THEN PRINTED OUT IN A SEPARATE
#***** FILE.
#open (FILE,'C:\Users\hp-laptop\Desktop\file.txt');
#test
if ( open( FILE, "file.txt" ) ) {

}
else {
  die " Cannot open file 1!\n:$!";

}

$count = 0;
$x     = 0;
while (<FILE>) {

  ( $name, $model, $version ) = split(",");    #parsing

  #print $name;
  chomp( $name, $model, $version );

  if ( ( $name =~ /^\s*$/ )
      && ( $model   =~ /^\s*$/ )
      && ( $version =~ /^\s*$/ ) )    #if all of the fields  are blank ( just a blank space)
  {

    #do nothing at all
  }
  elsif ( $name =~ /^\s*$/ ) {   #if name is a blank
    $name =~ s/^\s*/missing/g;
    print MISSING "$name,$model,$version\n";

    #$hostname[$count]=$name;
    #$count++;
  }
  elsif ( $model =~ /^\s*$/ ) {   #if model is blank
    $model =~ s/^\s*/missing/g;
    print MISSING"$name,$model,$version\n";
  }
  elsif ( $version =~ /^\s*$/ ) {   #if version is blank
    $version =~ s/^\s*/missing/g;
    print MISSING "$name,$model,$version\n";
  }

  # Searches for .net to appear in field "$name" if match, it places it into hostname array.
  if ( $name =~ /.net/ ) {

    $hostname[$count] = $name;
    $count++;
  }

#searches for a comma in the name field, puts that into an array and prints the line into the missing file.
#probably won't have to use this, as I've found a better method to test all of the fields ( $name,$model,$version)
#and put those into the missing file. Hopefully it works.
#foreach $line (@array)
#{
#if($line =~ /^\,+/)
#{
#$line =~s/^\,*/missing,/g;
#$missing[$x]=$line;
#$x++;
#}
#}

}
close FILE;

for my $hostname (@hostname) {
  print TESTING $hostname . "\n";
}

#for my $missing(@missing)
#{
# print MISSING $missing;
#}
if ( open( FILE2, "file2.txt" ) ) {    #Run this if the open succeeds

  #open outfile and print starting header
  open( RESULT, '>resultfile.csv' );
  print RESULT ("name,Model,version,Disk, storage\n");
}
else {
  die " Cannot open file 2!\n:$!";
}
$count = 0;
while ( $hostname[$count] ne "" ) {
  while (<FILE>) {
    ( $name, $model, $version ) = split(",");    #parsing

    #print $name,"\n";

    if ( $name eq $hostname[$count] )    # I think this is the problem area.
    {
      print $name, "\n", $hostname[$count], "\n";

      #print RESULT"$name,$model,$version,";
      #open (FILE2,'C:\Users\hp-laptop\Desktop\file2.txt');
      #test
      if ( open( FILE2, "file2.txt" ) ) {

      }
      else {
        die " Cannot open file 2!\n:$!";

      }

      while (<FILE2>) {
        chomp;
        ( $name2, $Dcount, $vname ) = split(",");    #parsing

        if ( $name eq $name2 ) {
          chomp($version);
          print RESULT"$name,$model,$version,$Dcount,$vname\n";

        }

      }

    }

    $count++;
  }

  #open (FILE,'C:\Users\hp-laptop\Desktop\file.txt');
  #test
  if ( open( FILE, "file.txt" ) ) {

  }
  else {
    die " Cannot open file 1!\n:$!";

  }

}

close FILE;
close RESULT;
close FILE2;

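【Comments】:

Next time, please use strict in your code; it protects you from annoying errors.
Please use strict;, use warnings;, indent your code properly, use the three-argument form of open with lexical filehandles, and learn how to use the array functions (push, map, grep).
Whatever materials you are using to teach yourself Perl, I strongly recommend you abandon them. The templates your code is based on range from very outdated (global named filehandles, the 2-arg form of open) to plain wrong. Please don't take this personally; it's clearly not your fault, but you would be well served by learning from better, more modern books/tutorials/code examples than the ones you're evidently using.
...One of the problems with using those templates is that your code is harder to read and understand than it should be.
Oh, and +1 for having working code with a well-defined problem, rather than asking "how do I do this" :)

Tags: arrays perl file csv compare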

【Answer 1】:

I think you want next, which lets you finish the current iteration immediately and start the next one:

while (<FILE>) {
  ( $name, $model, $version ) = split(",");
  next unless( $name && $model && $version );
  ...;
  }

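The condition you use depends on which values you accept. In my example, I assume all values need to be true. If they merely need to not be the empty string, perhaps you can check the length instead: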

while (<FILE>) {
  ( $name, $model, $version ) = split(",");
  next unless( length($name) && length($model) && length($version) );
  ...;
  }
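To make the difference between the two conditions concrete, here is a small sketch. The sample rows are invented for illustration; the point is only that a field containing the string "0" is false in Perl but still has a length of 1.

```perl
use strict;
use warnings;

# Hypothetical sample rows: the second has a version of "0",
# the third has an empty model field.
my @rows = ( "host1.net,T100,2", "host2.net,T200,0", "host3.net,,5" );

my ( @by_truth, @by_length );
for my $row (@rows) {
    my ( $name, $model, $version ) = split /,/, $row;

    # Truth test: rejects empty strings, but also the string "0".
    push @by_truth, $name if $name && $model && $version;

    # Length test: rejects only empty strings.
    push @by_length, $name
        if length($name) && length($model) && length($version);
}

print "truth:  @by_truth\n";
print "length: @by_length\n";
```

With this data the truth test keeps only host1.net, while the length test also keeps host2.net.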

If you know how to validate each field, you might have subroutines for those:

while (<FILE>) {
  ( $name, $model, $version ) = split(",");
  next unless( length($name) && is_valid_model($model) && length($version) );
  ...;
  }

sub is_valid_model { ... }

Now you just need to decide how to integrate this into what you are already doing.
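As one possible way of putting the pieces together, the following sketch runs the same loop over an in-memory list instead of a file. The rule inside is_valid_model (a letter followed by digits) and the sample lines are assumptions made up for this example; substitute whatever validation your data actually needs.

```perl
use strict;
use warnings;

# Hypothetical rule: a model is one letter followed by digits, e.g. "T100".
sub is_valid_model {
    my ($model) = @_;
    return defined $model && $model =~ /^[A-Za-z]\d+$/;
}

# Stand-in for the lines of file.txt (invented sample data).
my @lines = (
    "host1.net,T100,2\n",
    ",T200,3\n",              # missing name
    "host3.net,bogus,1\n",    # model fails validation
    "host4.net,T400\n",       # missing version
    "host5.net,T500,7\n",
);

my @kept;
for my $line (@lines) {
    chomp $line;
    my ( $name, $model, $version ) = split /,/, $line;

    # Too few fields leaves some variables undef; treat those as empty.
    $_ //= '' for $name, $model, $version;

    # Skip the incomplete line and carry on with the next one.
    next unless length($name) && is_valid_model($model) && length($version);

    push @kept, "$name,$model,$version";
}

print "$_\n" for @kept;
```

Only the host1.net and host5.net lines survive; the loop keeps running after each rejected line instead of stopping.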

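【Comments】:

Thank you very much for your input. I'll try to modify my code with this and post what happens!

【Answer 2】:

You should start by adding use strict and use warnings to the top of your program, and declaring all variables with my at their first use. This will reveal many simple mistakes that would otherwise be hard to spot.

You should also use the three-argument form of open with lexical filehandles, and the Perl idiom for checking for failure when opening a file is to add or die to the open call. if statements with an empty block for the success path waste space and become unreadable. An open call should look like this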

open my $fh, '>', 'myfile' or die "Unable to open file: $!";

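Finally, when you are handling CSV files it is safer to use a Perl module, as there are many pitfalls in using a simple split /,/. The Text::CSV module has already done all the work for you and is available on CPAN.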
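Two of those pitfalls can be shown with core Perl alone. The input strings below are invented for illustration: a quoted field that contains a comma, and a line whose trailing fields are empty.

```perl
use strict;
use warnings;

# A quoted field containing a comma is cut in two by a plain split.
my @quoted = split /,/, '"Smith, John",T100,2';
print scalar(@quoted), " fields\n";    # 4, not the intended 3

# Trailing empty fields are silently dropped ...
my @short = split /,/, 'host1.net,,';
print scalar(@short), " fields\n";     # 1

# ... unless a negative limit is passed as the third argument.
my @full = split /,/, 'host1.net,,', -1;
print scalar(@full), " fields\n";      # 3
```

The negative limit helps with the second problem only; quoted fields still need a real CSV parser.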

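Your problem is that, after reading to the end of the first file, you don't rewind or reopen it before reading from the same handle again in the second, nested loop. That means no more data will be read from that file, and the program behaves as if it were empty.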
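The effect is easy to reproduce. This sketch uses an in-memory filehandle with made-up contents in place of file.txt, reads it to the end, tries to read again, and then rewinds with seek:

```perl
use strict;
use warnings;

# In-memory filehandle standing in for file.txt (invented contents).
my $text = "host1.net,T100,2\nhost2.net,T200,3\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @first  = <$fh>;    # reads everything up to end of file
my @second = <$fh>;    # handle is already at EOF: nothing is returned

seek $fh, 0, 0 or die "Cannot rewind: $!";    # back to the start
my @third = <$fh>;     # the data is available again

printf "first=%d second=%d third=%d\n",
    scalar @first, scalar @second, scalar @third;

close $fh;
```

Rewinding makes the nested loop work, but it still rereads the file once per name.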

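Reading the same file hundreds of times to pair up corresponding records is a poor strategy. If the files are of a reasonable size, you should build a data structure in memory to hold the information. A Perl hash is ideal, as it lets you look up the data corresponding to a given name immediately.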
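As a minimal illustration of the idea, using only core Perl and a plain split (so it inherits the limitations of split on quoted fields), the sketch below indexes one list by name and then makes a single pass over the other. The two arrays are hypothetical stand-ins for the contents of file.txt and file2.txt.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for file.txt and file2.txt.
my @file1 = ( "host1.net,T100,2", "host2.net,,3", "host3.net,T300,1" );
my @file2 = ( "host3.net,4,500GB", "host1.net,2,250GB", "host9.net,1,80GB" );

# Index the first file by name, skipping lines with a missing field.
my %data;
for my $line (@file1) {
    my ( $name, $model, $version ) = split /,/, $line, -1;
    next unless defined $version;    # fewer than three fields
    next if grep { !length($_) } $name, $model, $version;
    $data{$name} = [ $name, $model, $version ];
}

# One pass over the second file: a hash lookup replaces the inner loop.
my @result;
for my $line (@file2) {
    my ( $name, $disk, $storage ) = split /,/, $line, -1;
    next unless defined $storage && exists $data{$name};
    push @result, join ',', @{ $data{$name} }, $disk, $storage;
}

print "$_\n" for sort @result;
```

Each file is read exactly once, and the line with the empty model (host2.net) is skipped rather than stopping the run.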

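I have written a revision of your code to demonstrate these points. As I have no sample data it would be awkward to test the code, but if you still have problems, please let us know.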

use strict;
use warnings;

use Text::CSV;

# eol => "\n" makes print() end each output row with a newline
my $csv = Text::CSV->new({ binary => 1, eol => "\n" });

my %data;

# Read the name, model and version from the first file. Write any records
# that don't have the full three fields to the "MISSING" file
#
open my $f1, '<', 'file.txt' or die qq(Cannot open file 1: $!);

open my $missing, '>', 'Missing.csv' 
    or die qq(Unable to open "MISSING" file for output: $!);
    # Lines with missing name fields will appear here.

while ( my $line = $csv->getline($f1) ) {

  my $name = $line->[0];

  # Count the non-empty fields; the parentheses stop grep from swallowing "< 3"
  if ( (grep $_, @$line) < 3 ) {
    $csv->print($missing, $line);
  }
  else {
    $data{$name} = $line if $name =~ /\.net$/i;
  }
}

close $missing;

# Put a list of .net names found into the testing file
#
open my $testing, '>', 'testing.csv'
    or die qq(Unable to open "TESTING" file for output: $!);
    # Names will be printed to this during testing. Only ".net" ending names should appear

print $testing "$_\n" for sort keys %data;

close $testing;

# Read the name, disk and storage from the second file and check that the line
# contains all three fields. Remove the name field from the start and append
# to the data record with the matching name if it exists.
#
open my $f2, '<', 'file2.txt' or die qq(Cannot open file 2: $!);

while ( my $line = $csv->getline($f2) ) {

  next unless (grep $_, @$line) >= 3;

  my $name = shift @$line;
  next unless $name =~ /\.net$/i;

  my $record = $data{$name};
  push @$record, @$line if $record;
}

# Print the completed hash. Send each record to the result output if it
# has the required five fields
#
open my $result, '>', 'resultfile.csv' or die qq(Cannot open results file: $!);

$csv->print($result, [ qw( name Model version Disk storage ) ]);  # print expects an array reference

for my $name (sort keys %data) {

  my $line = $data{$name};

  if ( (grep $_, @$line) >= 5 ) {
    $csv->print($result, $data{$name});
  }
}

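【Comments】:

Thank you very much! Looking at this code, I have a better understanding of how I should go about this. The only problem is that I can't use CPAN modules.
If you are "not allowed", that suggests this is homework, and not just a problem you "came across". Full disclosure is only polite.
Sorry, no. This is not homework. I'm simply not allowed to modify the programs on the computer I'm using. Thanks for your input anyway.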