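Perl: uninitialized variable (answers)

【Question title】: Perl -- uninitialized variable
【Posted】: 2013-08-25 01:08:47
【Question】:

What is going on? I wrote a simple program that reads lines and prints output to a file, but it throws some errors...

Here is the code, with explanations in the comments: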

use warnings;
use List::MoreUtils qw(indexes);

my @array_words = ();
my @array_split = ();
my @array_of_zeros = (0);
my $index = 0;

open my $info, 'models/busquedas.csv';
open my $model, '>>models/model.txt';

#First while is to count the words and store it into an array
while( my $line = <$info>)  {
    @array_split = regex($line);
    for (my $i=0; $i < scalar(@array_split); $i++) {
            # Get the index if the word is repeated
        $index = indexes { $_ eq $array_split[$i] } $array_words[$i];
            # if the word is not repeated then save it to the array by 
            # checking the index
        if ($index != -1){ push(@array_words, $array_split[$i]); }
    }
}

print $model @array_words;

sub regex{
    # get only basic info like: 'texto judicial madrid' instead of the full url
    if ($_[0] =~ m/textolibre=/ and 
        $. < 3521239 && 
        $_[0] =~ m/textolibre=(.*?)&translated/) {
        return split(/\+/, $_[0]);
    }
}

The error I don't understand is:

Use of uninitialized value $index in numeric ne (!=) at classifier.pl line 21, <$info> line 12216.
Use of uninitialized value $index in numeric ne (!=) at classifier.pl line 21, <$info> line 12216.
Use of uninitialized value $index in numeric ne (!=) at classifier.pl line 21, <$info> line 12216.
Use of uninitialized value $index in numeric ne (!=) at classifier.pl line 21, <$info> line 12217.
Use of uninitialized value $index in numeric ne (!=) at classifier.pl line 21, <$info> line 12217.
Use of uninitialized value $index in numeric ne (!=) at classifier.pl line 21, <$info> line 12217.
Use of uninitialized value $index in numeric ne (!=) at classifier.pl line 21, <$info> line 12217.
Use of uninitialized value $index in numeric ne (!=) at classifier.pl line 21, <$info> line 12217.
Use of uninitialized value $index in numeric ne (!=) at classifier.pl line 21, <$info> line 12218.
Use of uninitialized value $index in numeric ne (!=) at classifier.pl line 21, <$info> line 12218.

Why is $index uninitialized? I declared it and initialized it to 0! How can I fix this?

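【Question comments】:

I think you have misunderstood how the indexes function works. It expects a list to iterate over, not a single element. It returns indices, and you already have the index of the item: $i.
Why are you calling indexes on a single element of @array_words rather than on the whole array?
But how can I get something like the .indexof() that other languages have? @nwellnhof
I think you want $index = first_index { $_ eq $array_split[$i] } @array_words;. By the way, if you work with larger data sets, checking for duplicates with a hash should be much faster.

Tags: perl file variables initialization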

【Solution 1】:

You did initialize the variable to zero, but you then change its value with

$index = indexes { $_ eq $array_split[$i] } $array_words[$i];

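The function probably returns undef (because $array_words[$i] is not equal to $array_split[$i]). Otherwise it would return one, because there is only one element in the list.

By the way, initializing a variable outside a loop is bad practice if you do not need its value outside the loop. You can declare my $index on the same line where you populate it with indexes.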
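
A minimal sketch of that last point, assuming that a plain "find the position of this word" lookup is what the question needs. It uses first from the core List::Util module over the array's indices, so a miss yields undef; first_index from List::MoreUtils, suggested in the question comments, reports a miss as -1 instead. The sample words are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical sample data, not taken from the question's CSV file.
my @array_split = qw(texto judicial madrid texto);
my @array_words;

for my $word (@array_split) {
    # Declared where it is assigned, so no stale value survives an iteration.
    my $index = first { $array_words[$_] eq $word } 0 .. $#array_words;

    # Index 0 is a valid hit but a false value, so test with defined().
    push @array_words, $word unless defined $index;
}

print "@array_words\n";    # prints: texto judicial madrid
```

Because a hit at position 0 is false in boolean context, the check has to be defined $index rather than a plain truth test.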

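【Comments】:

indexes returns a list of the indices for which the block evaluates to true. If it is used (incorrectly) in scalar context like this, the result is the last such index, or undef if no element of the list satisfies the criterion. It will never "return one" unless the second element of the list is the last one to pass the test.
@Borodin: Oh, really? perl -MList::MoreUtils=indexes -E '$x = indexes {$_ lt "c"} qw/c b d e a/;say $x' returns 2, but the matching indices are 1 and 4.
That gives me 4 on my system. Is your List::MoreUtils up to date? Run perl -MList::MoreUtils -E 'say $List::MoreUtils::VERSION'. The latest version is 0.33.
@Borodin: The version was 0.30. After upgrading, I get 4. But the documentation states "this is just like grep", while grep returns the number of matches in scalar context.
Yes, I read that too. It seems like a retrograde step, but that is what it does. It is the difference between returning a list and returning an array. grep behaves like the latter.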
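
The list-versus-array distinction can be seen in plain Perl without any module. The sketch below illustrates Perl's context rules only; it is not a reading of the List::MoreUtils source, and the two subroutine names are hypothetical. The values 1 and 4 are the matching indices from the comment above.

```perl
use strict;
use warnings;

# Returns an array: in scalar context an array yields its element count.
sub returns_array {
    my @found = (1, 4);
    return @found;
}

# Returns a list: in scalar context the comma operator
# evaluates to its last operand.
sub returns_list {
    my ($first_hit, $last_hit) = (1, 4);
    return ($first_hit, $last_hit);
}

my $from_array = returns_array();       # element count
my $from_list  = returns_list();        # last element
my ($first)    = returns_list();        # list assignment keeps list context
my $count      = () = returns_list();   # count idiom, works for either style

print "$from_array $from_list $first $count\n";    # prints: 2 4 1 2
```

Assigning to a parenthesized variable, or using the = () = idiom, sidesteps the ambiguity regardless of how the subroutine builds its return value.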

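【Solution 2】:

As has already been observed, the indexes subroutine does not work that way. It returns a list of the indices for which the block evaluates to true. Using it in scalar context like this is wrong.

If you were to use a library for this, you would want any, which also comes from List::MoreUtils. The code would look like this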

while( my $line = <$info>)  {
    @array_split = regex($line);
    for my $word (@array_split) {
      push @array_words, $word unless any { $_ eq $word } @array_words;
    }
}

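But I think you want something simpler. From what I understand of your code, a Perl hash will do what you need.

I have refactored your program like this. I hope it helps.

Essentially, it pushes each "word" in the line onto @array_words if it is not already in the hash.

There also appears to be a bug in your regex subroutine. The statement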

return split(/\+/, $_[0]);

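splits the entire line and returns the result. I think it should split only the query portion of the URL that you have just extracted, like this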

return split /\+/, $1;
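
To make the difference concrete, here is a small sketch that runs both splits on one input. The sample line is hypothetical; the real file format is only assumed from the pattern in the question.

```perl
use strict;
use warnings;

# Hypothetical log line; only the textolibre=...&translated shape is
# taken from the question's regex.
my $line = 'http://example.com/search?textolibre=texto+judicial+madrid&translated=1';

# Capture the query text first so the later split works on a plain copy.
my ($query) = $line =~ /textolibre=(.*?)&translated/;
die "no textolibre parameter found\n" unless defined $query;

my @whole_line = split /\+/, $line;     # original behaviour: splits the full URL
my @query_only = split /\+/, $query;    # proposed fix: splits only the capture

print "$_\n" for @whole_line;
print "---\n";
print "$_\n" for @query_only;
```

Splitting the whole line leaves the URL prefix attached to the first word and &translated=1 attached to the last one, while splitting the capture yields just texto, judicial and madrid.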

You should generally check that open calls succeed. Adding the autodie pragma does this for you implicitly.

use strict;
use warnings;
use autodie;

open my $info,  '<',  'models/busquedas.csv';
open my $model, '>>', 'models/model.txt';

my %unique_words;
my @array_words;

#First while is to count the words and store it into an array
while( my $line = <$info>)  {
  for my $word (regex($line)) {
    push @array_words, $word unless $unique_words{$word}++;
  }
}

print $model "$_\n" for @array_words;

sub regex {

  my ($line) = @_;

  # get only basic info like: 'texto judicial madrid' instead of the full url
  return unless $line =~ /textolibre=/ and $. < 3521239;
  if ( $line =~ /textolibre=(.*?)&translated/ ) {
    return split /\+/, $1;
  }
}

【Comments】: