Getting the n-best key-value pairs of a hash table in Perl

【Question title】: get nbest key-value pairs hash table in Perl
【Posted】: 2015-01-10 23:30:47
【Question】:

I have this script, which uses a hash table:

#!/usr/bin/env perl

use strict; use warnings;

my $hash = {
      'cat' => {
               "félin" => '0.500000',
               'chat' => '0.600000',
               'chatterie' => '0.300000',
               'chien' => '0.01000'
             },
      'rabbit' => {
                  'lapin' => '0.600000'
                },
      'canteen' => {
                   "ménagère" => '0.400000',
                   'cantine' => '0.600000'
                 }
       };

my $text = "I love my cat and my rabbit canteen !\n";

foreach my $word (split /\s+/, $text) {
    print $word;
    exists $hash->{$word}
        and print "[" . join(";", keys %{ $hash->{$word} }) . "]";
    print " ";
}

At the moment I get this output:

I love my cat[chat;félin;chatterie;chien] and my rabbit[lapin] canteen[cantine;ménagère] !

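I need to get the n-best values according to their frequency (stored in my hash). For example, I would like the 3 best translations by frequency, like this: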

I love my cat[chat;félin;chatterie] and my rabbit[lapin] canteen[cantine;ménagère] !

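How can I change my code so that it takes the frequency of each value into account and prints only the n-best values?

Thanks for your help.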

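【Comments】:

Sort the keys numerically?
According to the information in the hash, of course. For the first entry, for example: 1) chat 2) félin 3) chatterie 4) chien. Then, in my example, I want the 3 best values.

Tags: perl sorting hash hashtable perl-data-structures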

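【Solution 1】:

The tidiest way is to write a subroutine that returns the N most frequent translations of a given word. I have written best_n in the program below to do that. It uses rev_nsort_by from List::UtilsBy to do the sort concisely. That is not a core module, so it will probably need to be installed.

I have also used an executable substitution (the /e modifier) to modify the string in place.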

use utf8;
use strict;
use warnings;

use List::UtilsBy qw/ rev_nsort_by /;

my $hash = {
  'cat'     => {
    'félin'     => '0.500000',
    'chat'      => '0.600000',
    'chatterie' => '0.300000',
    'chien'     => '0.01000',
  },
  'rabbit'  => {
    'lapin'     => '0.600000',
  },
  'canteen' => {
    'ménagère'  => '0.400000',
    'cantine'   => '0.600000',
  }
};

my $text = "I love my cat and my rabbit canteen !\n";

$text =~ s{(\S+)}{
   $hash->{$1} ? sprintf '[%s]', join(';', best_n($1, 3)) : $1;
}ge;

print $text;

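# Return up to $n translations of $word, most frequent first.
sub best_n {
  my ($word, $n) = @_;
  my $item = $hash->{$word};
  # Sort the translations by descending numeric frequency.
  my @xlate = rev_nsort_by { $item->{$_} } keys %$item;
  # Clamp the last index so the slice never runs past the end of the list.
  $n = $n > @xlate ? $#xlate : $n - 1;
  return @xlate[0..$n];
}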

Output

I love my [chat;félin;chatterie] and my [lapin] [cantine;ménagère] !
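
If installing List::UtilsBy is not an option, the same idea can probably be expressed with Perl's built-in sort. What follows is a minimal sketch, not the answer's own code: the names best_n_core and annotate, passing the inner hash by reference, and the alphabetical tie-break are all assumptions made for illustration. It also keeps the original word in front of the brackets, as in the output the question asks for.

```perl
use strict;
use warnings;
use utf8;
use open qw( :std :encoding(UTF-8) );   # print the accented words as UTF-8

# Same data as in the question.
my $hash = {
    cat     => { 'félin' => '0.500000', chat => '0.600000',
                 chatterie => '0.300000', chien => '0.01000' },
    rabbit  => { lapin => '0.600000' },
    canteen => { 'ménagère' => '0.400000', cantine => '0.600000' },
};

# Hypothetical core-only variant of best_n: returns up to $n keys of
# %$item, highest frequency first (ties broken alphabetically, an assumption).
sub best_n_core {
    my ($item, $n) = @_;
    my @sorted = sort { $item->{$b} <=> $item->{$a} or $a cmp $b } keys %$item;
    $n = @sorted if $n > @sorted;       # never ask for more than exist
    return @sorted[0 .. $n - 1];
}

# Annotate each known word, keeping the word itself before the brackets.
sub annotate {
    my ($text, $n) = @_;
    $text =~ s{(\S+)}{
        $hash->{$1} ? sprintf('%s[%s]', $1, join(';', best_n_core($hash->{$1}, $n))) : $1
    }ge;
    return $text;
}

print annotate("I love my cat and my rabbit canteen !\n", 3);
```

Changing the second argument of annotate changes how many translations are kept; a word with fewer than that many translations simply lists all of them.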

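【Discussion】:

In this example the output is [chien;chatterie;félin], but the best values by frequency are [chat;félin;chatterie]. Perhaps something needs to change in the subroutine to fix that?
@chester: A silly mistake: I sorted the words in order of increasing frequency and took the first three, so they were the least frequent ones. The same module provides rev_nsort_by, which sorts in descending order and fixes the problem.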
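
The mix-up described in that last comment is easy to reproduce with the built-in sort. The sketch below uses only core Perl and a %freq hash (a name chosen here for illustration) holding the frequencies of the 'cat' entry. It takes the first three keys after an ascending sort and after a descending sort:

```perl
use strict;
use warnings;
use utf8;
use open qw( :std :encoding(UTF-8) );   # print the accented words as UTF-8

# Frequencies of the 'cat' entry from the question.
my %freq = ('félin' => 0.5, chat => 0.6, chatterie => 0.3, chien => 0.01);

# Ascending numeric sort: the least frequent translations come first.
my @ascending  = sort { $freq{$a} <=> $freq{$b} } keys %freq;

# Descending numeric sort: the most frequent translations come first.
my @descending = sort { $freq{$b} <=> $freq{$a} } keys %freq;

print join(';', @ascending[0 .. 2]),  "\n";   # chien;chatterie;félin
print join(';', @descending[0 .. 2]), "\n";   # chat;félin;chatterie
```

Swapping $a and $b in the comparison block is the core-Perl counterpart of switching from an ascending sort to rev_nsort_by.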