Perl: read a file into a string, then check each character against a Unicode range

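【Question title】: perl read a file into string then check each character for unicode range
【Posted】: 2014-01-10 09:14:35
【Question】:

I am trying to read a file into a string and then check each character against the Unicode range 2816-2943. Every character outside that range, other than \n, needs to be skipped. I found the following code online, but it does not work for me. I am new to Perl, so apologies if I have made a silly mistake. Please help, I need to get this done today.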

use utf8;
use encoding 'utf8';
use open qw/:std :utf8/;

binmode(STDOUT, ":utf8"); #makes STDOUT output in UTF-8 instead of ordinary ASCII.


$file="content.txt";
open FILE1, ">filtered.txt" or die $!;
    open(FILE, "<$file") or die "Can't read file 'filename' [$!]\n";  
    binmode(FILE);
    my $document = <FILE>; 
    close (FILE);  
    print $document;

【Comments】:

Your code does not work because it only copies the contents of one file to another, without doing any filtering.

Tags: perl unicode file-handling string-parsing

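【Solution 1】:

The code below reads the $input file line by line and writes the filtered lines to the $output file. Decimal 2816-2943 is U+0B00 to U+0B7F in hexadecimal, so the negated character class [^\x{0B00}-\x{0B7F}\n] matches every character that is neither in that range nor a newline, and the substitution deletes it.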

my $input  = 'content.txt';
my $output = 'filtered.txt';

open(my $src_fh, '<:encoding(UTF-8)', $input)
  or die qq/Could not open file '$input' for reading: '$!'/;

open(my $dst_fh, '>:encoding(UTF-8)', $output)
  or die qq/Could not open file '$output' for writing: '$!'/;

while(<$src_fh>) {
    s/[^\x{0B00}-\x{0B7F}\n]//g;
    print {$dst_fh} $_
      or die qq/Could not write to file '$output': '$!'/;
}

close $dst_fh
  or die qq/Could not close output filehandle: '$!'/;

close $src_fh
  or die qq/Could not close input filehandle: '$!'/;
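
The question asks specifically about reading the whole file into one string and checking each character, so here is a minimal sketch of that variant as an alternative to the line-by-line substitution above. The helper names slurp_utf8 and filter_range are hypothetical and not part of the original answer; the range limits are the decimal values from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole UTF-8 file into one decoded string.
sub slurp_utf8 {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
      or die qq/Could not open file '$path' for reading: '$!'/;
    local $/;                 # undefined record separator: read everything at once
    my $text = <$fh>;
    close $fh
      or die qq/Could not close input filehandle: '$!'/;
    return $text;
}

# Hypothetical helper: keep newlines and characters whose code point
# lies in 2816..2943 (U+0B00..U+0B7F); drop everything else.
sub filter_range {
    my ($text) = @_;
    my $kept = '';
    for my $char (split //, $text) {
        my $code_point = ord $char;
        if ($char eq "\n" || ($code_point >= 2816 && $code_point <= 2943)) {
            $kept .= $char;
        }
    }
    return $kept;
}
```

Assuming an output handle opened with '>:encoding(UTF-8)' as in the code above, the filtered text could then be written with print {$dst_fh} filter_range(slurp_utf8($input)). The regex version is shorter and likely faster; the explicit loop is mainly useful when each character needs additional per-character handling.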
