Perl (wrong?) encoding of output file

【Question title】: Perl (wrong?) encoding of output file
【Posted】: 2014-02-21 10:02:57
【Question】:

I am running ActivePerl 5.16.3 on Windows 7 (32-bit).

My (short) program processes an input text file (encoded in UTF-8). I want the output to be encoded in Latin1, so my code is:

open (OUT, '>;encoding(Latin1)', "out.txt") || die "Cannot open output file: $!\n";
print OUT "$string\n";

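But the resulting file is still UTF-8. What am I doing wrong?

【Comments】:

Is there really a semicolon in the open mode string? It should be a colon: >:encoding(Latin1)

Tags: perl utf-8 character-encoding latin1

【Solution 1】:

First, the encoding layer is separated from the open mode by a colon, not a semicolon.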

open OUT, '>:encoding(latin1)', "out.txt" or die "Cannot open output file: $!\n";

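Second, Latin-1 can encode only a small subset of the characters that UTF-8 can (U+0000 to U+00FF). Moreover, the ASCII part of that subset (U+0000 to U+007F) is encoded identically in both encodings. So we have to use a test file containing a character that is encoded differently, e.g. \N{MULTIPLICATION SIGN} U+00D7 ×, which is \xD7 in Latin-1 and \xC3\x97 in UTF-8.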
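To see the difference in bytes directly, here is a minimal sketch using the core Encode module. The helper name hex_bytes is made up for this illustration and is not part of the answer's code.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string and return its bytes as hex.
sub hex_bytes {
    my ($encoding, $chars) = @_;
    my $bytes = encode($encoding, $chars);
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

my $times = "\N{U+00D7}";    # MULTIPLICATION SIGN

print "Latin-1: ", hex_bytes('ISO-8859-1', $times), "\n";    # D7
print "UTF-8:   ", hex_bytes('UTF-8',      $times), "\n";    # C3 97

# An ASCII character is the same single byte in both encodings.
print "A:       ", hex_bytes('ISO-8859-1', 'A'), " / ", hex_bytes('UTF-8', 'A'), "\n";    # 41 / 41
```

A file that contains only ASCII characters is therefore byte-for-byte identical in both encodings, which is why it cannot be used to check the conversion.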

Also make sure that you actually decode the input file.
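The following sketch shows why decoding matters. It assumes the input bytes were read without any decoding layer, which may or may not be what happened in the question. It works on an in-memory string instead of a file so that it stays self-contained.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 bytes of U+00D7, as they arrive when no decoding layer is used.
my $raw = "\xC3\x97";

# Not decoded: Perl sees two characters (U+00C3 and U+0097).
# Encoding them as Latin-1 reproduces the same two bytes.
my $undecoded = encode('ISO-8859-1', $raw);

# Decoded first: Perl sees one character (U+00D7),
# which Latin-1 stores as the single byte D7.
my $decoded = encode('ISO-8859-1', decode('UTF-8', $raw));

printf "not decoded: %d byte(s), decoded first: %d byte(s)\n",
    length($undecoded), length($decoded);
```

In the undecoded case the output bytes equal the input bytes, so the output file still looks like UTF-8 even though a Latin-1 output layer was applied.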

Here is how to generate a test file:

$ perl -CSA -E'say "\N{U+00D7}"' > input.txt

You can test whether you are re-encoding the file correctly with:

use strict;
use warnings;
use autodie;

open my $in, "<:encoding(UTF-8)", "input.txt";
open my $out, ">:encoding(latin1)", "output.txt";

while (<$in>) {
    print { $out } $_;
}

Afterwards, input.txt and output.txt should have different lengths (3 bytes → 2 bytes).
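As a rough way to check those sizes from Perl itself, the sketch below repeats the round trip in a temporary directory and reports the file sizes with -s. The temporary paths and the added :raw layer are assumptions of this sketch: :raw is there so that Windows does not translate "\n" into CRLF and change the byte counts.

```perl
use strict;
use warnings;
use autodie;
use File::Temp qw(tempdir);

# Work in a scratch directory that is removed when the script exits.
my $dir = tempdir(CLEANUP => 1);
my ($in_path, $out_path) = ("$dir/input.txt", "$dir/output.txt");

# Write the test input as raw bytes: U+00D7 in UTF-8, then a newline.
open my $fh, '>:raw', $in_path;
print {$fh} "\xC3\x97\n";
close $fh;

# Decode UTF-8 on input, encode Latin-1 on output.
open my $in,  '<:raw:encoding(UTF-8)',  $in_path;
open my $out, '>:raw:encoding(latin1)', $out_path;
while (my $line = <$in>) {
    print {$out} $line;
}
close $in;
close $out;

# Report the size of each file in bytes.
for my $path ($in_path, $out_path) {
    printf "%s: %d bytes\n", $path, (-s $path);
}
```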

【Discussion】: