How to use Unicode in Perl CGI parameters: answers

【Question title】: How to use unicode in perl CGI param
【Posted】: 2013-12-23 20:08:17
【Question】:

I have a Perl CGI script that accepts Unicode characters as one of its parameters.
The URL has the form

.../worker.pl?text="some_unicode_chars"&...

In the Perl script, I pass the $text variable to a shell script:

system "a.sh \"$text\" out_put_file";

If I hard-code the text in the Perl script, it works fine. However, when $text is fetched from the web via CGI, the output makes no sense.

my $q = CGI->new;  
my $text = $q->param('text');

I suspect the encoding is causing the problem. UTF-8 has given me a lot of trouble. Could someone please help me?

【Comments】:

Tags: perl unicode utf-8 cgi

【Solution 1】:

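If you are passing UTF-8 data in the parameter list, you will definitely want to URI-encode it using the URI::Escape module. This converts any extended characters into percent-encoded values that are easy to print and read. On the receiving end, you need to URI-decode them before proceeding.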
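
The round trip described above might look like the following minimal sketch. The sample string and variable names are illustrative assumptions, not part of the original answer, and URI::Escape has to be installed (it ships with the URI distribution on CPAN).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                      # the source below contains a literal 'ñ'

use Encode qw(decode);
use URI::Escape qw(uri_escape_utf8 uri_unescape);

# Hypothetical sample value for the 'text' parameter.
my $text = "Español";

# Sending side: encode to UTF-8 bytes, then percent-escape those bytes.
my $escaped = uri_escape_utf8($text);

# Receiving side: undo the percent-escaping (this yields UTF-8 bytes),
# then decode the bytes into a Perl character string.
my $bytes     = uri_unescape($escaped);
my $roundtrip = decode('UTF-8', $bytes);

print $roundtrip eq $text ? "round trip ok\n" : "mismatch\n";
```

Note that CGI.pm's param already undoes the percent-escaping of incoming query strings, so in a CGI script usually only the final decode step remains.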

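【Comments】:

That is what we do now. But it means another layer is needed. I would prefer to type Unicode characters directly into the browser's address bar.
It is another layer, but it is the correct one, as described in the RFC (tools.ietf.org/html/rfc3986#section-2). URI encoding covers not only extended characters but also reserved characters, for example spaces and forward slashes.

【Solution 2】: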

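Maybe this will help. From Perl Programming/Unicode UTF-8:

By default, CGI.pm does not decode your form parameters. You can use the -utf8 pragma, which will treat (and decode) all parameters as UTF-8 strings, but this will fail if you have any binary file upload fields. A better solution involves overriding the param method: (example follows)

[Wrong - see the correction] Here is the documentation for the utf8 pragma. Since uploading binary data does not seem to be an issue for you, using the utf8 pragma seems to be the most straightforward approach.

Correction: per @Slaven's comment, do not confuse the general Perl utf8 pragma with the -utf8 pragma that is defined for use with CGI.pm: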

-utf8

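This makes CGI.pm treat all parameters as UTF-8 strings. Use this with care, as it will interfere with the processing of binary uploads. It is better to manually select which fields are expected to return utf-8 strings and convert them using code like this: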

use Encode;
my $arg = decode utf8=>param('foo');
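
As a self-contained illustration of the per-field approach, here is a small sketch. The hard-coded query string stands in for a real request and is an assumption made for the example, as are the variable names; CGI.pm has to be installed, since it is no longer bundled with recent Perl releases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use CGI;
use Encode qw(decode);

# Assumption: a hard-coded query string replaces the real request.
# %C3%B1 is the percent-encoded UTF-8 form of 'ñ'.
my $q = CGI->new('text=Espa%C3%B1ol');

# Without the -utf8 pragma, param() returns the raw UTF-8 bytes.
my $raw = $q->param('text');

# Decode only this field into a Perl character string.
my $text = decode('UTF-8', $raw);

printf "bytes: %d, characters: %d\n", length($raw), length($text);
```

The byte count is larger than the character count because 'ñ' occupies two bytes in UTF-8. When the decoded string is later written to a file or handed to another program, it generally has to be encoded again, for example with Encode::encode('UTF-8', $text).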

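Follow-up: duleshi, you asked: "But I still don't understand the difference between decode in Encode and utf8::decode. How do the Encode and utf8 modules differ?"

From the documentation for the utf8 pragma:

Note that this function does not handle arbitrary encodings. Therefore Encode is recommended for the general purposes; see also Encode.

In other words, the Encode module works with many different encodings (including UTF-8), whereas the utf8 functions work with the UTF-8 encoding only.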
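
To make that difference concrete, here is a minimal sketch that uses an ISO-8859-1 byte. The input value and variable names are assumptions chosen for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Encode qw(decode);

# The single byte 0xF1 is 'ñ' in ISO-8859-1 but is not valid UTF-8.
my $latin1_bytes = "\xF1";

# Encode can be told which encoding the bytes are in.
my $via_encode = decode('ISO-8859-1', $latin1_bytes);

# utf8::decode only understands UTF-8; it returns false here
# and leaves the string unchanged.
my $copy = $latin1_bytes;
my $ok   = utf8::decode($copy);

printf "Encode gave U+%04X\n", ord($via_encode);
print  "utf8::decode ", ($ok ? "succeeded" : "failed"), "\n";
```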

Here is a Perl program that demonstrates the equivalence of the two approaches to encoding and decoding UTF-8. (See also the live demo.)

#!/usr/bin/perl

use strict;
use warnings;
use utf8;  # allows 'ñ' to appear in the source code

use Encode;

my $word = "Español";  # the 'ñ' is permitted because of the 'use utf8' pragma

# Convert the string to its UTF-8 equivalent.
my $utf8_word = Encode::encode("UTF-8", $word);

# Use 'utf8::decode' to convert the string back to internal form.
my $word_again_via_utf8 = $utf8_word;
utf8::decode($word_again_via_utf8);  # converts in-place

# Use 'Encode::decode' to convert the string back to internal form.
my $word_again_via_Encode = Encode::decode("UTF-8", $utf8_word);

# Do the two conversion methods produce the same result?
# Prints 'Yes'.
print $word_again_via_utf8 eq $word_again_via_Encode ? "Yes\n" : "No\n";

# Do we get back the original internal string after converting both ways?
# Prints 'Yes'.
print $word eq $word_again_via_Encode ? "Yes\n" : "No\n";

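【Comments】:

The utf8 pragma documentation says: "Do not use this pragma for anything else than telling Perl that your script is written in UTF-8." So it is not needed here.
@Slaven - thanks for the guidance... please see my correction.
I think the real culprit may be the permission settings, which are the cause of many errors for CGI newbies like me. It is the weekend, so I have not tested it yet. Anyway, your answer was helpful. Thanks! But I still don't understand the difference between decode in Encode and utf8::decode. How do the Encode and utf8 modules differ?
duleshi - please see the additional notes I added after "Follow-up".
It doesn't work. my $text = decode utf8=>$q->param('text'); open D, '>', 'dd.txt'; print D $text; close D; The content of dd.txt is garbled.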