【Question title】: UTF-8 Coding Failure
【Posted】: 2015-07-16 05:15:20
【Question】:

Following the advice in this post:

I used:

use utf8;
use open ':encoding(utf8)';
binmode(STDOUT, ":utf8");
use open IN => ":encoding(utf8)", OUT => ':utf8';
use Encode;

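It works when I search on my French page, http://french.godsplanforlife.org/cgi_use/search.html, but fails on my Romanian page, http://romanian.godsplanforlife.org/cgi_use/search.html. When I run a search, the special Romanian characters switch from correct to incorrect.

Here is the Perl code for search.pl, which performs the search and prints the search results at the bottom of the search page: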

#!/usr/bin/perl
#search.pl
use utf8;
use open ':encoding(utf8)';

binmode(STDOUT, ":utf8");
use open IN => ":encoding(utf8)", OUT => ':utf8';

use Encode;

# The next three lines import special modules.
use CGI;
use CGI::Carp qw(fatalsToBrowser);
use File::Find;

$cgi=new CGI();

print $cgi->header();

$search_term = $cgi->param('search_term');
$page        = $cgi->param('page');
#Decode the UTF-8 bytes of the search term into a character string.
$search_term = decode_utf8( $search_term );

#The root directory is defined by the web hosting company.
# In this case it is Bluehost using Linux servers.
$root_dir = "/home2/godspla1/public_html/romanian";

$root_dir =~ s|/$||; #get rid of trailing slash

$html_lines= "";

#Specify directories to avoid searching.
$excluded = "cgi-bin|cgi_use|derived|images|_notes|_overlay|vti|_vti_cnf";

#Walk the directory tree;
#open the file and look for the term.
#See http://perldoc.perl.org/File/Find.html for the "find" function.
#\&search refers to the subroutine search() that will do the searching.
find( \&search, $root_dir ) if $search_term;

$html_lines ||= "<tr><td>No results found</td></tr>";

$search_results = qq{<table border="0" width="100%" align="center">}
                   .$html_lines.qq{</table>}; 

#Open the requested page to put in the results.
open (RESULTS, "$root_dir/$page") 
or die "Can't open results page ($root_dir/$page): $!";

#Substitute the search results and replace the search term too.
# see http://www.gossland.com/perlcourse/intro/flow for while loops.
while ( <RESULTS> ) {

#Move the point of printing insertion down to the results area.     
    s{<!-- search_results -->}{$search_results};

    s{name="search_term"\s*?value=""}
     {name="search_term" value="$search_term"};
     print;
}
close RESULTS;

#--This subroutine is the callback that find() above runs on each file to look for the search term.
sub search() {

    $seen = 0;

    $URL = $File::Find::name;
# !~ means "does not match"
# -f means the file is a normal file
#Exclude the excluded directories from the search. Files must be html.
    if ( $URL !~ m/$excluded|sidebar|footer|vti/ and -f and /.html?/ ) {

        $file = $_;
        open FILE, $file;
        @lines = <FILE>;
        close FILE;

#Grab the title, and the file name.  
#Each element ($_) of the @lines array is one paragraph from file.
        for ( @lines ) {

            $title = $1 if m|<title>(.*?)</title>|;
#The Q and the E are delimiters to escape interpretation.
#Increment $seen by one, which makes it true, if the match is seen.
            $seen++ if /\Q$search_term\E/i;
            $seen-- if m/\Q$search_term<\/a>\E/i;
        }

        if ( $seen ) {
            $URL =~ s|$root_dir||; 

#Format the found results into URL, title.
            $html_lines .= qq{<tr><td><a href="$URL">$URL</a>};
            $html_lines .= qq{</td><td>$title</td></tr>\n};
        }
    }
}

【Comments】:

Tags: perl utf-8


【Solution 1】:

To correctly read UTF-8 data from the browser's HTTP POST, you can either use use CGI; and decode afterwards:

use CGI;
use Encode;
binmode STDIN;
my $cgi = CGI->new;
my $search_term = $cgi->param('search_term');
# param() returns raw bytes here, so decode them into characters
$search_term = decode_utf8( $search_term );

or use use CGI qw ( -utf8 );, which decodes the parameters for you:

use CGI qw ( -utf8 );
binmode STDIN;
my $cgi = CGI->new;
# with -utf8, param() already returns decoded characters
my $search_term = $cgi->param('search_term');
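
As a rough illustration of what the decoding step changes, here is a minimal sketch that uses only the core Encode module. The sample term ("știință", written as raw bytes) is a made-up input and not taken from the original script; the variable names are likewise only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical raw form value as it might arrive from the browser:
# the UTF-8 bytes for the Romanian word "știință".
my $raw_bytes = "\xC8\x99tiin\xC8\x9B\xC4\x83";

# Before decoding, length() counts bytes.
my $byte_count = length($raw_bytes);

# Decode the bytes into a Perl character string.
my $search_term = decode_utf8($raw_bytes);

# After decoding, length() counts characters.
my $char_count = length($search_term);

binmode STDOUT, ':encoding(UTF-8)';
print "bytes: $byte_count, characters: $char_count\n";
```

A regex match against decoded file contents only lines up when the search term has been decoded as well, which is why the decoding step matters for the search.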

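To correctly read, modify, and print (to STDOUT) the UTF-8 encoded template file that the CGI script uses to generate its output, enable UTF-8 encoding both when reading files and when writing to STDOUT: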

use open IN => ":encoding(utf8)";
binmode STDOUT, ":utf8";
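
The effect of an input layer can be sketched with a temporary file. This example sets the layer per handle with three-argument open rather than through the use open pragma, which is an alternative way to get the same decoding; the file content and variable names are hypothetical stand-ins for a real template.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small stand-in template as UTF-8 (hypothetical content).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':encoding(UTF-8)';
print {$tmp_fh} "<title>\x{218}tiin\x{21B}\x{103}</title>\n";
close $tmp_fh or die "Can't close $tmp_name: $!";

# Read it back through a decoding layer so each line is a character string.
open my $in, '<:encoding(UTF-8)', $tmp_name
    or die "Can't open $tmp_name: $!";
my $line = <$in>;
close $in;

# The regex now sees characters rather than bytes.
my ($title) = $line =~ m{<title>(.*?)</title>};
my $title_length = length($title);

# Encode again on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print "$title ($title_length characters)\n";
```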

Finally, you need to tell the browser that the data it receives is UTF-8:

$cgi->header(-type => 'text/html', -charset => 'utf-8');

Judging from your script, the problem seems to be mainly related to the last point (you are missing -charset => 'utf-8').
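
To see why a missing or wrong charset declaration corrupts the characters, here is a small sketch using the core Encode module. It assumes the client falls back to ISO-8859-1 when the header does not say utf-8; that fallback and the sample text are assumptions for illustration, not something confirmed by the original script.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical page text containing Romanian characters ("și țară").
my $text = "\x{219}i \x{21B}ar\x{103}";

# The bytes the script sends when STDOUT has a UTF-8 layer.
my $sent_bytes = encode('UTF-8', $text);

# A client that assumes ISO-8859-1 turns every byte into one character.
my $misread = decode('ISO-8859-1', $sent_bytes);

# A client told charset=utf-8 recovers the original text.
my $correct = decode('UTF-8', $sent_bytes);

binmode STDOUT, ':encoding(UTF-8)';
printf "original: %d chars, misread: %d chars, correct: %d chars\n",
    length($text), length($misread), length($correct);
```

Each two-byte Romanian letter becomes two unrelated characters in the misread version, which matches the symptom of characters switching from correct to incorrect.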

【Comments】:

  • Yes! The modified line $cgi->header(-type => 'text/html', -charset => 'utf-8'); solved the problem, and the Romanian characters now display perfectly both before and after running a search. Thanks!!