【Question title】: Perl CGI produces unexpected output
【Posted】: 2017-01-08 06:22:56
【Question description】:

I have a Perl CGI script for an online concordance application that searches a text for instances of a word and prints the sorted output.

#!/usr/bin/perl -wT

# middle.pl - a simple concordance



# require
use strict;
use diagnostics;
use CGI;


# ensure all fatals go to browser during debugging and set-up
# comment this BEGIN block out on production code for security
BEGIN {
    $|=1;
    print "Content-type: text/html\n\n";
    use CGI::Carp('fatalsToBrowser');
}

# sanity check
my $q = new CGI;
my $target = $q->param("keyword");
my $radius = $q->param("span");
my $ordinal = $q->param("ord");
my $width = 2*$radius;
my $file    = 'concordanceText.txt';
if ( ! $file or ! $target ) {

    print "Usage: $0 <file> <target>\n";
    exit;
    
}

# initialize
my $count   = 0;
my @lines   = ();
$/          = ""; # Paragraph read mode

# open the file, and process each line in it
open(FILE, " < $file") or die("Can not open $file ($!).\n");
while(<FILE>){

    # re-initialize
    my $extract = '';
    
    # normalize the data
    chomp;
    s/\n/ /g;        # Replace new lines with spaces
    s/\b--\b/ -- /g; # Add spaces around dashes

    # process each item if the target is found
    while ( $_ =~ /\b$target\b/gi ){
                
        # find start position
        my $match = $1;
        my $pos   = pos;
        my $start = $pos - $radius - length($match);

        # extract the snippets
        if ($start < 0){
            $extract = substr($_, 0, $width+$start+length($match));
            $extract = (" " x -$start) . $extract;
        }else{
            $extract = substr($_, $start, $width+length($match));
            my $deficit = $width+length($match) - length($extract);
            if ($deficit > 0) {
                $extract .= (" " x $deficit);
            }
    
        }

        # add the extracted text to the list of lines, and increment
        $lines[$count] = $extract;
        ++$count;
        
    }
    
}

sub removePunctuation {
    my $string = $_[0];
    $string = lc($string); # Convert to lowercase
    $string =~ s/[^-a-z ]//g; # Remove non-alphabetic characters
    $string =~ s/--+/ /g; # Replace 2+ hyphens with a space
    $string =~s/-//g; # Remove hyphens
    $string =~ s/\s=/ /g;
    return($string);
    
}

sub onLeft {
    #USAGE: $word = onLeft($string, $radius, $ordinal);
    my $left = substr($_[0], 0, $_[1]);
    $left = removePunctuation($left);
    my @word = split(/\s+/, $left);
    return($word[-$_[2]]);
}

sub byLeftWords {
    my $left_a = onLeft($a, $radius, $ordinal);
    my $left_b = onLeft($b, $radius, $ordinal);
    lc($left_a) cmp lc($left_b);
}


# process each line in the list of lines

print "Content-type: text/plain\n\n";
my $line_number = 0;
foreach my $x (sort byLeftWords @lines){
    ++$line_number;
    printf "%5d",$line_number;
    print " $x\n\n";
}

# done
exit;

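The Perl script produces the expected result in the terminal (command line), but the CGI script for the online application produces unexpected output. I cannot figure out what mistake I am making in the CGI script. Ideally, the CGI script should produce the same output as the command-line script. Any suggestions would be very helpful.

Command-line output

CGI output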

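【Question comments】:

Please do not post images of textual data such as input or output. Always copy and paste the data into your post and format it as code.
Please see What should I do when someone answers my question?

Tags: perl cgi

【Solution 1】: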

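BEGIN blocks are executed before anything else, and thus before

my $q = new CGI;

The output goes to the server process's stdout rather than to the HTTP stream, so the default is text/plain, as you can see in the CGI output.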

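Once you have fixed that, you will find the output still looks like one big ugly block, because you need to format and send a valid HTML page, not just a big lump of text. You cannot just dump a bunch of text to the browser and expect it to do anything intelligent with it. You must create a full HTML page with tags to lay out your content, probably with CSS as well.

In other words, the output required will be completely different from the output when simply writing to the terminal. How you structure it is up to you, and explaining how to do that is out of scope for StackOverflow.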
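
As a rough illustration of what "a full HTML page" could look like for this kind of output, here is a minimal sketch. The helper names escape_html and html_page are hypothetical and not part of the original script; the sketch assumes the concordance lines are already collected in a list, and it uses a pre element so that spaces and newlines survive in the browser.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML
# so that the text is displayed literally.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: build one complete response, consisting of a single
# header, a blank line, and then the page. <pre> preserves whitespace,
# so the numbered columns stay aligned.
sub html_page {
    my (@lines) = @_;
    my $n    = 0;
    my $body = join '', map { sprintf("%5d %s\n", ++$n, escape_html($_)) } @lines;
    return "Content-type: text/html\n\n"
         . "<!DOCTYPE html>\n<html><head><title>Concordance</title></head>\n"
         . "<body><pre>\n$body</pre></body></html>\n";
}

print html_page('the cat <sat> on', 'a dog & a cat');
```

Whether this is preferable to plain text depends on what the page is meant to do; it only shows that the layout can be kept once the markup is valid.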

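【Comments】:

Actually, they can dump a bunch of text in the browser just fine with text/plain. It will look like a bunch of text, which seems to be exactly what this program is supposed to do.
Thank you for your suggestion.

【Solution 2】: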

As the other answer states, the BEGIN block is executed at the very start of your program.

BEGIN {
    $|=1;
    print "Content-type: text/html\n\n";
    use CGI::Carp('fatalsToBrowser');
}

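In there, you output an HTTP header, Content-type: text/html\n\n. The browser sees that first and treats all of your output as HTML. But you only have text. Whitespace in an HTML page is collapsed into single spaces, so all of your \n line breaks disappear.

Later, you print another header. The browser can no longer treat it as a header, because you already sent one and terminated it with the two newlines \n\n. It is now too late to switch back to text/plain.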
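
The effect of the two headers can be reproduced without a web server. The following is a minimal sketch, assuming a simplified response in which the header block ends at the first blank line; split_response is a hypothetical helper written for this illustration, not something the original script contains.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw response at the first blank line.
# Everything before it is the header block, everything after it is the body.
sub split_response {
    my ($raw) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    return ($head, defined $body ? $body : '');
}

# Roughly what the script in the question sends.
my $raw = "Content-type: text/html\n\n"     # printed in the BEGIN block
        . "Content-type: text/plain\n\n"    # printed later, at run time
        . "    1 first line\n";

my ($head, $body) = split_response($raw);
print "HEADERS: $head\n";
print "BODY:\n$body";
```

Only the first Content-type line ends up in the header block. The second one is part of the body, which is why it shows up as visible text in the browser.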

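It is perfectly fine for a CGI program to return text/plain and have text without markup displayed in the browser when all you want is text, with no colors, links, or tables. For certain use cases this makes a lot of sense, even if it no longer has the hyper in hypertext. But that is not really what you are doing.

Your BEGIN block serves a purpose, but you are overdoing it. You are trying to make sure that when an error occurs, it gets printed nicely in the browser, so you do not need to deal with the server log while developing.

The CGI::Carp module and its fatalsToBrowser functionality bring their own mechanism for that. You do not have to do it yourself.

You can safely remove the BEGIN block and just put your use CGI::Carp at the top of the script with all the other use statements. They all get run first anyway, because use is run at compile time, while the rest of your code is run at run time.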
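
The ordering can be observed directly. This is a small sketch, not taken from the original script, that records when each statement runs; the @order variable exists only for the demonstration. Since a use statement is equivalent to a require plus import wrapped in a BEGIN block, the same ordering applies to use CGI::Carp wherever it appears in the file.

```perl
use strict;
use warnings;

our @order;    # package variable, so the BEGIN block below can see it

# This statement appears first in the file but runs only after compilation.
push @order, 'run time: first statement';

# This block appears later in the file but runs while the file is compiled.
BEGIN { push @order, 'compile time: BEGIN block' }

print "$_\n" for @order;
```

The BEGIN entry is recorded first even though it comes later in the source, which is why the text/html header in the question reached the browser before anything else.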

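If you want, you can keep the $|=1 (often written $|++), which turns off the buffering for your STDOUT handle. It will get flushed immediately, so every time you print something, that output goes directly to the browser instead of collecting until there is enough of it or there is a newline. If your process runs for a long time, this makes it easier for the user to see that something is happening, which is also useful in production.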
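
For reference, here is a minimal sketch of the two common spellings of this setting. The loop and its messages are made up for illustration; IO::Handle is a core module that provides the autoflush method.

```perl
use strict;
use warnings;
use IO::Handle;          # core module; provides the autoflush method

$| = 1;                  # flush after every print on the currently selected handle (STDOUT by default)
STDOUT->autoflush(1);    # the same setting, named explicitly for one handle

print "Content-type: text/plain\n\n";
for my $step (1 .. 3) {
    print "finished step $step\n";    # sent right away instead of waiting in a buffer
}
```

Either line on its own is enough. The method form can be easier to read when more than one filehandle is involved.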

The top of your program should now look like this.

#!/usr/bin/perl -T

# middle.pl - a simple concordance
use strict;
use warnings;
use diagnostics;
use CGI;
use CGI::Carp('fatalsToBrowser');

$|=1;

my $q = CGI->new;

Finally, a few words on the other parts I have removed from there.

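Your comment # require above the use statements is misleading. Those are use, not require. As I said above, use is run at compile time. require, on the other hand, is run at run time and can be done conditionally. Misleading comments will make it harder for others (or you) to maintain your code later on.
I removed the -w flag from the shebang (#!/usr/bin/perl) and put in the use warnings pragma. That is a more modern way to turn on warnings, because sometimes the shebang can be ignored.
The use diagnostics pragma gives you extra long explanations when things go wrong. That is useful, but also extra slow. You can use it during development, but please remove it for production.
The comment sanity check should be moved down under the CGI instantiation.
Please use the method invocation form of new to instantiate CGI and any other classes. The -> syntax will take care of inheritance properly, while the old new CGI cannot do that.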
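
To make the first point above concrete, here is a small sketch of require being used conditionally at run time, which use cannot do. The helper try_load and the module name No::Such::Module::Here are hypothetical and chosen only for this illustration; Carp is used as an example of a module that is normally installed.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load a module at run time.
# Returns 1 if the module could be loaded, 0 otherwise.
sub try_load {
    my ($module) = @_;
    (my $path = "$module.pm") =~ s{::}{/}g;    # e.g. Foo::Bar becomes Foo/Bar.pm
    return eval { require $path; 1 } ? 1 : 0;
}

print "Carp: ", try_load('Carp'), "\n";
print "No::Such::Module::Here: ", try_load('No::Such::Module::Here'), "\n";
```

A use statement for a missing module would stop the program at compile time, before any of its own code could react. The eval around require lets the program decide what to do at run time instead.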

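【Comments】:

Thank you very much for your suggestions. The CGI now produces the expected output! Thanks for your help.
@DeepShah Please see the help section about what to do when your question has been answered.

【Solution 3】: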

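I ran your CGI. The BEGIN block runs regardless, and you print a content-type header there, so you have explicitly asked for HTML. Then later you attempt to print another header for PLAIN. This is why you can see the header text (which has not taken effect) at the beginning of the text in the browser window.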

【Comments】:

Thank you for your suggestion.