How can I parse multiline records in Perl? Answers

【Question title】: How can I parse multiline records in Perl?
【Posted】: 2019-03-08 12:56:36
【Question description】:

I'm trying to parse a string that uses '#' as a delimiter. Each record in the string spans 3 lines:

101#Introduction to the Professor#SG_FEEL#QUE_NOIMAGE#
head up to the Great Hall and speak to the professor to check in for class.#
#

102#Looking for Instructors#SG_FEEL#QUE_NOIMAGE#
Look for the Battle Instructor.#
Talk to Battle Instructor#

103#Battle Instructor#SG_FEEL#QUE_NOIMAGE#
You have spoken to the Battle Instructor#
#

How can I get each value that precedes a '#' delimiter, so that I can produce a new format that looks like this?

[101] = {
    Title = "Introduction to the Professor",
    Description = {
        "head up to the Great Hall and speak to the professor to check in for class."
    },
    Summary = ""
},  
[102] = {
    Title = "Looking for Instructors",
    Description = {
        "Look for the Battle Instructor."
    },
    Summary = "Talk to Battle Instructor"
},
[103] = {
    Title = "Battle Instructor",
    Description = {
        "You have spoken to the Battle Instructor"
    },
    Summary = ""
},

There are many more records like these, numbered from 101 to n.

I'm trying to use split with the following code:

#!/usr/bin/perl

use strict;
use warnings;

my $data = '101#Introduction to the Professor#SG_FEEL#QUE_NOIMAGE#';

my @values = split('#', $data);

foreach my $val (@values) {
    print "$val\n";
}

exit 0;

And the output:

101
Introduction to the Professor
SG_FEEL
QUE_NOIMAGE

How do I read the multi-line data? And how can I exclude some of the values? To match the new format, I don't need the SG_FEEL and QUE_NOIMAGE values.

【Question comments】:

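What Perl code do you already have, and how does it fail? Have you looked at the [split function](perldoc.perl.org/functions/split.html)? In what way does it not do what you want?
Are those three newlines really in your data? Also, could you provide more data (a sequence of 2 or 3 records, not just 1)?
Welcome to SO. Please provide a minimal, complete, and verifiable example. Show us the code from your latest attempt and where you got stuck, and explain why the result is not what you expected. Edit your question to include the code; please don't add it in a comment, where it may be unreadable. stackoverflow.com/help/mcve
@ikegami I added the data records you asked for.
Updated the question.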

Tags: perl parsing

【Solution 1】:

Perl's special variable $/ holds the "input record separator", the string Perl uses to decide where a line ends. You can set it to something else.

use v5.26;
use utf8;
use strict;
use warnings;

$/ = "\n\n";  # set the input record separator

while( <DATA> ) {
    chomp;
    say "$. ------\n", $_;
    }

__END__
101#Introduction to the Professor#SG_FEEL#QUE_NOIMAGE#
head up to the Great Hall and speak to the professor to check in for class.#
#

102#Looking for Instructors#SG_FEEL#QUE_NOIMAGE#
Look for the Battle Instructor.#
Talk to Battle Instructor#

103#Battle Instructor#SG_FEEL#QUE_NOIMAGE#
You have spoken to the Battle Instructor#
#

The output shows that each call to <DATA> reads an entire record:

1 ------
101#Introduction to the Professor#SG_FEEL#QUE_NOIMAGE#
head up to the Great Hall and speak to the professor to check in for class.#
#
2 ------
102#Looking for Instructors#SG_FEEL#QUE_NOIMAGE#
Look for the Battle Instructor.#
Talk to Battle Instructor#
3 ------
103#Battle Instructor#SG_FEEL#QUE_NOIMAGE#
You have spoken to the Battle Instructor#
#

From there you can parse each record however you need.
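
As one possible way to finish the job, here is a minimal sketch that turns each record into the format requested in the question. It assumes every record has exactly the layout of the sample data (an `id#Title#unused#unused#` line, one description line, one summary line). The helper names `parse_record` and `format_record` are made up for illustration, and an in-memory filehandle stands in for the real input file.

```perl
use strict;
use warnings;

# Hypothetical helper: split one multi-line record into its fields.
# Assumed layout, inferred from the sample data in the question:
#   line 1: id#Title#unused#unused#
#   line 2: Description#
#   line 3: Summary#        (a bare "#" means an empty summary)
sub parse_record {
    my ($record) = @_;
    $record =~ s/\n//g;    # join the record's lines into one string
    # A limit of -1 keeps trailing empty fields, so an empty summary survives.
    my ($id, $title, undef, undef, $description, $summary)
        = split /#/, $record, -1;
    return {
        id          => $id,
        title       => $title,
        description => $description,
        summary     => $summary // '',
    };
}

# Hypothetical helper: render one parsed record in the requested layout.
sub format_record {
    my ($r) = @_;
    return sprintf(
        qq([%s] = {\n    Title = "%s",\n    Description = {\n        "%s"\n    },\n    Summary = "%s"\n},\n),
        @{$r}{qw(id title description summary)},
    );
}

my $sample = <<'END';
101#Introduction to the Professor#SG_FEEL#QUE_NOIMAGE#
head up to the Great Hall and speak to the professor to check in for class.#
#

102#Looking for Instructors#SG_FEEL#QUE_NOIMAGE#
Look for the Battle Instructor.#
Talk to Battle Instructor#
END

# Read from a string here; a real script would open the data file instead.
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";

my @records;
{
    local $/ = "\n\n";    # one blank line separates records
    while (my $record = <$fh>) {
        push @records, parse_record($record);
    }
}

print format_record($_) for @records;
```

The unwanted SG_FEEL and QUE_NOIMAGE values are discarded by assigning them to `undef` in the list assignment. If a description can span several lines, this sketch would need a different rule for telling description lines from the summary line.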

【Comments】:

【Solution 2】:

Reading multiple lines is easy, see readline:

open my $fh, '<', $filename
    or die "Couldn't read '$filename': $!";
my @input = <$fh>;

Now you want to loop over all the lines and decide what to do with each one:

my $linenumber = 0; # start at the first line (an undefined index triggers warnings)
my %info; # We want to collect information
while ($linenumber < $#input) {

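Each line that starts with nnn# begins a new item:

    if( $input[ $linenumber ] =~ /^(\d+)#/ ) {
        my @data = split /#/, $input[ $linenumber ];
        $info{ number } = $data[0];
        $info{ Title } = $data[1];
        $info{ Description } = ''; # start fresh so the previous item's text does not carry over
        $linenumber++;
    };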

Now read content into the description until we reach a line that contains only "#":

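    # Also stop at the end of the input, so a missing "#" line cannot loop forever
    while ($linenumber <= $#input and $input[$linenumber] !~ /^#$/) {
        $info{ Description } .= $input[$linenumber];
        $linenumber++;
    };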

    $linenumber++; # skip the last "#" line

Now output what is in %info; the formatting is left as an exercise. I use qq{} for demonstration. You will need to change that to qq():

    print qq{Number: "$info{ number }"\n};
    print qq{Title: "$info{ Title }"\n};
    print qq(Description: {"$info{ Description }"}\n);
};
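
The loop above stops a description only at a line that is exactly "#", but the second record in the question's sample data ends with "Talk to Battle Instructor#" instead. A variation that copes with that data might look like the sketch below. It is not the answerer's code: the helper name `collect_records` is hypothetical, and it assumes that blank lines only separate records and that the last continuation line of a record is its summary.

```perl
use strict;
use warnings;

# Hypothetical helper: group raw input lines into one hash per record.
sub collect_records {
    my @lines = @_;
    my (@records, $current);

    for my $line (@lines) {
        chomp(my $text = $line);
        next if $text eq '';                  # blank lines only separate records

        if ($text =~ /^(\d+)#([^#]*)#/) {     # "nnn#Title#..." starts a new record
            $current = { number => $1, Title => $2, rest => [] };
            push @records, $current;
        }
        elsif ($current) {
            $text =~ s/#$//;                  # drop the trailing delimiter
            push @{ $current->{rest} }, $text;
        }
    }

    for my $r (@records) {
        my $rest = delete $r->{rest};
        # Assumption: the last continuation line is the summary,
        # and everything before it belongs to the description.
        $r->{Summary}     = @$rest > 1 ? pop(@$rest) : '';
        $r->{Description} = [@$rest];
    }
    return @records;
}

# Sample lines as they would come back from: my @input = <$fh>;
my @input = (
    "101#Introduction to the Professor#SG_FEEL#QUE_NOIMAGE#\n",
    "head up to the Great Hall and speak to the professor to check in for class.#\n",
    "#\n",
    "\n",
    "102#Looking for Instructors#SG_FEEL#QUE_NOIMAGE#\n",
    "Look for the Battle Instructor.#\n",
    "Talk to Battle Instructor#\n",
);

my @parsed = collect_records(@input);

for my $r (@parsed) {
    print qq(Number: "$r->{number}"\n);
    print qq(Title: "$r->{Title}"\n);
    printf qq(Description: {"%s"}\n), join ' ', @{ $r->{Description} };
    print qq(Summary: "$r->{Summary}"\n);
}
```

Keeping Description as an array matches the target format in the question, where Description is a braced list that could hold more than one string.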

【Comments】:

Thank you for such a detailed explanation, I will definitely try it.