【Question title】: Cannot access all XML elements in Perl
【Posted】: 2014-11-27 09:00:31
【Question】:

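I have a simple Perl script that parses an XML file downloaded from a server. The only problem is that the array I build seems to store only the first set of tags.

Here is my code: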

use HTTP::Request;
use XML::Simple;

# $ua (the user agent), $cookie_jar and $user are defined elsewhere in the script.
sub test_button_func {
    my $parser = shift;

    $ua->cookie_jar($cookie_jar);

    # Now make your request
    my $server_endpoint = "server_url_link";
    $server_endpoint =~ s/\%id\%/$user/g;

    # set custom HTTP request header fields
    my $req = HTTP::Request->new(GET => $server_endpoint);
    $req->header('content-type' => 'application/json');

    my $resp = $ua->request($req);

    if ($resp->is_success) {

        # XML response received from Jazz API.
        my $message = $resp->decoded_content;

        # Parse the XML downloaded here:
        # create object
        my $xml = XML::Simple->new;

        # Read XML file
        my @data = $xml->XMLin($message);
        my $arraySize = scalar(@data);

        # Print out the data we need
        for (my $num = 0; $num <= ($arraySize - 1); $num++) {
            print $data[0]->{'oslc_disc:entry'}[$num]{'oslc_disc:ServiceProvider'}{'dc:title'};
            print "\n";
        }
    }
    else {
        print "HTTP GET error code: ", $resp->code, "\n";
        print "HTTP GET error message: ", $resp->message, "\n";
        print $resp->decoded_content;
    }
}

The XML I'm trying to parse looks like this:

<oslc_disc:entry>
    <oslc_disc:ServiceProvider>
        <dc:title>URL MAIN TITLE</dc:title>
        <oslc_disc:details rdf:resource="URL LINK"/>
        <oslc_disc:services rdf:resource="URL LINK"/>
        <jfs_proc:consumerRegistry rdf:resource="URL LINK"/>
    </oslc_disc:ServiceProvider>
</oslc_disc:entry>

So I can parse the first `oslc_disc:entry`, but the XML file contains more than one. How do I parse the rest?
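Why only one title is printed: `XMLin` returns a single hash reference for the whole document, so `my @data = $xml->XMLin($message)` yields a one-element array and `$arraySize` is always 1. The sketch below illustrates this in plain Perl. The hash is hand-written and only assumed to resemble what `XMLin` might return for two entries (it is not real XML::Simple output); it contrasts counting the wrapper array with iterating over the entries array itself.

```perl
use strict;
use warnings;

# Hand-written stand-in (an assumption) for a parsed document with two
# <oslc_disc:entry> elements: one hash reference, with the repeated
# element collected into an array reference.
my $data = {
    'oslc_disc:entry' => [
        { 'oslc_disc:ServiceProvider' => { 'dc:title' => 'Foo Bar Baz' } },
        { 'oslc_disc:ServiceProvider' => { 'dc:title' => 'Foo Bar Baz II' } },
    ],
};

# Wrapping the single hash reference in an array, as `my @data = ...` does,
# gives exactly one element, so a loop bounded by scalar(@data) runs once.
my @wrapped = ($data);
print "Wrapper size: ", scalar(@wrapped), "\n";

# Iterating over the array stored under 'oslc_disc:entry' visits every entry.
my @titles;
for my $entry ( @{ $data->{'oslc_disc:entry'} } ) {
    push @titles, $entry->{'oslc_disc:ServiceProvider'}{'dc:title'};
}
print "GOT: $_\n" for @titles;
```

Note that by default XML::Simple only builds an array for an element that actually repeats; with a single entry the same key would hold a hash reference instead unless the `ForceArray` option is used.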

【Question discussion】:

Tags: arrays xml perl parsing


【Solution 1】:

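Don't use an XML library to parse RDF. It will bring you a world of pain.

Also, never use XML::Simple for anything.

RDF::Trine is the state of the art for RDF in Perl. Attean is being written to replace it; it is lighter-weight and should provide a better API, but it isn't really ready yet.

Example using RDF::Trine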

use strict;
use warnings;
use RDF::Trine;

# $base is for resolving any relative URLs.
my $base = 'http://example.com/';

# Parse the data into $model
my $parser = RDF::Trine::Parser::RDFXML->new;
my $model  = RDF::Trine::Model->new;
$parser->parse_file_into_model($base, \*DATA, $model);

# Some namespaces for querying the data...
my $oslc = RDF::Trine::Namespace->new('http://open-services.net/xmlns/discovery/1.0/');
my $dc   = RDF::Trine::Namespace->new('http://purl.org/dc/terms/');

# Cycle through the objects of oslc_disc:entry
for my $provider ( $model->objects(undef, $oslc->entry) ) {

    # Get the title. This returns a list of titles, so we call
    # it in list context and get just the first result.
    my ($title) = $model->objects($provider, $dc->title);

    # Print it.
    print "GOT: ", $title->literal_value, "\n";
}

__DATA__
<oslc_disc:ServiceProviderCatalog
    xmlns:oslc_disc="http://open-services.net/xmlns/discovery/1.0/"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:jfs_proc="https://jazz.net/xmlns/prod/jazz/process/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    rdf:about="https://localhost:9443/ccm/oslc-scm/catalog.xml"
>
    <oslc_disc:entry>
        <oslc_disc:ServiceProvider>
            <dc:title>Foo Bar Baz</dc:title>
            <oslc_disc:details rdf:resource="foo"/>
            <oslc_disc:services rdf:resource="bar"/>
            <jfs_proc:consumerRegistry rdf:resource="baz"/>
        </oslc_disc:ServiceProvider>
    </oslc_disc:entry>
    <oslc_disc:entry>
        <oslc_disc:ServiceProvider dc:title="Foo Bar Baz II">
            <oslc_disc:details rdf:resource="foo2"/>
            <oslc_disc:services rdf:resource="bar2"/>
            <jfs_proc:consumerRegistry rdf:resource="baz2"/>
        </oslc_disc:ServiceProvider>
    </oslc_disc:entry>
</oslc_disc:ServiceProviderCatalog>
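Part of why XML-level parsing is fragile here is that prefixes such as `oslc_disc:` and `dc:` are only local abbreviations. In RDF a property is identified by the full IRI formed from the namespace plus the local name, which is what the `RDF::Trine::Namespace` objects above build (for example `$oslc->entry`). The following plain-Perl sketch illustrates that expansion; `%ns` and `expand_qname` are hypothetical and not part of RDF::Trine.

```perl
use strict;
use warnings;

# Hypothetical prefix table. The IRIs are the ones declared in the sample data;
# 'disc' is an invented alternative prefix bound to the same namespace.
my %ns = (
    oslc_disc => 'http://open-services.net/xmlns/discovery/1.0/',
    disc      => 'http://open-services.net/xmlns/discovery/1.0/',
    dc        => 'http://purl.org/dc/terms/',
);

# Hypothetical helper: expand "prefix:local" into a full IRI.
sub expand_qname {
    my ($qname) = @_;
    my ($prefix, $local) = split /:/, $qname, 2;
    die "Unknown prefix in '$qname'\n"
        unless defined $local && exists $ns{$prefix};
    return $ns{$prefix} . $local;
}

my $entry = expand_qname('oslc_disc:entry');
my $title = expand_qname('dc:title');
print "$entry\n$title\n";
```

A document that declared the same namespace under a different prefix would produce different element names for an XML parser (and different XML::Simple hash keys), but the same IRIs, so RDF-level queries keep working.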

Example using Attean

I haven't used Attean much yet, so there may be a more elegant way to do this...

use strict;
use warnings;
use Attean::RDF qw(iri);

# $base is for resolving any relative URLs.
my $base = 'http://example.com/';

# Parse the data into $model
my $store  = Attean->get_store('Memory')->new;
my $model  = Attean::MutableQuadModel->new( store => $store );
my $data   = do { local $/; <DATA> };  # slurp filehandle into string
$model->load_triples(RDFXML => iri($base), $data);

# Some namespaces for querying the data...
my $oslc = 'http://open-services.net/xmlns/discovery/1.0/';
my $dc   = 'http://purl.org/dc/terms/';

# Cycle through the objects of oslc_disc:entry
for my $provider ( $model->objects(undef, iri("${oslc}entry"))->elements ) {

    # Get the title. This returns a list of titles, so we call
    # it in list context and get just the first result.
    my ($title) = $model->objects($provider, iri("${dc}title"))->elements;

    # Print it.
    print "GOT: ", $title->value, "\n";
}

__DATA__
<oslc_disc:ServiceProviderCatalog
    xmlns:oslc_disc="http://open-services.net/xmlns/discovery/1.0/"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:jfs_proc="https://jazz.net/xmlns/prod/jazz/process/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    rdf:about="https://localhost:9443/ccm/oslc-scm/catalog.xml"
>
    <oslc_disc:entry>
        <oslc_disc:ServiceProvider>
            <dc:title>Foo Bar Baz</dc:title>
            <oslc_disc:details rdf:resource="foo"/>
            <oslc_disc:services rdf:resource="bar"/>
            <jfs_proc:consumerRegistry rdf:resource="baz"/>
        </oslc_disc:ServiceProvider>
    </oslc_disc:entry>
    <oslc_disc:entry>
        <oslc_disc:ServiceProvider dc:title="Foo Bar Baz II">
            <oslc_disc:details rdf:resource="foo2"/>
            <oslc_disc:services rdf:resource="bar2"/>
            <jfs_proc:consumerRegistry rdf:resource="baz2"/>
        </oslc_disc:ServiceProvider>
    </oslc_disc:entry>
</oslc_disc:ServiceProviderCatalog>

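【Discussion】:

  • Hi, thanks for your reply, I really appreciate it. I'll try your code and see what I can do. Could I also ask what the DATA bit at the end of the Perl program is? I've seen it in a lot of Perl scripts. (Sorry, I'm new to this.)
  • If a Perl script ends with __DATA__ followed by some data, then inside the script you can read that data through the special filehandle *DATA (usually written just DATA). This is very handy for shipping a script and its input data in the same file.
  • Thanks for your help. I'll do my best and play around with your code to get a better understanding of what I'm doing. Thanks.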