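[Posted]: 2015-12-28 01:21:24
[Question]:
I am trying to parse a large XML file and load it into MySQL. I first used SimpleXML to parse it. That works, but it is slow for a file this large, so I am now trying XMLReader instead.
Here is a sample of the XML: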
<?xml version="1.0" encoding="UTF-8"?>
<drug type="biotech" created="2005-06-13" updated="2015-02-23">
  <drugbank-id primary="true">DB00001</drugbank-id>
  <drugbank-id>BIOD00024</drugbank-id>
  <drugbank-id>BTD00024</drugbank-id>
  <name>Lepirudin</name>
  <description>Lepirudin is identical </description>
  <cas-number>120993-53-5</cas-number>
  <groups>
    <group>approved</group>
  </groups>
  <pathways>
    <pathway>
      <smpdb-id>SMP00278</smpdb-id>
      <name>Lepirudin Action Pathway</name>
      <drugs>
        <drug>
          <drugbank-id>DB00001</drugbank-id>
          <name>Lepirudin</name>
        </drug>
        <drug>
          <drugbank-id>DB01373</drugbank-id>
          <name>Calcium</name>
        </drug>
      </drugs>
      ...
</drug>
<drug type="biotech" created="2005-06-15" updated="2015-02-25">
  ...
</drug>
Here is my SimpleXML approach:
<?php
$xml = simplexml_load_file('drugbank.xml');
$servername = "localhost"; // Example : localhost
$username = "root";
$password = "pass";
$dbname = "dbname";
// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}
$xmlObject_count = $xml->drug->count();
for ($i = 0; $i < $xmlObject_count; $i++) {
    $name = $xml->drug[$i]->name;
    $description = $xml->drug[$i]->description;
    $casnumber = $xml->drug[$i]->{'cas-number'};
    // ...
    $created = $xml->drug[$i]['created'];
    $updated = $xml->drug[$i]['updated'];
    $type = $xml->drug[$i]['type'];
    $sql = "INSERT INTO `drug` (name, description,cas_number,created,updated,type)
        VALUES ('$name', '$description','$casnumber','$created','$updated','$type')";
    if ($conn->query($sql) === TRUE) {
        $last_id = $conn->insert_id;
    } else {
        echo "outer else Error: " . $sql . "<br>" . $conn->error . "<br>";
    }
}
$conn->close();
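This works and gives me 7,789 rows. However, I want to parse the file with XMLReader, and the problem is that XMLReader produces more than 35,000 rows.
If you look at the XML, you can see that each <drug /> node also contains nested <drugs><drug> child nodes. How can I overcome this?
Here is my XMLReader approach: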
<?php
$servername = "localhost"; // Example : localhost
$username = "root";
$password = "pass";
$dbname = "dbname";
// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}
$reader = new XMLReader();
$reader->open('drugbank.xml');
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'drug') {
        $doc = new DOMDocument('1.0', 'UTF-8');
        $xml = simplexml_import_dom($doc->importNode($reader->expand(), true));
        $name = $xml->name;
        $description = $xml->description;
        $casnumber = $xml->{'cas-number'};
        // ...
        $sql = "INSERT INTO `drug` (name, description,cas_number,created,updated,type)
            VALUES ('$name', '$description','$casnumber','$created','$updated','$type')";
        if ($conn->query($sql) === TRUE) {
            $last_id = $conn->insert_id;
        } else {
            echo "outer else Error: " . $sql . "<br>" . $conn->error . "<br>";
        }
    }
}
$conn->close();
With this code, I get more than 35,000 rows.
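One possible way to avoid the extra rows is to stop calling `read()` once the first `<drug>` is found and to move from one `<drug>` to the next with `XMLReader::next('drug')`, which skips the current element's subtree, so the nested `<drugs><drug>` elements are never visited. The sketch below is untested against the real file and assumes the top-level `<drug>` elements are siblings under a single root element (which the `$xml->drug` access in the SimpleXML version implies). The helper name `topLevelDrugs` and the row array keys are hypothetical, chosen only for illustration.

```php
<?php
/**
 * Yield one associative array per top-level <drug> element.
 * Hypothetical helper: the name and the array keys are illustrative only.
 * Assumes the reader was already opened (open() or XML()) and that the
 * top-level <drug> elements are siblings under one root element.
 */
function topLevelDrugs(XMLReader $reader): Generator
{
    $doc = new DOMDocument('1.0', 'UTF-8');

    // Advance to the first <drug> start tag in document order.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'drug') {
            $found = true;
            break;
        }
    }

    while ($found) {
        // Expand only the current <drug> subtree into SimpleXML.
        $drug = simplexml_import_dom($doc->importNode($reader->expand(), true));

        yield [
            'name'        => (string) $drug->name,
            'description' => (string) $drug->description,
            'cas_number'  => (string) $drug->{'cas-number'},
            'created'     => (string) $drug['created'],
            'updated'     => (string) $drug['updated'],
            'type'        => (string) $drug['type'],
        ];

        // next('drug') skips the subtree just processed, so the nested
        // <drugs><drug> elements inside it are not matched.
        $found = $reader->next('drug');
    }
}
```

Usage would look roughly like this: create the reader with `$reader = new XMLReader(); $reader->open('drugbank.xml');`, then run `foreach (topLevelDrugs($reader) as $row) { ... }` and perform the insert for each `$row` inside the loop.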
[Comments]:
- Parsing large XML files with PHP is a bad idea
Tags: php performance simplexml xmlreader