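【Posted】: 2016-11-24 15:44:21
【Question】:
I am using cURL to retrieve a page and store its HTML. That part works, and I end up with a variable containing HTML similar to the following (the contents of the td elements are not fixed; they change all the time):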
html code above....
<tr class="myclass">
<td>Dynamic Content One</td>
<td>Dynamic Content Two</td>
<td>Dynamic Content Three</td>
</tr>
<tr class="myclass">
<td>Dynamic Content One</td>
<td>Dynamic Content Two</td>
<td>Dynamic Content Three</td>
</tr>
More of the same <tr> ......
html code below....
My goal now is to parse the HTML into an array named $result in which each <tr> is one element holding an associative array, so that it looks like this:
$result[0]["first_content"] = "Dynamic Content One"
$result[0]["second_content"] = "Dynamic Content Two"
$result[0]["third_content"] = "Dynamic Content Three"
$result[1]["first_content"] = "Dynamic Content One"
$result[1]["second_content"] = "Dynamic Content Two"
$result[1]["third_content"] = "Dynamic Content Three"
.. more elements in the array, depending on how many <tr> there were
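I am finding this surprisingly tricky to parse. I have tried the DOMDocument and DOMXPath classes, but all I have managed is a flat array with one element per <td>, and I am not sure where to put the logic that stores the values into the array row by row. Maybe there is a better way to do this? Here is my current code: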
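$dom = new DOMDocument;
@$dom->loadHTML($retrievedHtml);   // @ suppresses warnings from malformed HTML
$xPath = new DOMXPath($dom);
$xPathQuery = "//tr[@class='myclass']";
$elements = $xPath->query($xPathQuery);
if ($elements !== false) {   // query() returns false on failure, never null
    $results = array();
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        print $element->nodeValue;   // nodeValue exists on the row node, not on the DOMNodeList
        foreach ($nodes as $node) {
            // Appends every child node to one flat array, so row boundaries are lost
            $results[] = $node->nodeValue;
        }
    }
}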
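One possible way to get the row-per-element structure described in the question, offered as a sketch and not a definitive answer: collect the <td> values of each row into a temporary array, then pair them with the key names. The inline $retrievedHtml sample and the $keys list are assumptions made for illustration; in the real script $retrievedHtml would come from cURL, and rows with a different number of cells are simply skipped here.

```php
<?php
// Hypothetical sample input standing in for the HTML fetched with cURL.
$retrievedHtml = <<<HTML
<table>
<tr class="myclass">
<td>Dynamic Content One</td>
<td>Dynamic Content Two</td>
<td>Dynamic Content Three</td>
</tr>
<tr class="myclass">
<td>Dynamic Content One</td>
<td>Dynamic Content Two</td>
<td>Dynamic Content Three</td>
</tr>
</table>
HTML;

// Key names taken from the question, in column order (illustrative variable).
$keys = array("first_content", "second_content", "third_content");

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect parse warnings instead of using @
$dom->loadHTML($retrievedHtml);
libxml_clear_errors();

$xPath = new DOMXPath($dom);
$result = array();

foreach ($xPath->query("//tr[@class='myclass']") as $row) {
    $cells = array();
    // Query relative to the row so only <td> elements are collected,
    // skipping the whitespace text nodes that childNodes also returns.
    foreach ($xPath->query("td", $row) as $cell) {
        $cells[] = trim($cell->nodeValue);
    }
    // Skip rows whose cell count does not match the expected keys.
    if (count($cells) === count($keys)) {
        $result[] = array_combine($keys, $cells);
    }
}

print_r($result);
```

Running the second query with the row as its context node is what keeps the row boundaries: each pass of the outer loop yields exactly one entry in $result.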
【Discussion】:
Tags: php parsing xpath web-scraping domdocument