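【Posted】: 2012-09-01 22:34:10
【Question】:
I am trying to fetch the "link" elements from certain web pages, but I can't figure out what I'm doing wrong. I get the following error: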
Severity: Warning
Message: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 536
Filename: controllers/test.php
Line Number: 34
Line 34 of the code is the following:
$dom->loadHTML($html);
My code:
$url = "http://www.amazon.com/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
if($html = curl_exec($ch)){
    // parse the html into a DOMDocument
    $dom = new DOMDocument();
    $dom->recover = true;
    $dom->strictErrorChecking = false;
    $dom->loadHTML($html);
    $hrefs = $dom->getElementsByTagName('a');
    echo "<pre>";
    print_r($hrefs);
    echo "</pre>";
    curl_close($ch);
}else{
    echo "The website could not be reached.";
}
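For reference, here is a minimal sketch of the same parsing step without the network request. It assumes the warning comes from a bare `&` somewhere in the fetched markup, which is what "htmlParseEntityRef: no name" usually points to. The `$html` string is a hypothetical stand-in for the real page, and `libxml_use_internal_errors()` is used to collect parser messages instead of letting them surface as PHP warnings. The `recover` and `strictErrorChecking` properties are left out, since the warning appears even with them set.

```php
<?php
// Hypothetical stand-in for the fetched page (not the real Amazon markup).
// The unescaped "&" in the link text is the kind of input that can
// produce "htmlParseEntityRef: no name" on affected libxml versions.
$html = '<html><body><a href="/books">Books & more</a></body></html>';

// Collect libxml parse errors internally instead of raising PHP warnings,
// remembering the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Keep whatever the parser reported, then clear the buffer and restore state.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// getElementsByTagName() returns a DOMNodeList, so iterate over it
// and read each href rather than dumping the list object itself.
$links = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($links);

// Show any parser messages that were collected (may be empty,
// depending on the libxml version in use).
foreach ($errors as $error) {
    echo trim($error->message), " (line {$error->line})\n";
}
```

Whether any messages are collected depends on the installed libxml version. Looping over the node list is a deliberate change from the code above: depending on the PHP version, `print_r()` on a `DOMNodeList` may show little more than the object's class name.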
【Comments】:
- Changed to a Google-friendly URL. Please revert it if it doesn't suit you.
Tags: php html-parsing domdocument