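【Posted】: 2016-03-24 17:19:40
【Question】:
Below is my PHP code for printing the text under id=Summary. The script works on other websites, but not on Wikipedia. I have also pasted the error I get below. Does Wikipedia restrict parser scripts? If so, is there any way to parse and fetch content from the wiki? Thanks in advance.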
<?php
function getElementByIdAsString($url, $id, $pretty = true) {
    $doc = new DOMDocument();
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    // var_dump($doc->loadHTMLFile($url)); die;
    error_reporting(E_ERROR | E_PARSE);
    if(!$result) {
        throw new Exception("Failed to load $url");
    }
    $doc->loadHTML($result);
    // Obtain the element
    $element = $doc->getElementById($id);
    if(!$element) {
        throw new Exception("An element with id $id was not found");
    }
    if($pretty) {
        $doc->formatOutput = true;
    }
    // Return the string representation of the element
    return $doc->saveXML($element);
}
// Here I am displaying the output in bold text
echo getElementByIdAsString('https://en.wikipedia.org/wiki/A_Brief_History_of_Time', 'Summary');
?>
Error:
Fatal error: Uncaught exception 'Exception' with message 'Failed to load http://en.wikipedia.org/wiki/A_Brief_History_of_Time' in C:\xampp\htdocs\example2.php:25 Stack trace: #0 C:\xampp\htdocs\example2.php(49): getElementByIdAsString() #1 {main} thrown in C:\xampp\htdocs\example2.php on line 25
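The exception message above only says that the load failed, not why. After a failed curl_exec(), cURL keeps the underlying reason, and curl_errno() and curl_error() return it. A minimal sketch of surfacing it follows; the helper name fetchWithDiagnostics and the timeout values are assumptions for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper (not from the original post): fetch a URL and
// return cURL's own error code and message alongside the body.
function fetchWithDiagnostics(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // assumed timeouts, adjust as needed
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);

    $body = curl_exec($ch);

    return [
        'body'  => $body === false ? null : $body, // null when the transfer failed
        'errno' => curl_errno($ch),                // 0 means no cURL error
        'error' => curl_error($ch),                // human-readable reason, '' on success
    ];
}
```

Calling fetchWithDiagnostics('https://en.wikipedia.org/wiki/A_Brief_History_of_Time') and printing the 'error' entry should show the reason that the generic "Failed to load" exception hides.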
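【Comments】:
-
What is the cURL error you are getting?
-
php.net/manual/en/function.curl-error.php This function returns the error from cURL.
-
SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
-
I tried that, but it did not help. If possible, could you correct my code so that it works? Thanks.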
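The SSL error reported in the comments usually means that PHP's cURL has no trusted CA bundle to verify the server certificate against, which is common on a local Windows/XAMPP install. It does not indicate that Wikipedia blocks the script. One possible correction, offered as a sketch rather than a definitive fix, is to point cURL at a CA bundle file (for example the cacert.pem published by the curl project) through CURLOPT_CAINFO, and to keep certificate verification enabled. The function names, the user agent string, and the bundle path in the usage note are assumptions for illustration.

```php
<?php
// Sketch only: function names and the user agent are assumptions,
// not taken from the original post.

// Fetch a page over HTTPS, verifying the certificate against an explicit CA bundle.
function fetchPage(string $url, string $caBundlePath): string
{
    if (!is_readable($caBundlePath)) {
        throw new InvalidArgumentException("CA bundle not readable: $caBundlePath");
    }

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,          // follow http -> https redirects
        CURLOPT_SSL_VERIFYPEER => true,          // keep certificate verification on
        CURLOPT_SSL_VERIFYHOST => 2,
        CURLOPT_CAINFO         => $caBundlePath, // trusted root certificates
        CURLOPT_USERAGENT      => 'ExampleFetcher/0.1 (placeholder user agent)',
        CURLOPT_TIMEOUT        => 20,
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // Report cURL's own reason instead of a generic message.
        throw new RuntimeException('cURL error ' . curl_errno($ch) . ': ' . curl_error($ch));
    }
    return $body;
}

// Return the markup of the element with the given id from an HTML string.
function extractElementById(string $html, string $id, bool $pretty = true): string
{
    if ($html === '') {
        throw new RuntimeException('Empty HTML document');
    }

    $doc = new DOMDocument();
    // Collect parser warnings about real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new RuntimeException('The HTML could not be parsed');
    }

    $element = $doc->getElementById($id);
    if ($element === null) {
        throw new RuntimeException("An element with id $id was not found");
    }

    $doc->formatOutput = $pretty;
    return $doc->saveHTML($element);
}
```

A call might look like echo extractElementById(fetchPage('https://en.wikipedia.org/wiki/A_Brief_History_of_Time', 'C:\xampp\php\extras\ssl\cacert.pem'), 'Summary'); where the bundle path is hypothetical and depends on where the file was saved. Setting the curl.cainfo directive in php.ini is an alternative that avoids passing the path in code. Turning off CURLOPT_SSL_VERIFYPEER would also silence the error, but it removes the protection that HTTPS provides. On Wikipedia the id Summary most likely sits on the section heading, so the returned markup may be the heading alone and not the section text.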
Tags: php parsing web-crawler wiki