PHP CURL / XPATH - Links not working

【Question title】: PHP CURL / XPATH - Links not working
【Posted】: 2017-01-05 16:05:14
【Question】:

I am using the following code to scrape some external divs from http://psnc.org.uk/our-latest-news-category/psnc-news/

I want to scrape the latest-news section of PSNC News.

$ch = curl_init("http://psnc.org.uk/our-latest-news-category/psnc-news/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
curl_close($ch);

$document = new DOMDocument;
libxml_use_internal_errors(true);
$document->loadHTML($output);
$xpath = new DOMXPath($document);

$tweets = $xpath->query("//article[@class='news-template-box']");

echo "<html><body>";
foreach ($tweets as $tweet) {
    echo "\n<p>".$tweet->nodeValue."</p>\n";
}
echo "</body></html>";

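It scrapes the text successfully, but the links, hrefs and images (in fact, all of the elements) do not appear in the output.

Am I missing something?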

【Comments】:

When you use $xpath->query("*"); you get all the data
I only want to scrape one DIV, not the whole page
Which div?
article class="news-template-box"
or

Tags: php curl xpath domdocument scrape

【Solution 1】:

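For an element node, DOMNode::nodeValue returns the same string as DOMNode::textContent, so it only prints the text content and drops the markup (links, images, and every other element). To keep the markup, serialize each matched node with DOMDocument::saveHTML() instead.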

http://php.net/manual/en/class.domnode.php#domnode.props.nodevalue
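
As a small illustration of the point above, the following sketch parses an inline HTML string and compares the two properties. The sample markup and the variable names are made up for this example and are not taken from the PSNC page; only the class name comes from the question.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<html><body><article class="news-template-box">'
      . '<h2><a href="/news/1">Title</a></h2><p>Summary</p>'
      . '</article></body></html>';

$document = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);
$article = $xpath->query("//article[@class='news-template-box']")->item(0);

// For an element node both properties hold the concatenated text of its descendants.
$nodeValue = $article->nodeValue;
$textContent = $article->textContent;

var_dump($nodeValue === $textContent); // same string in both properties
echo $nodeValue, "\n";                 // text only: no <a>, no href, no tags
```

The anchor tag and its href are gone from the printed string, which matches the behaviour described in the question.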

$tweets = $xpath->query("//article[@class='news-template-box']");

foreach ($tweets as $tweet) {
    echo $document->saveHTML($tweet);
}
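
A self-contained sketch of the same approach, runnable without a network request, might look like the following. The inline HTML, the $markup and $links variables, and the relative XPath query for href attributes are assumptions added for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical sample markup standing in for the curl_exec() output.
$html = '<html><body><article class="news-template-box">'
      . '<a href="/news/1"><img src="/img/1.png" alt="pic">Title</a>'
      . '</article></body></html>';

$document = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);

$markup = '';
$links = [];
foreach ($xpath->query("//article[@class='news-template-box']") as $article) {
    // saveHTML() with a node argument serializes that node and its children.
    $markup .= $document->saveHTML($article);

    // Optional: collect the href values with a query relative to this article.
    foreach ($xpath->query(".//a/@href", $article) as $href) {
        $links[] = $href->nodeValue;
    }
}

echo $markup, "\n";
print_r($links);
```

If the source page uses relative URLs in href or src attributes, the serialized markup will still contain them as relative paths, so they would need to be resolved against the original site before being shown on another domain.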

【Comments】: