PHP: Data from cURL, HTML Scan

【Question Title】: PHP: Data from cURL, HTML Scan
【Posted】: 2009-12-28 20:24:49
【Question Description】:

How can I scan an HTML page to get the text inside a particular div?

【Question Discussion】:

【Solution 1】:

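// Requires the Simple HTML DOM Parser library, which provides file_get_html()
include 'simple_html_dom.php';

// Create a DOM object from a URL
$html = file_get_html('http://www.google.com/');

// Find all <div> elements whose id attribute is "foo" (returns an array)
$ret = $html->find('div[id=foo]');
echo $ret ? $ret[0]->plaintext : ''; // text of the first match, if any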

【Discussion】:

【Solution 2】:

You can also do this with the DOMDocument class.

Usage is quite simple:

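$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings from malformed real-world HTML
$dom->loadHTML(file_get_contents($url)); // $url is the address of the page to scan

// Example: get the element with id "foo" and print its text
$div = $dom->getElementById('foo');
echo $div ? $div->textContent : '';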

The documentation is here.

A real-world usage example can be found here.
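Since the question title mentions cURL, here is a minimal sketch of how the two pieces might fit together: fetch the page with cURL, then pull the text out of a div with DOMDocument. The function names fetchHtml and extractDivText, the sample markup, and the id "foo" are all made up for illustration; they are not part of any library. Treat it as one possible approach under those assumptions, not the definitive way to do it.

```php
<?php
// Hypothetical helper: download a page body with cURL.
// Returns null when the request fails.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: return the trimmed text of the first <div>
// whose id attribute equals $id, or null if there is none.
function extractDivText(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    // Real-world HTML is often malformed; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    foreach ($dom->getElementsByTagName('div') as $div) {
        if ($div->getAttribute('id') === $id) {
            return trim($div->textContent);
        }
    }
    return null;
}

// Sample markup standing in for a downloaded page.
$sample = '<html><body><div id="foo">Hello <b>world</b></div>'
        . '<div id="bar">other</div></body></html>';

echo extractDivText($sample, 'foo'), "\n"; // prints "Hello world"
```

For a live page you would pass the result of fetchHtml($url) to extractDivText() after checking that it is not null. Comparing the id attribute in a loop avoids building an XPath expression out of user input.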

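【Discussion】:

【Solution 3】:

You can use the built-in functionality as others have suggested, or you can try Simple HTML DOM Parser, which is implemented as a simple PHP class plus a few helper functions. It supports CSS-selector-style screen scraping (as in jQuery), can handle invalid HTML, and even provides a familiar interface for manipulating the DOM.

Worth a try: http://simplehtmldom.sourceforge.net/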

【Discussion】:

【Solution 4】:

Use preg_match() to match the substring you want, or use DOM/XML.

【Discussion】:
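As a rough sketch of the preg_match() route: the pattern below looks for a div with a given id and captures everything up to the first closing </div>. The function name matchDivText and the sample markup are invented for this example. Note the assumption that the target div contains no nested divs; a regular expression will cut the match short if it does, which is the main reason the DOM-based answers are usually safer.

```php
<?php
// Hypothetical helper: return the text inside the first <div> whose id
// equals $id, using a regular expression. Assumes no nested <div> inside it.
function matchDivText(string $html, string $id): ?string
{
    // ~...~is : case-insensitive, and "." also matches newlines.
    // (["\']) captures the opening quote; \1 requires the same closing quote.
    $pattern = '~<div\b[^>]*\bid\s*=\s*(["\'])'
             . preg_quote($id, '~')
             . '\1[^>]*>(.*?)</div>~is';

    if (preg_match($pattern, $html, $m) !== 1) {
        return null; // no match (or a regex error)
    }

    // Drop inner tags and decode entities such as &amp;
    return trim(html_entity_decode(strip_tags($m[2])));
}

// Sample markup standing in for a downloaded page.
$page = '<p>intro</p><div class="c" id="foo">Price: <b>42</b> USD</div>';

echo matchDivText($page, 'foo'), "\n"; // prints "Price: 42 USD"
```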