Scraping a lot of data from different URLs with simple_html_dom.php: answers

【Question title】: Scraping much data from different URLs with simple_html_dom.php
【Posted】: 2013-05-12 20:39:27
【Question description】:

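I basically want to do something like this: Simple Html DOM Caching

I have everything working so far, but now that I am scraping many sites (6 at the moment, and I need up to 25) I get the following error: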

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 39 bytes)

I'm a PHP newbie =/ ... so how can I "serialize" the scraping process step by step so that my memory doesn't give out? :-)

Code sample:

// Include the library
include('simple_html_dom.php');

// retrieve and find contents
$html0 = file_get_html('http://www.site.com/');
foreach($html0->find('#id') as $aktuelle_spiele);

file_put_contents("cache/cache0.html",$aktuelle_spiele);

Thank you very much for your help!

【Comments】:

Raise the memory limit... it looks like your memory limit is only 32 MB.

Tags: php html performance dom

【Solution 1】:

You can increase the memory limit at the start of your script.

Like this:

ini_set('memory_limit', '128M');

【Comments】:

【Solution 2】:

In your php.ini, change this line:

memory_limit = 32M

to this:

memory_limit = 256M ; or another larger value

Or add this line at the start of every PHP script that uses simple_html_dom:

ini_set('memory_limit', '128M'); // or a greater value
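
Some hosts lock this setting, so the call may have no effect. The following is a minimal sketch, not part of the original answer, that tries to raise the limit and reports whether the new value was accepted. The helper name raise_memory_limit() is hypothetical; ini_set() and ini_get() are standard PHP functions.

```php
<?php
// Sketch: try to raise the limit and check whether it took effect.
// raise_memory_limit() is a hypothetical helper, not part of PHP or simple_html_dom.
function raise_memory_limit(string $limit): bool
{
    // ini_set() returns the previous value on success and false on failure.
    if (ini_set('memory_limit', $limit) === false) {
        return false;
    }
    // Read the setting back to confirm the new value is active.
    return ini_get('memory_limit') === $limit;
}

if (!raise_memory_limit('128M')) {
    error_log('memory_limit could not be changed; current value: ' . ini_get('memory_limit'));
}
```

If the check fails, the limit is probably fixed by the hosting configuration and the script has to get by with less memory instead.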

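【Comments】:

@ThomasVeit, you can increase the memory limit at the start of your script. Like this: ini_set('memory_limit', '128M')
Hmm, my host doesn't allow more than 32M of memory... does anyone have another idea for reducing my script's memory use?
@LuigiSiri, wow, thank you so much! This works like a charm! :)
I thought you had your own server... you can add the line @Luigi suggested at the start of every script that uses simple_html_dom.php.
@ThomasVeit. I have posted my comment as an answer. Sorry, Robert. +1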
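
For hosts that cap memory at 32M, as raised in the comments, one option that none of the answers spell out is to process the pages strictly one after another and free each parsed document before loading the next. The sketch below is an assumption about how that could look, not the accepted fix from this thread. The helper names fetch_fragment() and scrape_sequentially() are hypothetical; file_get_html(), find(), outertext and clear() come from simple_html_dom, which must be included before fetch_fragment() is called. Unlike the question's loop, which ends up with the last match, this sketch keeps only the first match.

```php
<?php
// Sketch only: handle one URL at a time and release memory between pages.
// fetch_fragment() and scrape_sequentially() are hypothetical helper names.

// Assumes simple_html_dom.php has been included before this function is called.
function fetch_fragment(string $url, string $selector): ?string
{
    $html = file_get_html($url);           // download and parse a single page
    if (!$html) {
        return null;                       // download or parse failed
    }
    $node = $html->find($selector, 0);     // first match only
    $fragment = $node ? $node->outertext : null;

    $html->clear();                        // release the parser's internal references
    unset($html);
    return $fragment;
}

// Calls $fetch for each URL in turn and writes every result to cacheN.html.
// Returns the number of cache files written.
function scrape_sequentially(array $urls, callable $fetch, string $cacheDir): int
{
    $written = 0;
    foreach (array_values($urls) as $i => $url) {
        $fragment = $fetch($url);
        if ($fragment !== null) {
            file_put_contents("$cacheDir/cache$i.html", $fragment);
            $written++;
        }
        unset($fragment);                  // drop the string before the next page
    }
    return $written;
}
```

A call might look like scrape_sequentially($urls, function ($url) { return fetch_fragment($url, '#id'); }, 'cache'); where $urls is the list of sites to scrape. Passing the fetcher as a callable keeps the loop independent of the parsing library, so it can be tried with a stub before pointing it at real sites.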