Web crawler in Rails: how to crawl all pages of a site

【Question title】: Web crawler in Rails: how to crawl all pages of a site
【Posted】: 2013-10-11 05:32:17
【Question description】:

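I need to collect every URL from every page of a given domain.
I think it makes sense to use background jobs and spread the work across multiple queues.
I tried cobweb, but it seems to be a very confusing gem.
I also tried anemone, but anemone runs for a very long time when the site has a lot of pages: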

require 'anemone'

Anemone.crawl("http://www.example.com/") do |anemone|
  anemone.on_every_page do |page|
    puts page.links
  end
end

What do you think would suit me best?

【Question comments】:

【Solution 1】:

You can use the Nutch crawler. Apache Nutch is a highly extensible and scalable open-source web crawler software project.
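
If a separate JVM-based tool is more than you need, the queue idea from the question can also be sketched in plain Ruby. The following is only a minimal illustration, not how Nutch, cobweb, or anemone work internally: several worker threads pull URLs from a shared queue, fetch each page, and push newly discovered same-host links back onto the queue. The names `crawl_domain`, `extract_links`, and `DEFAULT_FETCHER` are hypothetical, the link extraction is a naive regex (a real crawler would use an HTML parser), and the `fetcher:` parameter is an assumption added so the page download can be swapped out.

```ruby
require 'set'
require 'uri'
require 'net/http'

# Hypothetical helper: pulls href values out of the HTML with a simple regex
# and resolves them against the page URL. Fragments (#...) are dropped.
def extract_links(html, base_url)
  html.scan(/href\s*=\s*["']([^"'#]+)/i).flatten.map do |href|
    begin
      URI.join(base_url, href.strip)
    rescue StandardError
      nil # skip hrefs that are not valid URIs
    end
  end.compact
end

# Assumed default fetcher: a plain GET that returns the body, or an empty
# string for non-success responses.
DEFAULT_FETCHER = lambda do |url|
  response = Net::HTTP.get_response(URI(url))
  response.is_a?(Net::HTTPSuccess) ? response.body.to_s : ''
end

# Crawls every page reachable from start_url on the same host and returns
# the sorted list of URLs that were discovered.
def crawl_domain(start_url, workers: 4, fetcher: DEFAULT_FETCHER)
  host    = URI(start_url).host
  seen    = Set.new([start_url])
  lock    = Mutex.new
  queue   = Queue.new
  pending = 1 # URLs that are queued or currently being processed
  queue << start_url

  threads = Array.new(workers) do
    Thread.new do
      # pop returns nil once the queue is closed and empty, ending the loop
      while (url = queue.pop)
        links = begin
          extract_links(fetcher.call(url), url)
        rescue StandardError
          [] # a failed download contributes no links
        end

        lock.synchronize do
          links.each do |link|
            next unless link.host == host       # stay on the given domain
            next unless seen.add?(link.to_s)    # skip URLs already queued
            pending += 1
            queue << link.to_s
          end
          pending -= 1
          queue.close if pending.zero?          # nothing left anywhere: stop
        end
      end
    end
  end

  threads.each(&:join)
  seen.to_a.sort
end
```

Under these assumptions, `crawl_domain("http://www.example.com/", workers: 8)` would return the URLs found on that host. In a Rails application the in-process `Queue` could be replaced by a background job system, with one job per URL, but the bookkeeping (a shared set of seen URLs and a way to tell when the crawl is finished) stays the same.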

【Comments】: