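[Posted]: 2021-01-26 08:39:58
[Question]:
I'm trying to build a basic web scraper with Rails. Every time I click the scrape button, it takes me to the right page, but it gives me this error every time.
Here is my restaurants_controller.rb file: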
def scrape
  url = 'https://www.tripadvisor.com/Restaurants-g31892-Rogers_Arkansas.html'
  response = RestaurantsScraper.process(url)
  if response[:status] == :completed && response[:error].nil?
    flash.now[:notice] = "Successfully scraped url"
  else
    flash.now[:alert] = response[:error]
  end
rescue StandardError => e
  flash.now[:alert] = "Error: #{e}"
end
Here is my restaurantscrapper.rb file:
class RestaurantsScraper < Kimurai::Base
  @name = 'restaurants_scraper'
  @engine = :mechanize

  def self.process(url)
    @start_url = [url]
    self.crawl!
  end

  def parse(response, url, data:{})
    response.xpath("//div[@class=_1llCuDZj]").each do |t|
      item = {}
      item[:title] = t.css('a._15_ydu6b')&.text&.squish&.gsub('[^0-9].', '')
      item[:type] = t.css('span._1p0FLy4t')&.text&.squish
      item[:reviews] = t.css('span.w726Ki5B').text&.squish
      item[:top_reviews] = t.css('a._2uEVo25r _3mPt7dFq').text&.squish
      Restaurant.where(item).first_or_create
    end
  end
end
Here is the error in the console:
Processing by RestaurantsController#scrape as HTML
Parameters: {"authenticity_token"=>"3qXvtTOsU6VVtxaPvNXyCjpdnHLOgCvFgQYzB1JnhoHDz8ySF6gK/n5x+/XW5HC0HwfzQ1bFCu/KCfF3nA1SIQ=="}
I, [2021-01-26 02:35:35 -0600#218] [C: 14760] INFO -- restaurants_scraper: Spider: started: restaurants_scraper
F, [2021-01-26 02:35:35 -0600#218] [C: 14760] FATAL -- restaurants_scraper: Spider: stopped: {:spider_name=>"restaurants_scraper", :status=>:failed, :error=>"#<ArgumentError: wrong number of arguments (given 0, expected 2)>", :environment=>"development", :start_time=>2021-01-26 02:35:35.6330106 -0600, :stop_time=>2021-01-26 02:35:35.6336933 -0600, :running_time=>"0s", :visits=>{:requests=>0, :responses=>0}, :items=>{:sent=>0, :processed=>0}, :events=>{:requests_errors=>{}, :drop_items_errors=>{}, :custom=>{}}}
Rendering restaurants/scrape.html.erb within layouts/application
Rendered restaurants/scrape.html.erb within layouts/application (Duration: 0.1ms | Allocations: 41)
[Webpacker] Everything's up-to-date. Nothing to do
Completed 200 OK in 45ms (Views: 41.9ms | ActiveRecord: 0.0ms | Allocations: 4104)
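The numbers in `wrong number of arguments (given 0, expected 2)` can be traced with a minimal plain-Ruby sketch. It involves no Kimurai code and copies only the `parse` signature from the question. Ruby counts the two required positional parameters, `response` and `url`, and ignores the optional `data:` keyword, so a call with no arguments fails with this message. The sketch does not show *why* `parse` would be called with no arguments. That part is an assumption about what the framework does.

```ruby
# Plain-Ruby sketch, no Kimurai involved: only the parse signature is copied from the question.
# Two required positional parameters (response, url) plus one optional keyword (data).
def parse(response, url, data: {})
  [response, url, data]
end

# Ruby reports the parameter kinds; only the :req entries count toward "expected 2".
p method(:parse).parameters

# Calling it with no arguments at all raises the same kind of error as in the log.
begin
  parse
rescue ArgumentError => e
  puts e.message
end
```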
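[Discussion]:
-
Which file and line does the error come from? Check your console output; it tells you on the line below the error.
-
I edited the post to add the error. Still completely lost.
-
Can you post the signature of the crawl method in RestaurantsScraper? It may require two arguments.
-
@PenoG I think the parse method needs url: in its arguments: github.com/augustoam/kimurai#minimum-required-crawler-structure Also, it should be @start_urls (plural). Not quite sure whether that's the problem, but try adding some logging or using a debugger.
-
I'm using the Kimurai gem for the crawl! method: rubydoc.info/gems/kimurai/1.3.2/Kimurai/…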
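The commenter's two suggestions, `@start_urls` (plural) and a `url:` keyword in `parse`, can be illustrated with a self-contained simulation. `SketchBase`, `TypoSpider` and `FixedSpider` are hypothetical classes and not Kimurai's API. `SketchBase.crawl!` is a simplified model built on one assumption: the framework reads the URL list from a class-level `@start_urls`, and when that is nil it calls `parse` with no arguments. Under that assumption, assigning `@start_url` (singular) leaves the list nil and reproduces the same ArgumentError. Assigning `@start_urls` and declaring `parse(response, url:, data: {})` lets the call through.

```ruby
# Hypothetical stand-in for a crawler base class; a simplified model, NOT Kimurai's source.
class SketchBase
  # Reads a class-level instance variable; nil unless the subclass assigned @start_urls.
  def self.start_urls
    @start_urls
  end

  # Assumed dispatch: one parse call per start URL (url/data passed as keywords),
  # or a bare parse call with no arguments when no start URLs are configured.
  def self.crawl!
    spider = new
    if start_urls
      start_urls.map { |u| spider.parse("<html/>", url: u, data: {}) }
    else
      spider.parse
    end
  end
end

# Mirrors the question: singular @start_url and url as a positional parameter.
class TypoSpider < SketchBase
  def self.process(url)
    @start_url = [url] # never read by start_urls, so the list stays nil
    crawl!
  end

  def parse(response, url, data: {})
    url
  end
end

# Applies both suggestions from the comment: plural @start_urls and a url: keyword.
class FixedSpider < SketchBase
  def self.process(url)
    @start_urls = [url]
    crawl!
  end

  def parse(response, url:, data: {})
    url
  end
end

begin
  TypoSpider.process("https://example.com")
rescue ArgumentError => e
  puts "TypoSpider: #{e.message}"
end
p FixedSpider.process("https://example.com")
```

In the real scraper, the matching changes would be `@start_urls = [url]` in `self.process` and `def parse(response, url:, data: {})`, following the Kimurai README section linked in the comment. Treat this as an assumption to check against the Kimurai version in use.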
Tags: ruby-on-rails ruby web-scraping routes