【发布时间】:2017-06-15 02:04:31
【问题描述】:
我有一个用 Ruby 编写的简单爬虫,它应该爬取特定站点并将数据保存到 CSV 文件中。当我尝试运行脚本时,我不断收到未定义的方法错误:
boxers.rb:29:in `<main>': undefined method `text' for nil:NilClass (NoMethodError)
这是我要运行的脚本代码:
#!/usr/bin/env ruby
require 'csv'
require 'mechanize'
agent = Mechanize.new{ |agent| agent.history.max_size=0 }
agent.user_agent = 'Mozilla/5.0'
base = "http://siteurl.com/"
division = ARGV[0]
search_url = "http://siteurl.com/ratings.php?sex=M&division=#{division}&pageID="
path='//*[@id="mainContent"]/table/tr[position()>2]'
boxers = CSV.open("csv/file.csv","w")
url = search_url+"1"
begin
page = agent.get(url)
rescue
print " -> error, retrying\n"
retry
end
// propably the line that causes error
a = page.parser.xpath('//a[@title="last page"]').first.text
a.gsub!("[","")
a.gsub!("]","")
last = a.to_i
(1..last).each do |page|
url = search_url+page.to_s
begin
page = agent.get(url)
rescue
print " -> error, retrying\n"
retry
end
page.parser.xpath(path).each do |tr|
row = [division]
tr.xpath("td").each_with_index do |td,j|
case j
when 0,11
next
when 2
text = td.text.strip
a = td.xpath("a").first
href = base+a.attributes["href"].value.strip
human_id = href.split("=")[1].split("&")[0]
cat = href.split("=")[2]
row += [human_id, cat, text, href]
when 4
text = td.text.strip
record = text.split("-")
wins = record[0]
wko = wins.split("(")[1].split(")")[0] rescue 0
wins = wins.split("(")[0]
losses = record[1]
lko = losses.split("(")[1].split(")")[0] rescue 0
losses = losses.split("(")[0]
draws = record[2]
row += [wins, wko, losses, lko, draws, text]
when 5
last6 = []
td.xpath("table/tr/td").each do |td2|
outcome = td2.attributes["class"].value.strip rescue nil
last6 += [outcome]
end
last6 = last6.to_s.gsub("[","{").gsub("]","}")
row += [last6]
when 9
div = td.xpath("div").first
flag = div.attributes["class"].value.strip rescue nil
title = div.attributes["title"].value.strip rescue nil
row += [flag,title]
else
text = td.text.strip
row += [text]
end
end
if (row.size>2)
boxers << row
end
end
boxers.flush
end
boxers.close
【问题讨论】:
-
你能评论引发错误的那一行吗?
-
为应该导致错误的行添加了注释。
标签: ruby