【Question title】:undefined method text for NilClass
【Posted】:2017-06-15 02:04:31
【Question description】:

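I have a simple crawler written in Ruby that is supposed to crawl a specific site and save the data to a CSV file. Whenever I try to run the script, I keep getting an undefined method error: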

boxers.rb:29:in `<main>': undefined method `text' for nil:NilClass (NoMethodError)

Here is the code of the script I am trying to run:

#!/usr/bin/env ruby

require 'csv'
require 'mechanize'

agent = Mechanize.new{ |agent| agent.history.max_size=0 }
agent.user_agent = 'Mozilla/5.0'

base = "http://siteurl.com/"

division = ARGV[0]

search_url = "http://siteurl.com/ratings.php?sex=M&division=#{division}&pageID="

path='//*[@id="mainContent"]/table/tr[position()>2]'

boxers = CSV.open("csv/file.csv","w")

url = search_url+"1"

begin
  page = agent.get(url)
rescue
  print "  -> error, retrying\n"
  retry
end

# probably the line that causes the error
a = page.parser.xpath('//a[@title="last page"]').first.text
a.gsub!("[","")
a.gsub!("]","")

last = a.to_i

(1..last).each do |page|

  url = search_url+page.to_s

  begin
    page = agent.get(url)
  rescue
    print "  -> error, retrying\n"
    retry
  end

  page.parser.xpath(path).each do |tr|
    row = [division]
    tr.xpath("td").each_with_index do |td,j|
      case j
      when 0,11
        next
      when 2
        text = td.text.strip
        a = td.xpath("a").first
        href = base+a.attributes["href"].value.strip
        human_id = href.split("=")[1].split("&")[0]
        cat = href.split("=")[2]
        row += [human_id, cat, text, href]
      when 4
        text = td.text.strip
        record = text.split("-")
        wins = record[0]
        wko = wins.split("(")[1].split(")")[0] rescue 0
        wins = wins.split("(")[0]
        losses = record[1]
        lko = losses.split("(")[1].split(")")[0] rescue 0
        losses = losses.split("(")[0]
        draws = record[2]
        row += [wins, wko, losses, lko, draws, text]
      when 5
        last6 = []
        td.xpath("table/tr/td").each do |td2|
          outcome = td2.attributes["class"].value.strip rescue nil
          last6 += [outcome]
        end
        last6 = last6.to_s.gsub("[","{").gsub("]","}")
        row += [last6]
      when 9
        div = td.xpath("div").first
        flag = div.attributes["class"].value.strip rescue nil
        title = div.attributes["title"].value.strip rescue nil
        row += [flag,title]
      else
        text = td.text.strip
        row += [text]
      end
    end
    if (row.size>2)
      boxers << row
    end
  end
  boxers.flush

end

boxers.close

【Comments】:

  • Can you mark the line that raises the error with a comment?
  • Added a comment on the line that is probably causing the error.

Tags: ruby


【Solution 1】:

The object you are calling .text on has no value, that is, it is nil.

According to the error message, the failure is on line 29, which leads me to believe this line is the culprit:

a = page.parser.xpath('//a[@title="last page"]').first.text

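It looks like when xpath(...) does not match any element, it returns an empty collection. With nothing in it, first has nothing to return and yields nil, and calling .text on that nil raises the NoMethodError.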
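The failure can be reproduced without touching the site. The following is a minimal sketch that uses a plain empty Array as a stand-in for the empty result of xpath(...); that substitution is an assumption made so the snippet runs without Mechanize or Nokogiri installed.

```ruby
# Stand-in for the empty result of xpath(...) when nothing matches.
# (Assumption: a plain Array replaces the real node collection.)
matches = []

# An empty collection has no first element, so this is nil.
first_match = matches.first

begin
  # Same call chain as in the question: .first.text
  first_match.text
rescue NoMethodError => e
  # Capture the message instead of letting the script crash.
  error_message = e.message
end

puts error_message
```

The exact wording of the printed message varies slightly between Ruby versions, but in every case it reports that text was called on nil.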

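The solution is to check for nil before calling .text. There is plenty of guidance and many resources on checking for nil in Ruby, for example this question.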
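One possible shape for such a check is sketched below. The Link struct and the last_page_number helper are hypothetical names introduced only for illustration, and falling back to a single page when no link is found is an assumption, not something the original script does.

```ruby
# Hypothetical stand-in for a link node; only #text is needed here.
Link = Struct.new(:text)

# Returns the page number parsed from the first link's text,
# or `default` when the collection is empty (first returns nil).
# Assumption: treating "no last-page link" as a single page of results.
def last_page_number(links, default = 1)
  link = links.first
  return default if link.nil?

  # Strip the surrounding brackets, as the original script does.
  link.text.gsub("[", "").gsub("]", "").to_i
end

with_link    = last_page_number([Link.new("[12]")])
without_link = last_page_number([])

puts with_link
puts without_link
```

In the script from the question, links would be the result of page.parser.xpath('//a[@title="last page"]'). On Ruby 2.3 or later, the safe navigation operator (links.first&.text) is a shorter way to get nil instead of an exception, although the nil still has to be handled before the bracket stripping.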

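【Discussion】:

  • By the way, did you notice that the OP's code accesses this URL: siteurl.com/ratings.php?sex=M&division=#{division}&pageID=1 but the URL returns a 404 ;)
  • I assume the OP changed the URL in the question to hide the actual page being scraped.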