【Question title】: Nokogiri scraping test failing?
【Posted】: 2013-08-15 08:23:09
【Question】:

I have the following code:

  url       = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dpets&field-keywords="
  data      = Nokogiri::HTML(open(url))

  department   = data.css('#ref_2619534011')

  @department_hash = {}
  department.css('li').drop(1).each do | department |
    department_title = department.css('.refinementLink').text
    department_count = department.css('.narrowValue').text[/[\d,]+/].delete(",").to_i
    @department_hash[:department] ||= {}
    @department_hash[:department]["Pet Supplies"] ||= {}
    @department_hash[:department]["Pet Supplies"][department_title] = department_count
  end 

So when I put <%= @department_hash %> in the template, I get:

{:department=>{"Pet Supplies"=>{"Birds"=>15918, "Cats"=>245418, "Dogs"=>513869, "Fish & Aquatic Pets"=>47182, "Horses"=>14774, "Insects"=>358, "Reptiles & Amphibians"=>5834, "Small Animals"=>19806}}} 

I created a spec for app.rb (I'm using Sinatra):

app_spec.rb:

require File.dirname(__FILE__) + '/app.rb'

describe "Department" do
  it "should scrap correct string" do
    expect(@department_hash).to eq '{:department=>{"Pet Supplies"=>{"Birds"=>15918, "Cats"=>245418, "Dogs"=>513869, "Fish & Aquatic Pets"=>47182, "Horses"=>14774, "Insects"=>358, "Reptiles & Amphibians"=>5834, "Small Animals"=>19806}}}' 
  end
end

But the test fails:

  1) Department should scrap correct string
     Failure/Error: expect(@department_hash).to eq '{:department=>{"Pet Supplies"=>{"Birds"=>15918, "Cats"=>245418, "Dogs"=>513869, "Fish & Aquatic Pets"=>47182, "Horses"=>14774, "Insects"=>358, "Reptiles & Amphibians"=>5834, "Small Animals"=>19806}}}'

       expected: "{:department=>{\"Pet Supplies\"=>{\"Birds\"=>15918, \"Cats\"=>245418, \"Dogs\"=>513869, \"Fish & Aquatic Pets\"=>47182, \"Horses\"=>14774, \"Insects\"=>358, \"Reptiles & Amphibians\"=>5834, \"Small Animals\"=>19806}}}"
            got: nil

       (compared using ==)
     # ./app_spec.rb:5:in `block (2 levels) in <top (required)>'

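EDIT:

I also tried this:

expect(@department_hash[:department]["Pet Supplies"].keys).to eq '["Birds", "Cats", "Dogs", "Fish & Aquatic Pets", "Horses", "Insects", "Reptiles & Amphibians", "Small Animals"]'

But that test fails as well:

  2) Department should scrap correct keys
     Failure/Error: expect(@department_hash[:department]["Pet Supplies"].keys).to eq '["Birds", "Cats", "Dogs", "Fish & Aquatic Pets", "Horses", "Insects", "Reptiles & Amphibians", "Small Animals"]'
     NoMethodError:
       undefined method `[]' for nil:NilClass
     # ./app_spec.rb:9:in block (2 levels)

What could be the reason?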

【Comments】:

    Tags: ruby nokogiri rspec2


    【Solution 1】:

    @department_hash is not defined within the scope of the test.

    As a simple example:

    require 'rspec/autorun'
    
    @department_hash = 2
    puts defined?(@department_hash)
    #=> "instance-variable"
    
    describe "Department" do
      it "should scrap correct string" do
        puts defined?(@department_hash)
        #=> prints an empty line (defined? returns nil, i.e. not defined)
      end
    end
    

    You can see that @department_hash is defined in main, but it is not defined inside the test.
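The same effect can be reproduced without RSpec. The sketch below relies on the fact that an instance variable belongs to whichever object is `self`, and that RSpec runs each example inside an instance of the example group, not inside `main`. `FakeExample` is a hypothetical stand-in for that instance, not part of RSpec.

```ruby
# Instance variables belong to whichever object is `self` at the time.
@department_hash = { department: {} }   # set on the top-level object, `main`

# Hypothetical stand-in for the object RSpec creates to run an example.
class FakeExample
  def read_ivar
    @department_hash   # looked up on this FakeExample instance, not on `main`
  end
end

top_level_value = @department_hash
example_value   = FakeExample.new.read_ivar

puts top_level_value.inspect   # the hash assigned above
puts example_value.inspect     # nil
```

Requiring app.rb at the top of the spec therefore sets the variable on `main` only; the example still sees nil.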

    You need to run your app code within the scope of the test. For example, if you move the code into the test, @department_hash will no longer be nil.

    require 'rspec/autorun'
    require 'nokogiri'
    require 'open-uri'
    
    describe "Department" do
      it "should scrap correct string" do
        url       = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dpets&field-keywords="
        data      = Nokogiri::HTML(open(url))
    
        department   = data.css('#ref_2619534011')
    
        @department_hash = {}
        department.css('li').drop(1).each do | department |
          department_title = department.css('.refinementLink').text
          department_count = department.css('.narrowValue').text[/[\d,]+/].delete(",").to_i
          @department_hash[:department] ||= {}
          @department_hash[:department]["Pet Supplies"] ||= {}
          @department_hash[:department]["Pet Supplies"][department_title] = department_count
        end       
    
        expect(@department_hash).to eq({:department=>{"Pet Supplies"=>{"Birds"=>17556, "Cats"=>245692, "Dogs"=>516246, "Fish & Aquatic Pets"=>47424, "Horses"=>15062, "Insects"=>358, "Reptiles & Amphibians"=>5835, "Small Animals"=>19836}}})
      end
    end
    

    Note that your test should be eq(hash) rather than eq 'hash' (i.e. you want to compare a hash with a hash, not a string with a hash).
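As a small illustration of why the quoted form cannot pass: `eq` compares with `==`, and a Hash is never `==` to a String, even one that looks like the hash's printed form. The values below are a shortened, made-up subset of the data in the question.

```ruby
actual = { department: { "Pet Supplies" => { "Insects" => 358 } } }

as_hash   = { department: { "Pet Supplies" => { "Insects" => 358 } } }
as_string = '{:department=>{"Pet Supplies"=>{"Insects"=>358}}}'

hash_matches   = (actual == as_hash)     # Hash compared with Hash
string_matches = (actual == as_string)   # Hash compared with String

puts hash_matches     # true
puts string_matches   # false
```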

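    Update - code extracted into a class:

    Moving the app code into the test is not ideal if the code is meant to be reused elsewhere. Instead, it is better to wrap your app code in a method or class and then call that code from the test.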

    require 'rspec/autorun'
    require 'nokogiri'
    require 'open-uri'
    
    # This class could be placed in your app.rb  
    class DepartmentScraper
      def scrape_page()
        url       = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dpets&field-keywords="
        data      = Nokogiri::HTML(open(url))
    
        department   = data.css('#ref_2619534011')
    
        @department_hash = {}
        department.css('li').drop(1).each do | department |
          department_title = department.css('.refinementLink').text
          department_count = department.css('.narrowValue').text[/[\d,]+/].delete(",").to_i
          @department_hash[:department] ||= {}
          @department_hash[:department]["Pet Supplies"] ||= {}
          @department_hash[:department]["Pet Supplies"][department_title] = department_count        
        end
    
        return @department_hash
      end
    end
    
    describe "Department" do
      it "should scrap correct string" do
        department_hash = DepartmentScraper.new.scrape_page()
    
        expect(department_hash).to eq({:department=>{"Pet Supplies"=>{"Birds"=>17556, "Cats"=>245692, "Dogs"=>516246, "Fish & Aquatic Pets"=>47424, "Horses"=>15062, "Insects"=>358, "Reptiles & Amphibians"=>5835, "Small Animals"=>19836}}})
      end
    end
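The same idea can be taken one step further for pieces that do not need the network at all. As a sketch, the count parsing from the scraper could live in its own method and be checked against fixed strings. `parse_count` is a hypothetical helper name, the input format "(245,418)" is an assumption about what the `.narrowValue` text looks like, and returning 0 for a missing count is a design choice rather than anything the original code does (the original would raise NoMethodError on nil).

```ruby
# Hypothetical helper: turns refinement text such as "(245,418)" into 245418.
def parse_count(text)
  digits = text.to_s[/[\d,]+/]   # first run of digits and commas, or nil
  return 0 if digits.nil?        # assumption: treat a missing count as zero
  digits.delete(",").to_i
end

puts parse_count("(245,418)")   # 245418
puts parse_count("(358)")       # 358
puts parse_count("")            # 0
```

A spec for this helper stays stable even when the live counts on the page change, which they already did between the question and this answer.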
    

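    【Comments】:

    • Is there any way to test this without moving everything? That's why I put it at the top.
    • You can extract the app code into methods and/or classes (these could live in your app.rb file). Then, from the test, you call those methods/classes. A simple example has been added to the answer.
    • Thanks a lot! I'll try it and let you know the result.
    • Thanks, it works. By the way, what is the rspec/autorun part about? Auto-including app.rb?
    • Typically you would run your specs with a command like rspec your_spec.rb. By requiring rspec/autorun, your tests always run when the file is loaded, i.e. when you simply run ruby your_spec.rb. You should probably do this without autorun. However, for one-off scripts, or when using SciTE as an editor, I find autorun more convenient.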