【Question Title】: How do I parse dates from Nokogiri?
【Posted】: 2013-07-28 11:12:17
【Question Description】:

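I'm trying to parse my old high school football team's schedule.

I've managed to grab the nodes that contain the date of each game, but when I try to convert one to a Ruby date object I get an invalid date error. However, when I copy the date printed by puts gamedate and paste it into the script as a test, it converts to a date object just fine.

What is the difference between passing the gamedate string to strptime and pasting the same value in as a hard-coded input?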

require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.maxpreps.com/high-schools/fitzgerald-hurricanes-(fitzgerald,ga)/football/schedule.htm"
doc = Nokogiri::HTML(open(url))
games = doc.css('.dual-contest')
games.each do |game|
  puts gamedate = game.css(".event-date").xpath('@title').to_s
  #works
  puts date = DateTime.strptime('2013-08-24T02:30:00','%Y-%m-%dT%H:%M:%S')
  #does not work
  puts date = DateTime.strptime(gamedate,'%Y-%m-%dT%H:%M:%S')
end

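【Comments】:

  • I'm getting the data... correctly. Which version of Ruby are you using?
  • Ruby version 1.9.3p392
  • Just to clarify: when I copy the sample output of puts gamedate and paste it in as a string, it works. But when I pass gamedate directly to the same function, it fails.
  • Which version of Nokogiri?
  • When you ask a question, you need to include both the code AND sample data, not a link to a page containing the data. As it stands, WHEN that link breaks, your question will be almost worthless. With the data included, it will remain useful. See sscce.org for more information.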

Tags: ruby nokogiri ruby-on-rails-4


【Solution 1】:

Here is the cause. The title lookup returns an empty result for the first match:

require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.maxpreps.com/high-schools/fitzgerald-hurricanes-(fitzgerald,ga)/football/schedule.htm"
doc = Nokogiri::HTML(open(url))
games = doc.css('.dual-contest')
games.each do |game|
  puts gamedate = game.css(".event-date").xpath('@title').empty?
end

# >> true
# >> false
# >> false
# >> false
# >> false
# >> false
# >> false
# >> false
# >> false
# >> false
# >> false
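
To connect that empty result with the error in the question, here is a minimal sketch using only the standard date library. The samples array is a hypothetical stand-in for what the loop sees: an empty string for the first match, then a real title value. Note that puts prints an empty string as a blank line, which is easy to overlook in the output; printing with p or inspect makes it visible.

```ruby
require 'date'

FORMAT = '%Y-%m-%dT%H:%M:%S'

# Hypothetical stand-in for the values produced inside the loop:
# the first match has no title attribute, so to_s yields "".
samples = ['', '2013-08-24T02:30:00']

results = samples.map do |raw|
  begin
    DateTime.strptime(raw, FORMAT)
  rescue ArgumentError => e
    # Newer Rubies raise Date::Error, which is a subclass of ArgumentError.
    "failed for #{raw.inspect}: #{e.message}"
  end
end

# The first entry is a failure message, the second a DateTime.
results.each { |r| puts r }
```

So the hard-coded string and the pasted value are the same; the failure most likely comes from the first iteration, where gamedate is empty.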

Seen another way, the first matched element has no .event-date node, so the lookup returns nil:

require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.maxpreps.com/high-schools/fitzgerald-hurricanes-(fitzgerald,ga)/football/schedule.htm"
doc = Nokogiri::HTML(open(url))
games = doc.at_css('.dual-contest').at_css(".event-date").at_xpath('@title')
puts games
# ~> -:6:in `<main>': undefined method `at_xpath' for nil:NilClass (NoMethodError)

I would do it this way:

require 'rubygems'
require 'nokogiri'
require 'open-uri'

url = "http://www.maxpreps.com/high-schools/fitzgerald-hurricanes-(fitzgerald,ga)/football/schedule.htm"
doc = Nokogiri::HTML(open(url))
doc.css('#schedule .event-date').each do |nd|
  dt = nd['title']
  p dt,DateTime.parse(dt)
end
# >> "2013-08-24T02:30:00"
# >> #<DateTime: 2013-08-24T02:30:00+00:00 ((2456529j,9000s,0n),+0s,2299161j)>
# >> "2013-09-07T02:30:00"
# >> #<DateTime: 2013-09-07T02:30:00+00:00 ((2456543j,9000s,0n),+0s,2299161j)>
# >> "2013-09-14T02:30:00"
# >> #<DateTime: 2013-09-14T02:30:00+00:00 ((2456550j,9000s,0n),+0s,2299161j)>
# >> "2013-09-21T02:30:00"
# >> #<DateTime: 2013-09-21T02:30:00+00:00 ((2456557j,9000s,0n),+0s,2299161j)>
# >> "2013-09-28T02:30:00"
# >> #<DateTime: 2013-09-28T02:30:00+00:00 ((2456564j,9000s,0n),+0s,2299161j)>
# >> "2013-10-05T02:30:00"
# >> #<DateTime: 2013-10-05T02:30:00+00:00 ((2456571j,9000s,0n),+0s,2299161j)>
# >> "2013-10-12T02:30:00"
# >> #<DateTime: 2013-10-12T02:30:00+00:00 ((2456578j,9000s,0n),+0s,2299161j)>
# >> "2013-10-19T02:30:00"
# >> #<DateTime: 2013-10-19T02:30:00+00:00 ((2456585j,9000s,0n),+0s,2299161j)>
# >> "2013-10-26T02:30:00"
# >> #<DateTime: 2013-10-26T02:30:00+00:00 ((2456592j,9000s,0n),+0s,2299161j)>
# >> "2013-11-02T02:30:00"
# >> #<DateTime: 2013-11-02T02:30:00+00:00 ((2456599j,9000s,0n),+0s,2299161j)>

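【Comments】:

  • The games CSS query also matches a <div> that appears to be different from the table rows the OP is working with. The fix is to make that query more selective.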