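[Posted]: 2016-09-25 03:03:53
[Question]:
I have this code that goes to a URL and tries to parse the "li" elements into an array. However, I run into a problem when trying to parse anything that is not inside a "b" tag.
Code: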
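require 'open-uri'
require 'nokogiri'
require 'csv'
require 'pp'

url = '(some URL)'
# On Ruby 3.0+ this call needs to be URI.open(url)
page = Nokogiri::HTML(open(url))
csv = CSV.open("/tmp/output.csv", 'w')
page.search('//li[not(@id) and not(@class)]').each do |row|
  arr = []
  # Only the text inside <b> is collected here, which is the problem
  row.search('b').each do |cell|
    arr << cell.text
  end
  csv << arr
  pp arr
end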
HTML:
<li><b>The Company Name</b><br>
The Street<br>
The City,
The State
The Zipcode<br><br>
</li>
I want to parse all of the elements so that the output looks like this:
["The Company Name", "The Street", "The City", "The State", "The Zip Code"],
["The Company Name", "The Street", "The City", "The State", "The Zip Code"],
["The Company Name", "The Street", "The City", "The State", "The Zip Code"]
[Discussion]:
Tags: html ruby parsing csv nokogiri