I tried to use Hpricot to parse a page with special characters and work with the result as UTF-8. The docs tell you to do this:

require 'rubygems'
require 'open-uri'
require 'hpricot'
 
doc = Hpricot(open("http://url/"))

However, this won’t give you the output you want. The open method from open-uri does no conversion: it hands back the body in whatever character set the page was served in. If you want UTF-8, you need to convert it yourself with the iconv library:

require 'rubygems'
require 'iconv'
require 'open-uri'
require 'hpricot'
 
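f = open("http://url")
# f.charset is the charset open-uri took from the response's Content-Type header.
# f.read returns the body as-is; readlines.join("\n") would double every newline,
# because readlines keeps each line's trailing "\n".
doc = Hpricot(Iconv.conv('utf-8', f.charset, f.read))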
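On Ruby 2.0 and later, iconv is no longer part of the standard library, so the same conversion is usually done with String#encode instead. The sketch below is one way to do it, not something taken from the Hpricot docs: `to_utf8` is a hypothetical helper name, and the ISO-8859-1 sample bytes are made up for illustration.

```ruby
# Hypothetical helper (not part of Hpricot or open-uri): re-encode raw
# page bytes from the charset the server reported into UTF-8.
def to_utf8(raw, charset)
  # Label a copy of the bytes with their source encoding, then transcode.
  raw.dup.force_encoding(charset).encode('UTF-8')
end

# Sample input: "café" as ISO-8859-1 bytes, where "é" is the single byte 0xE9.
latin1_bytes = "caf\xE9".b
utf8_text = to_utf8(latin1_bytes, 'iso-8859-1')
```

With a real page, the call would look something like `Hpricot(to_utf8(f.read, f.charset))`, assuming `f.charset` names an encoding Ruby recognises.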
