Ruby open-uri returns an error when opening a PNG URL

【Question title】: Ruby open-uri returns an error when opening a PNG URL
【Posted】: 2010-02-26 14:42:47
【Question】:

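I'm building a crawler that parses the images of the Gantz manga from pages such as http://manga.bleachexile.com/gantz-chapter-1.html.

It was working fine until the crawler tried to open this image (chapter 273):

bad URI(is not URI?): http://static.bleachexile.com/manga/gantz/273/Gantz[0273]_p001[Whatever-Illuminati].png

I assume the URL is valid, though, because I can open it in Firefox. Any ideas?

Relevant part of the code: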

img_link = nav.page.image_urls.find {|x| x.include?("manga/gantz")}
img_name = RAILS_ROOT+"/public/#{nome}/#{cap}/"+nome+((template).sub('::cap::', cap.to_s).sub('::pag::', i.to_s))
img = File.new( img_name, 'w' )
img.write( open(img_link) {|f| f.read} )
img.close

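【Comments】:

Tags: ruby url url-routing web-crawler

【Answer 1】:

That is not a valid URI. A URI may contain only certain characters, and the square brackets in this path are not among them. Firefox, like every browser, tries to do as much as it can for the user rather than complain when something does not conform to the standard.

The following form works: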

open("http://static.bleachexile.com/manga/gantz/273/Gantz%5B0273%5D_p001%5BWhatever-Illuminati%5D.png") # => #<File:/tmp/open-uri20100226-3342-clj08a-0>
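The difference between the two forms can be checked without touching the network by handing both strings to URI.parse, which open-uri calls on a string before fetching it. A minimal sketch (the helper name parseable? is made up for this illustration):

```ruby
require 'uri'

raw     = "http://static.bleachexile.com/manga/gantz/273/Gantz[0273]_p001[Whatever-Illuminati].png"
escaped = "http://static.bleachexile.com/manga/gantz/273/Gantz%5B0273%5D_p001%5BWhatever-Illuminati%5D.png"

# Hypothetical helper: true when URI.parse accepts the string,
# false when it raises URI::InvalidURIError.
def parseable?(url)
  URI.parse(url)
  true
rescue URI::InvalidURIError
  false
end

puts parseable?(raw)      # the bracketed form is rejected
puts parseable?(escaped)  # the percent-encoded form is accepted
```

Note that on Ruby 3.0 and later, open-uri no longer patches Kernel#open, so the fetch itself would be written as URI.open(escaped) after require 'open-uri'.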

You can try escaping it like this:

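# Percent-encode every character after the first "/" that is outside the safe set.
uri.gsub(/\/.*/) do |t|
  t.gsub(/[^.\/a-zA-Z0-9\-_ ]/) do |c|
    # c.bytes works on Ruby 1.8.7 and 1.9+; c[0] is an Integer only on 1.8
    c.bytes.map { |b| "%%%02X" % b }.join
  end.gsub(" ", "+")
end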
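The same idea can be wrapped in a small method that leaves the scheme and host alone and encodes only the path. This is a sketch under assumptions, not a general-purpose escaper: the method name escape_path is hypothetical, it assumes the URL has no query string and has not been escaped already, and it encodes a space as %20 rather than "+".

```ruby
# Hypothetical helper: percent-encode every byte in the path that is not
# an unreserved character or "/". Scheme and host are left untouched.
def escape_path(url)
  m = url.match(%r{\A([a-z][a-z0-9+.\-]*://[^/]*)(/.*)\z}i)
  return url unless m # no path to escape, or not an absolute URL

  prefix, path = m.captures
  escaped_path = path.gsub(/[^a-zA-Z0-9_.~\/\-]/) do |c|
    # Encode byte by byte so multibyte characters are handled as well.
    c.bytes.map { |b| "%%%02X" % b }.join
  end
  prefix + escaped_path
end

puts escape_path("http://static.bleachexile.com/manga/gantz/273/Gantz[0273]_p001[Whatever-Illuminati].png")
# prints http://static.bleachexile.com/manga/gantz/273/Gantz%5B0273%5D_p001%5BWhatever-Illuminati%5D.png
```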

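Be careful, though: if the site already serves correctly escaped URIs and you escape them a second time, the URI no longer points to the same location.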
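The double-escaping problem is easy to reproduce: the "%" of an existing escape sequence is itself encoded as %25. One possible guard, sketched below, is to escape only when URI.parse rejects the string. Both method names are hypothetical, http://example.com/a[1].png is a placeholder URL, and the guard does not help with a URL that mixes escaped and unescaped characters.

```ruby
require 'uri'

# Hypothetical helper: a minimal escaper used only for this demonstration.
def escape_unsafe(str)
  str.gsub(/[^a-zA-Z0-9_.~\/:\-]/) { |c| c.bytes.map { |b| "%%%02X" % b }.join }
end

once  = escape_unsafe("http://example.com/a[1].png")
twice = escape_unsafe(once)
puts once   # prints http://example.com/a%5B1%5D.png
puts twice  # prints http://example.com/a%255B1%255D.png (a different resource)

# Hypothetical guard: leave the URL alone when it already parses,
# escape it only when URI.parse raises.
def escape_if_needed(url)
  URI.parse(url)
  url
rescue URI::InvalidURIError
  escape_unsafe(url)
end

puts escape_if_needed("http://example.com/a[1].png")  # escaped once
puts escape_if_needed(once)                           # returned unchanged
```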

【Comments】:

This answer is just perfect!