Error in open.connection(x, "rb"): HTTP error 404 when using the read_html function

【Question title】: Error in open.connection(x, "rb") : HTTP error 404, with the read_html function
【Posted】: 2020-01-14 07:34:35
【Question description】:

I get the following error when using the read_html function from the xml2 package:

Error in open.connection(x, "rb") : HTTP error 404.

This is the URL I am trying to read:

xml2::read_html("https://www.act.is/media-centre/press-releases/actis-energy-platform-zuma-energía-reaches-financial-close-on-two-further-solar-farms-in-mexico/")

By contrast, reading this URL produces no error:

xml2::read_html("https://www.act.is/media-centre/press-releases/actis-wins-cio-magazine-s-real-asset-award/")

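The first URL contains a word with an accented character, "energía"; the second URL does not. Is it possible to read a URL that contains words with accented characters?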

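【Comments】:

Tags: r url web-scraping xml2

【Solution 1】:

The URL contains a special (non-ASCII) character, which you have to escape by percent-encoding it. Python's HTTP libraries provide a function for this; for R, you can find one here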

Python example:

import requests

base_url = "https://www.act.is/media-centre/press-releases/"
# Percent-encode the non-ASCII "í" in the last path segment ("/" is left as-is)
encoded_url = requests.utils.quote("actis-energy-platform-zuma-energía-reaches-financial-close-on-two-further-solar-farms-in-mexico/")
response = requests.get(base_url + encoded_url)

Encoded URL:

https://www.act.is/media-centre/press-releases/actis-energy-platform-zuma-energ%C3%ADa-reaches-financial-close-on-two-further-solar-farms-in-mexico/
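The same encoding can also be done with the Python standard library alone, without sending a request. The following is a minimal sketch, not the answerer's code: the helper name `encode_url_path` is hypothetical, and it assumes that only the path part of the URL needs escaping and that the server expects UTF-8 percent-encoding.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode unsafe characters in the path of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep "/" as the path separator, and keep "%" so that sequences which
    # are already percent-encoded are not encoded a second time.
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


url = (
    "https://www.act.is/media-centre/press-releases/"
    "actis-energy-platform-zuma-energía-reaches-financial-close-"
    "on-two-further-solar-farms-in-mexico/"
)
print(encode_url_path(url))
```

If the assumptions hold, the printed string is the encoded URL shown above, which can then be passed to xml2::read_html in place of the original URL.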

【Comments】:

If the answer helped you, you can accept it.