Error 404 in Scrapy: URL to be scraped works (sometimes) in the browser but not in Python

【Question title】: Error 404 Scrapy on Python url to be scraped works (sometimes) in browser but not in python
【Posted】: 2021-04-21 14:01:34
【Question description】:

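I'm working on a project that needs to scrape data from the following URL: https://www.funda.nl/objectinsights/getdata/5628496/

The last part of the URL is the ID of an object. Opening the link in a browser does work, but it sometimes returns a 404 error. The same happens in Python with scrapy shell: sometimes I can scrape the URL, sometimes I can't.

When I do manage to open the URL (no 404 error), I go to Inspect > Network, but I'm not experienced enough to understand the information shown there. Does anyone know a fix, or have other information on this topic?

Additional URLs you can try: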

https://www.funda.nl/objectinsights/getdata/5819260/
https://www.funda.nl/objectinsights/getdata/5819578/
https://www.funda.nl/objectinsights/getdata/5819237/
https://www.funda.nl/objectinsights/getdata/5819359/
https://www.funda.nl/objectinsights/getdata/5819371/
https://www.funda.nl/objectinsights/getdata/5819386/

【Comments】:

Tags: python scrapy http-status-code-404

【Solution 1】:

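I tested these in scrapy shell and got a 200 response every time.

If you get intermittent 404 responses even from a browser, this is not a Scrapy problem.

They may be limiting you to a small number of requests per IP address or per minute.

Try writing some code that puts a delay between requests, or use rotating proxies (free trials are available if you don't want to sign up).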
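
A minimal sketch of the "delay between requests" idea, using only the Python standard library. It assumes the 404s really are caused by rate limiting, which has not been confirmed. The helper names (`http_get`, `fetch_with_retry`, `crawl`), the delay values, and the `User-Agent` header are all hypothetical choices for illustration, not anything the site documents.

```python
import random
import time
import urllib.error
import urllib.request

# Hypothetical values; tune them to whatever the site tolerates.
BASE_DELAY_SECONDS = 2.0
MAX_ATTEMPTS = 4


def http_get(url):
    """Return (status_code, body) for a GET request instead of raising on 4xx/5xx."""
    # Assumption: a browser-like User-Agent; the site may or may not care.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        return error.code, error.read()


def fetch_with_retry(url, fetch=http_get, sleep=time.sleep,
                     base_delay=BASE_DELAY_SECONDS, max_attempts=MAX_ATTEMPTS):
    """Fetch url, waiting longer after each non-200 response before retrying."""
    status, body = None, b""
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status == 200:
            break
        if attempt < max_attempts - 1:
            # Exponential backoff plus up to one second of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    return status, body


def crawl(urls, fetch=http_get, sleep=time.sleep, base_delay=BASE_DELAY_SECONDS):
    """Fetch each URL in turn, pausing base_delay seconds between URLs."""
    results = {}
    for index, url in enumerate(urls):
        if index:
            sleep(base_delay)
        results[url] = fetch_with_retry(url, fetch=fetch, sleep=sleep,
                                        base_delay=base_delay)
    return results
```

Calling `crawl([...])` with the URLs from the question returns a dict mapping each URL to its final `(status, body)` pair. Inside a Scrapy project the built-in `DOWNLOAD_DELAY` and `AUTOTHROTTLE_ENABLED` settings serve a similar purpose; note that 404 is not in Scrapy's default `RETRY_HTTP_CODES`, so it would have to be added there for the retry middleware to retry it.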

【Discussion】: