How to handle urlopen errors in a Python crawler? Answers

【Question title】: How to handle a Python crawler's urlopen error?
【Posted】: 2016-08-27 22:32:18
【Question】:

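When I write Python crawlers I use urlopen a lot. Sometimes a URL fails to open (and raises an error), but when I retry the same URL it succeeds. So I handle this case by writing my crawler like this: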

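import urllib.request

def url_open(url):
    '''open the url and return its content'''
    # `header` is a dict of HTTP request headers defined elsewhere
    req = urllib.request.Request(headers=header, url=url)
    # Retry forever until the request succeeds
    while True:
        try:
            response = urllib.request.urlopen(req)
            break
        except:
            continue
    contents = response.read().decode('utf8')
    return contents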

I think this code is ugly... but it works. Is there a more elegant way to do this?

【Comments】:

Tags: python web-crawler urlopen

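【Solution 1】:

I strongly recommend using the requests library. You may still run into the same problem, but I find requests easier to work with and more reliable.

The same request would look like this: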

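import requests

def url_open(url):
    # `header` is a dict of HTTP request headers defined elsewhere
    while True:
        try:
            response = requests.get(url, headers=header)
            break
        except requests.exceptions.RequestException:
            # Retry on network-level failures only, so Ctrl-C still works
            continue
    return response.text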
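
Both loops above retry forever, so a URL that is permanently unreachable hangs the crawler. As a minimal sketch (not part of the original answer), the retry can be bounded and spaced out using only the standard library. The name `fetch_with_retry` and the parameters `attempts`, `delay`, and `opener` are assumptions made for illustration; `opener` exists only so the function can be exercised without network access.

```python
import socket
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, headers=None, attempts=3, delay=1.0,
                     opener=urllib.request.urlopen):
    """Return the decoded body of `url`, retrying a limited number of times.

    All parameter names and defaults here are illustrative assumptions.
    """
    req = urllib.request.Request(url, headers=headers or {})
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(req) as response:
                return response.read().decode('utf8')
        except (urllib.error.URLError, socket.timeout) as exc:
            # HTTPError is a subclass of URLError, so HTTP error
            # statuses are retried here as well.
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff: delay, 2*delay, 4*delay, ...
                time.sleep(delay * 2 ** attempt)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A call such as `fetch_with_retry(url, headers=header, attempts=5)` would then give up after five failures and raise the last error instead of looping indefinitely.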

What error are you getting?

【Discussion】:

I don't remember exactly what it was; the error code started with 1.

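【Solution 2】:

I suggest sticking with the requests API and using a session with adapters, so that you can set the number of retries explicitly. It is more code, but definitely cleaner: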

import requests

session = requests.Session()
# Allow up to 3 retries per request for each URL scheme
http_adapter = requests.adapters.HTTPAdapter(max_retries=3)
https_adapter = requests.adapters.HTTPAdapter(max_retries=3)
session.mount('http://', http_adapter)
session.mount('https://', https_adapter)
# `url` is the address to fetch, defined elsewhere
response = session.get(url)
if response.status_code != 200:
    # Handle the request failure here
    pass
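
The adapter approach can be taken one step further. `HTTPAdapter` also accepts a urllib3 `Retry` object for `max_retries`, which allows a pause between attempts and retries on selected HTTP status codes. The following is a sketch under assumptions: the helper name `make_retrying_session`, the default values, and the chosen status codes are illustrative and do not come from the original answer.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=0.5):
    """Build a Session whose requests are retried with backoff.

    The function name and default values are illustrative assumptions.
    """
    retry = Retry(
        total=total,                            # overall retry budget per request
        backoff_factor=backoff_factor,          # growing pause between attempts
        status_forcelist=[500, 502, 503, 504],  # also retry on these statuses
    )
    # One adapter can be mounted for both schemes.
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session
```

With this helper, `session = make_retrying_session()` followed by `session.get(url, timeout=10)` would retry transient failures a few times before raising an exception, which the caller can catch as `requests.exceptions.RequestException`.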

【Discussion】: