Why do I get an SSL error when I crawl a website? Answers

[Question title]: Why do I get an SSL error when I crawl a website?
[Posted]: 2019-03-22 09:17:22
[Question description]:

I have the following Python script to scrape the price of a monitor from the website https://www.notebooksbilliger.de:

from lxml import html
import csv, os, json
import requests
from time import sleep

url = "https://www.notebooksbilliger.de/asus+vz239he"
headers = { 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
page = requests.get(url, headers=headers)
doc = html.fromstring(page.content)
RAW_PRICE = doc.xpath('//div[@id="product_detail_price"]')[0].values()[4]

But I get the following error: urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.notebooksbilliger.de', port=443): Max retries exceeded with url: /asus+vz239he (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1045)'))

Do you know why I get this error?

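[Question comments]:

A similar script works for amazon.de ....
Maybe your (default) OpenSSL version is too old.
@CristiFati But I can access the website from my normal browser. It just doesn't work from the script.
Please include this kind of information (Python version, OS) in the question.
I was able to run your script with Python 3.6 and Python 3.7. Maybe my certificate store has some required certificates that yours lacks.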
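
The comments above ask for the Python version and OS, and point at the OpenSSL version and the certificate store as possible causes. A small sketch for collecting those details before posting them; the function name environment_report is made up for illustration, and requests.certs.where() reports the CA bundle that requests verifies against by default:

```python
import platform
import ssl
import sys

import requests


def environment_report():
    """Collect version details that are useful when reporting SSL errors."""
    return {
        "python": sys.version.split()[0],
        "os": platform.platform(),
        # OpenSSL build that this Python interpreter is linked against.
        "openssl": ssl.OPENSSL_VERSION,
        "requests": requests.__version__,
        # Path of the CA bundle requests uses when verify is left at True.
        "ca_bundle": requests.certs.where(),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Comparing this output between a machine where the script works and one where it fails may show whether the difference lies in the OpenSSL build or in the trusted certificates.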

Tags: python ssl certificate

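[Solution 1]:

Probably not best practice, but it worked for me: page = requests.get(url, headers=headers, verify=False)

I added verify=False to the request, which turns off certificate verification for that call.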
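
If the failure comes from a root certificate that your Python installation does not trust (a TLS-inspecting proxy or antivirus product is a common source of "self signed certificate in certificate chain", though that is only a guess for this case), a less drastic option than switching verification off is to point requests at a CA bundle that contains that certificate. A minimal sketch, assuming the missing certificate has been exported to a PEM file; the helper name build_session and the path corp-ca.pem are hypothetical and not part of the original script:

```python
import requests
import urllib3


def build_session(ca_bundle=None, insecure=False):
    """Return a requests.Session with explicit certificate settings.

    ca_bundle: path to a PEM file of trusted CA certificates (hypothetical).
    insecure:  if True, skip verification entirely, like verify=False.
    """
    session = requests.Session()
    if insecure:
        # Same effect as passing verify=False on each call; also silence the
        # InsecureRequestWarning that urllib3 would otherwise emit per request.
        session.verify = False
        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    elif ca_bundle is not None:
        # requests accepts a file path here and validates against that bundle.
        session.verify = ca_bundle
    # Otherwise keep the default (True): verify against the bundled CA store.
    return session
```

Usage would look like page = build_session(ca_bundle="corp-ca.pem").get(url, headers=headers). The file passed this way is used instead of the default bundle rather than in addition to it, so it needs to contain every root certificate the target site depends on.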

[Comments]: