Posted: 2021-12-07 10:28:30
Question:
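I have a web scraper that works fine locally, but it fails to scrape HTTPS pages on the production server (CentOS).
FYI: when I run the script from the command line (i.e. python scan.py), everything works as expected, but when I run the crawler from a Django view, I get the error below.
I'm using the Apache web server on CentOS. (It also works fine in my local Django setup.) This is the error I get when trying to scrape an HTTPS page: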
[scrapy.core.scraper] ERROR: Error downloading <GET https://validurl.com/>
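Because the same code works from the shell but not under Apache, it may help to log which interpreter, virtualenv and OpenSSL build each context actually uses, then compare the two outputs. A minimal stdlib-only sketch; the function name `runtime_info` is hypothetical, and `ssl.OPENSSL_VERSION` only reports the OpenSSL used by the standard `ssl` module, not necessarily the one pyOpenSSL uses:

```python
import ssl
import sys


def runtime_info():
    """Collect details that often differ between a shell run and an Apache/mod_wsgi run."""
    return {
        "executable": sys.executable,              # interpreter binary in use
        "python_version": sys.version.split()[0],  # e.g. "3.6.8"
        "prefix": sys.prefix,                      # virtualenv root if one is active
        "openssl": ssl.OPENSSL_VERSION,            # OpenSSL linked by the stdlib ssl module
    }


if __name__ == "__main__":
    for key, value in runtime_info().items():
        print(f"{key}: {value}")
```

Calling `runtime_info()` from the Django view (and writing the result to the log) and from `python scan.py` gives two dictionaries that can be diffed directly.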
Here is the full error log:
Traceback (most recent call last):
[Wed Oct 20 08:55:14.843638 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/twisted/internet/defer.py", line 1658, in _inlineCallbacks
[Wed Oct 20 08:55:14.843643 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] cast(Failure, result).throwExceptionIntoGenerator, gen
[Wed Oct 20 08:55:14.843647 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/twisted/internet/defer.py", line 63, in run
[Wed Oct 20 08:55:14.843652 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] return f(*args, **kwargs)
[Wed Oct 20 08:55:14.843656 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/twisted/python/failure.py", line 500, in throwExceptionIntoGenerator
[Wed Oct 20 08:55:14.843660 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] return g.throw(self.type, self.value, self.tb)
[Wed Oct 20 08:55:14.843664 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
[Wed Oct 20 08:55:14.843669 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] return (yield download_func(request=request, spider=spider))
[Wed Oct 20 08:55:14.843672 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/scrapy/utils/defer.py", line 55, in mustbe_deferred
[Wed Oct 20 08:55:14.843675 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] result = f(*args, **kw)
[Wed Oct 20 08:55:14.843677 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/scrapy/core/downloader/handlers/__init__.py", line 75, in download_request
[Wed Oct 20 08:55:14.843680 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] return handler.download_request(request, spider)
[Wed Oct 20 08:55:14.843683 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/scrapy/core/downloader/handlers/http11.py", line 65, in download_request
[Wed Oct 20 08:55:14.843686 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] return agent.download_request(request)
[Wed Oct 20 08:55:14.843689 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/scrapy/core/downloader/handlers/http11.py", line 335, in download_request
[Wed Oct 20 08:55:14.843692 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] d = agent.request(method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
[Wed Oct 20 08:55:14.843695 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/twisted/web/client.py", line 1753, in request
[Wed Oct 20 08:55:14.843698 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] endpoint = self._getEndpoint(parsedURI)
[Wed Oct 20 08:55:14.843700 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/twisted/web/client.py", line 1737, in _getEndpoint
[Wed Oct 20 08:55:14.843709 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] return self._endpointFactory.endpointForURI(uri)
[Wed Oct 20 08:55:14.843712 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/twisted/web/client.py", line 1609, in endpointForURI
[Wed Oct 20 08:55:14.843715 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] uri.host, uri.port
[Wed Oct 20 08:55:14.843718 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/scrapy/core/downloader/contextfactory.py", line 67, in creatorForNetloc
[Wed Oct 20 08:55:14.843720 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] return ScrapyClientTLSOptions(hostname.decode("ascii"), self.getContext(),
[Wed Oct 20 08:55:14.843723 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/scrapy/core/downloader/contextfactory.py", line 64, in getContext
[Wed Oct 20 08:55:14.843726 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] return self.getCertificateOptions().getContext()
[Wed Oct 20 08:55:14.843729 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/twisted/internet/_sslverify.py", line 1632, in getContext
[Wed Oct 20 08:55:14.843731 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] self._context = self._makeContext()
[Wed Oct 20 08:55:14.843734 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/twisted/internet/_sslverify.py", line 1663, in _makeContext
[Wed Oct 20 08:55:14.843737 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] ctx.set_verify(verifyFlags, _verifyCallback)
[Wed Oct 20 08:55:14.843739 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/OpenSSL/SSL.py", line 1084, in set_verify
[Wed Oct 20 08:55:14.843742 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] self._verify_helper = _VerifyHelper(callback)
[Wed Oct 20 08:55:14.843745 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] File "/var/www/rocket/venv/lib/python3.6/site-packages/OpenSSL/SSL.py", line 358, in __init__
[Wed Oct 20 08:55:14.843747 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] "int (*)(int, X509_STORE_CTX *)", wrapper
[Wed Oct 20 08:55:14.843751 2021] [wsgi:error] [pid 2291806:tid 139906821093120] [remote 91.204.188.11:55294] SystemError: <built-in method callback of CompiledFFI object at 0x7f3e83f31b28> returned NULL without setting an error
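The traceback ends inside pyOpenSSL's `Context.set_verify()`, where `_VerifyHelper` wraps the Python verify callback in a cffi callback. To check whether that call alone fails under Apache, independent of Scrapy and Twisted, a minimal sketch like the following could be run both from the Django view and from the shell. The name `check_set_verify` is hypothetical; it assumes pyOpenSSL is importable and returns `None` if it is not:

```python
def check_set_verify():
    """Exercise the pyOpenSSL call that fails in the traceback, without Scrapy or Twisted.

    Returns None if pyOpenSSL is not installed, True if set_verify() succeeded.
    A SystemError raised here would reproduce the failure seen under Apache.
    """
    try:
        from OpenSSL import SSL
    except ImportError:
        return None

    def verify_callback(conn, cert, errnum, depth, ok):
        # Accept whatever OpenSSL's own certificate verification decided.
        return bool(ok)

    # Prefer TLS_METHOD where available, fall back to SSLv23_METHOD on older pyOpenSSL.
    method = getattr(SSL, "TLS_METHOD", None) or SSL.SSLv23_METHOD
    ctx = SSL.Context(method)
    # Same call as twisted/internet/_sslverify.py makes in the traceback.
    ctx.set_verify(SSL.VERIFY_PEER, verify_callback)
    return True


if __name__ == "__main__":
    print(check_set_verify())
```

If this prints `True` from the shell but raises the same `SystemError` when called inside the view, the problem lies in how the callback is created under Apache rather than in the spider itself.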
Below is the source code showing how I call the scraper:
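from scrapy.crawler import CrawlerProcess

def _start(scan_id):
    # scan (looked up via scan_id, not shown) and scanengine come from my project.
    process = CrawlerProcess()
    process.crawl(scanengine.SiteScanSpider, urls=[scan.project.url])
    process.start()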
I run the crawler in a separate process from the Django view:
from multiprocessing import Process

p = Process(target=_start, args=(scan_id,))
p.start()
Comments:
-
Please edit your question to add the relevant source code.
-
Thanks, Evan. I've added the source code as well.
Tags: python django scrapy openssl centos