How to resolve urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=58408): Max retries exceeded with url

【Question title】: How to address urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=58408): Max retries exceeded with url
【Posted】: 2021-02-21 00:33:30
【Question description】:

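I am trying to scrape a few pages of a website with selenium and use the results, but when I run the function twice, the error

[WinError 10061] No connection could be made because the target machine actively refused it

appears on the second function call. Here is my approach: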

import os
import re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup as soup

opts = webdriver.ChromeOptions()
opts.binary_location = os.environ.get('GOOGLE_CHROME_BIN', None)
opts.add_argument("--headless")
opts.add_argument("--disable-dev-shm-usage")
opts.add_argument("--no-sandbox")
browser = webdriver.Chrome(executable_path="CHROME_DRIVER PATH", options=opts)

lst =[]
def search(st):
    for i in range(1,3):
        url = "https://gogoanime.so/anime-list.html?page=" + str(i)
        browser.get(url)
        req = browser.page_source
        sou = soup(req, "html.parser")
        title = sou.find('ul', class_ = "listing")
        title = title.find_all("li")
        for j in range(len(title)):
            lst.append(title[j].getText().lower()[1:])
    browser.quit()
    print(len(lst))
    
search("a")
search("a")

Output

272
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=58408): Max retries exceeded with url: /session/4b3cb270d1b5b867257dcb1cee49b368/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001D5B378FA60>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

【Question comments】:

Tags: selenium selenium-webdriver web-scraping beautifulsoup webdriver

【Solution 1】:

I ran into the same problem in Robot Framework.

MaxRetryError: HTTPConnectionPool(host='options=add_argument("--ignore-certificate-errors")', port=80): Max retries exceeded with url: /session (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001ABA3190F10>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')).

The problem was resolved after I updated all libraries to their latest versions in PyCharm and selected Intellibot@SeleniumLibrary.patched.

【Comments】:

【Solution 2】:

This error message...

raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=58408): Max retries exceeded with url: /session/4b3cb270d1b5b867257dcb1cee49b368/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001D5B378FA60>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

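...means that the attempt to establish a new connection failed, and that failure was raised as a MaxRetryError.

A couple of things:

First, as per the discussion "max-retries-exceeded exceptions are confusing", the traceback is somewhat misleading. Requests wraps the exception for the user's convenience. The original exception is part of the message displayed.
Requests never retries (it sets retries=0 for urllib3's HTTPConnectionPool), so the error would have been more canonical without the MaxRetryError and HTTPConnectionPool keywords. An ideal traceback would have been: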
```
  ConnectionError(<class 'socket.error'>: [Errno 1111] Connection refused)
```
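
The underlying condition can be reproduced without Selenium. The following is a minimal sketch, assuming only the Python standard library; the helper name `connection_is_refused` is hypothetical. It reserves a free local port, closes the listener, and then tries to connect, which resembles the position the client is in once the driver process is no longer listening.

```python
import socket


def connection_is_refused(host="127.0.0.1"):
    """Return True if connecting to a local port with no listener is refused."""
    # Bind to port 0 so the OS picks a free port, then leave the `with`
    # block so the socket is closed and nothing listens on that port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((host, 0))
        port = listener.getsockname()[1]

    try:
        # Nothing is listening on `port` now, so the connect attempt
        # is expected to be rejected by the operating system.
        with socket.create_connection((host, port), timeout=2):
            return False
    except ConnectionRefusedError:
        # On Windows this condition is reported as [WinError 10061].
        return True
```

urllib3 wraps this same refusal in NewConnectionError and then MaxRetryError, which is why the refusal appears only inside the "Caused by" part of the message.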

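Root cause and solution

Once the webdriver and web client session have been started, def search(st) calls get() to open each url and then calls browser.quit(), which invokes the /shutdown endpoint. The webdriver and web-client instances are then destroyed, closing all pages/tabs/windows completely. No connection exists after that point.

You can find a couple of relevant detailed discussions in: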

PhantomJS web driver stays in memory

Selenium : How to stop geckodriver process impacting PC memory, without calling driver.quit()?

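In this situation, the next time browser.get() is invoked (here, at the start of the second search() call, because browser.quit() runs once the for loop has finished), there is no active connection. Hence you see the error.

So a simple solution is to remove the browser.quit() line and invoke browser.get(url) within the same browsing context.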
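
A minimal sketch of that fix, assuming the driver is created once by the caller and passed in. The names `fetch_pages` and `run_with_chrome` are hypothetical, and the sketch returns the raw page sources instead of parsing them with BeautifulSoup so that it stays focused on the session lifetime.

```python
def fetch_pages(browser, first=1, last=2):
    """Load each listing page in the existing session and return the HTML."""
    pages = []
    for i in range(first, last + 1):
        browser.get("https://gogoanime.so/anime-list.html?page=" + str(i))
        pages.append(browser.page_source)
    # No browser.quit() here: the session must stay alive for the next call.
    return pages


def run_with_chrome():
    """Hypothetical driver script; needs Selenium and a chromedriver on PATH."""
    from selenium import webdriver  # imported lazily so the sketch loads without Selenium

    browser = webdriver.Chrome()
    try:
        first_run = fetch_pages(browser)
        second_run = fetch_pages(browser)  # same session, still open
        return len(first_run), len(second_run)
    finally:
        browser.quit()  # shut the driver down once, after all calls
```

Keeping quit() in a single finally block at the outermost level means the driver is still shut down if any page load raises.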

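Conclusion

After upgrading to Selenium 3.14.1 you will be able to set a timeout, see canonical tracebacks, and take the required action.

References

You can find a relevant detailed discussion in: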

MaxRetryError: HTTPConnectionPool: Max retries exceeded (Caused by ProtocolError('Connection aborted.', error(111, 'Connection refused')))

tl; dr

A couple of relevant discussions:

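【Comments】:

This helps, but the chromedriver process is affecting memory. Should I use os.system("taskkill /f /im chromedriver.exe /T")? I am on a Windows machine.
@SumitJaiswal In short, yes, you need to, but there are a couple of other factors to consider. Let me know if you get stuck.
Killing chromedriver inside the for loop is a bad idea, so I wrote a kill() function around os.system("taskkill /f /im chromedriver.exe /T"). But calling search(), then kill(), then search() again gives the same error. Is there a way to restart chromedriver whenever I call search()?
I think I solved both problems by moving browser = webdriver.Chrome(executable_path="CHROME_DRIVER PATH", options=opts) into the function and calling browser.quit() at its end. Thanks for your help.
Amazing answer. Honestly, I was just missing a "return" in my function. But great effort and well written. Loved reading it.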
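
The approach the asker settled on in the comments above (creating the driver inside the function and quitting at its end) might be sketched as follows. The `make_driver` parameter and the `make_chrome` factory are hypothetical names introduced for illustration, and the sketch returns page sources rather than parsed titles.

```python
def search(make_driver, pages=range(1, 3)):
    """Start a fresh driver for this call and always shut it down afterwards."""
    browser = make_driver()
    try:
        sources = []
        for i in pages:
            browser.get("https://gogoanime.so/anime-list.html?page=" + str(i))
            sources.append(browser.page_source)
        return sources
    finally:
        # Runs even if get() raises, so the driver started by this call
        # is shut down before the function returns.
        browser.quit()


def make_chrome():
    """Hypothetical factory; needs Selenium and a chromedriver on PATH."""
    from selenium import webdriver  # imported lazily so the sketch loads without Selenium

    opts = webdriver.ChromeOptions()
    opts.add_argument("--headless")
    return webdriver.Chrome(options=opts)
```

With this shape, calling search(make_chrome) twice starts and stops two separate driver sessions, so the second call never talks to a port that has already been shut down.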