Script can't parse title from a redirected url

【Question title】: Script can't parse title from a redirected url
【Posted】: 2018-11-02 17:07:25
【Question】:

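I've written a script in python using selenium to fetch the title address from a webpage. The url I use in my script gets redirected automatically within a few seconds, and that is where my script runs into errors. I'm pasting a portion of the error to give you an idea.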

ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Link to that url which gets redirected to another page

The script I've tried with:

from contextlib import closing
from selenium import webdriver
from selenium.webdriver.support import ui

url = "https://www.rightmove.co.uk/propertyMedia/redirect.html?propertyId=30578943&contentId=1625965454&index=1"

with closing(webdriver.Chrome()) as wd:
    wait = ui.WebDriverWait(wd, 10)
    wd.get(url)
    item = wait.until(lambda driver: driver.find_element_by_css_selector("h1.header_address__title")).text
    print(item)

This is the output I'm expecting from that page:

Park View Back Road, Locharbriggs, Dumfries, DG1

This is what I see just before that error:

【Comments】:

Tags: python python-3.x selenium selenium-webdriver web-scraping

【Solution 1】:

You might need to replace

item = wait.until(lambda driver: driver.find_element_by_css_selector("h1.header_address__title")).text

which waits for the element to appear in the DOM and then immediately reads its currently visible text (an empty string may be returned)

with

item = wait.until(lambda driver: driver.find_element_by_css_selector("h1.header_address__title").text)

which waits for the element and returns its visible text only once that text is not an empty string
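To make the difference between the two forms concrete, here is a minimal sketch that needs no browser. `FakeElement`, `FakeDriver`, and `until` are hypothetical stand-ins written for this illustration: `until` only mimics the behaviour of `WebDriverWait.until` of polling until the callable returns a truthy value, and is not selenium's implementation.

```python
import time


class FakeElement:
    """Stand-in for a WebElement whose text is empty for the first few reads."""

    def __init__(self, final_text, empty_reads):
        self._final_text = final_text
        self._empty_reads = empty_reads

    @property
    def text(self):
        if self._empty_reads > 0:
            self._empty_reads -= 1
            return ""
        return self._final_text


class FakeDriver:
    """Stand-in for a WebDriver that always finds the same element."""

    def __init__(self, element):
        self._element = element

    def find_element_by_css_selector(self, selector):
        return self._element


def until(driver, condition, timeout=2.0, poll=0.01):
    """Poll condition(driver) until it returns a truthy value (simplified wait)."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition(driver)
        if value:
            return value
        if time.monotonic() > deadline:
            raise TimeoutError("condition was never truthy")
        time.sleep(poll)


selector = "h1.header_address__title"
title = "Park View Back Road, Locharbriggs, Dumfries, DG1"

# Form 1: the wait returns the element itself (truthy), then .text is read once.
driver = FakeDriver(FakeElement(title, empty_reads=3))
text_outside = until(driver, lambda d: d.find_element_by_css_selector(selector)).text

# Form 2: .text is part of the condition, so an empty string keeps the wait polling.
driver = FakeDriver(FakeElement(title, empty_reads=3))
text_inside = until(driver, lambda d: d.find_element_by_css_selector(selector).text)

print(repr(text_outside))
print(repr(text_inside))
```

In this simulation the first form ends up with an empty string because the element object is already truthy when the wait returns, while the second form keeps polling until the text is filled in.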

But IMHO you can simply do

item = driver.find_element_by_css_selector("h1.header_address__title").get_attribute('textContent')

to get the text value even if that text is not currently displayed on the page

As for your "chromedriver that stops working" issue: try updating Chrome and chromedriver to the latest versions
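One way to check whether the two are out of step is to compare their major versions. The sketch below is only a rough aid and rests on several assumptions: that the capabilities dict of a running session exposes `browserVersion` (or the older `version`) and `chrome.chromedriverVersion`, and that the installed ChromeDriver is release 73 or later, where its numbering follows Chrome's major version. Older 2.x chromedriver releases use a different scheme, so this comparison does not apply to them. The helper names and the sample dicts are made up for illustration.

```python
def major_version(version):
    """Return the first numeric component of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_from_capabilities(caps):
    """Pull (browser_version, chromedriver_version) out of a capabilities dict."""
    browser = caps.get("browserVersion") or caps.get("version") or ""
    driver = caps.get("chrome", {}).get("chromedriverVersion", "")
    # chromedriverVersion usually carries a build hash after a space; keep the number only.
    return browser, driver.split(" ")[0]


def same_major(caps):
    """True when browser and chromedriver share the same major version."""
    browser, driver = versions_from_capabilities(caps)
    return major_version(browser) == major_version(driver)


# Made-up capability dicts for illustration; with a real session pass wd.capabilities.
matching = {
    "browserVersion": "114.0.5735.110",
    "chrome": {"chromedriverVersion": "114.0.5735.90 (hypothetical-build-hash)"},
}
mismatched = {
    "browserVersion": "116.0.5845.96",
    "chrome": {"chromedriverVersion": "114.0.5735.90 (hypothetical-build-hash)"},
}

print(same_major(matching))
print(same_major(mismatched))
```

If the session cannot even start, the capabilities are not available, and the versions have to be read from the browser's About page and the chromedriver binary instead.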

【Comments】:

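Now it's giving me a much friendlier exception, @sir Andersson. Your suggestion to update chromedriver got rid of that nasty issue (the one you saw in the image above). With the first selector I now get `raise TimeoutException(message, screen, stacktrace) selenium.common.exceptions.TimeoutException: Message: `. With the second suggestion: `raise exception_class(message, screen, stacktrace) selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"h1.header_address__title"}`
@asmitu , oh... sorry :) I didn't check the element, I just used your selector... Let me check it now
@asmitu , yes. Some comments :) The first one doesn't work because the title seems not to be visible when the browser window is not maximized, so either use wd.maximize_window() at the start of the script or use item = wait.until(lambda driver: driver.find_element_by_css_selector("h1.header_address__title").get_attribute('textContent')) (driver.find_element_by_css_selector("h1.header_address__title").get_attribute('textContent') without a wait does not work: the element is not found because the initial page is not our target page)
By the way, before updating my chromedriver I had success on other sites but hit the above error on this one. Is there any reason for this mysterious behavior? Feel free to ignore this part if you'd rather not answer. Thanks again, sir.
@asmitu , actually I haven't done any research, so I don't know why this happens. I only know that if any OS error occurs, the first thing to do is check browser-webdriver compatibility :)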
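Pulling the thread's working suggestion into one place, here is a hedged sketch of the wait on `textContent`. `read_title` is a hypothetical helper name, not part of selenium. It takes any object with a `WebDriverWait`-style `until` method, and it passes the locator strategy as the plain string `"css selector"` (the value of `By.CSS_SELECTOR`) to `find_element`, which is an assumption chosen so the block does not depend on the older `find_element_by_css_selector` helper. Stripping whitespace is an extra not mentioned in the thread.

```python
def read_title(wait, selector="h1.header_address__title"):
    """Wait until the element exists and its textContent is non-blank, then return it."""

    def non_blank_text(driver):
        # While the redirect is still in progress the element is missing; a real
        # WebDriverWait ignores NoSuchElementException by default and keeps polling.
        element = driver.find_element("css selector", selector)
        # textContent is available even when the element is not visibly rendered.
        return (element.get_attribute("textContent") or "").strip()

    return wait.until(non_blank_text)


# Usage with a real browser (requires selenium and a compatible chromedriver):
#   from selenium import webdriver
#   from selenium.webdriver.support.ui import WebDriverWait
#   wd = webdriver.Chrome()
#   try:
#       wd.get(url)
#       print(read_title(WebDriverWait(wd, 10)))
#   finally:
#       wd.quit()
```

Because the text is read inside the wait condition, the lookup is retried until the redirect has landed on the target page, which is the case where the lookup without a wait failed in the comments above.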