Python3 - Selenium unable to find xpath provided: answers

【Question title】: Python3 - Selenium unable to find xpath provided
【Posted】: 2020-06-05 10:41:24
【Question】:

I'm using Python 3 and Selenium to grab some image links from the following website:

import sys
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType

chrome_options = Options()  
chrome_options.add_argument("--headless")

driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')

link_xpath = '/html/body/main/div/div[2]/div[2]/div/div/div[2]/div/div[2]/div[1]/div/div/div[2]/div/img'

link_path = driver.find_element_by_xpath(link_xpath).text
print(link_path)

driver.quit()

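When you load this URL, the image in question appears in the middle of the page. In Google Chrome, right-click it and choose Inspect. Then, in Chrome DevTools, right-click the element itself to copy the image's XPath.

Everything looks right to me, but when I run the code above I get the following error: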

Traceback (most recent call last):
  File "G:\folder\folder\testfilepy", line 16, in <module>
    link_path = driver.find_element_by_xpath(link_xpath).text
  File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "G:\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/main/div/div[2]/div[2]/div/div/div[2]/div/div[2]/div[1]/div/div/div[2]/div/img"}
  (Session info: headless chrome=83.0.4103.61)

Can anyone tell me why Selenium can't find the provided XPath?

【Question discussion】:

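Try link_xpath = '//div[@class="c-bezel programme-content__image"]//img'. Note, though, that this element has no text to return. What are you trying to get: which attribute do you want?
Hi - when I inspect the element I see an http link to the image: images.metadata.sky.com/pd-image/… .... I basically want to grab that link.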

Tags: python selenium xpath css-selectors webdriverwait

【Solution 1】:

To extract the image's src attribute, use WebDriverWait with visibility_of_element_located(). Either of the following locator strategies works:

Using CSS_SELECTOR:

options = webdriver.ChromeOptions() 
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--headless')
options.add_argument('--window-size=1920,1080')
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.o-layout__item div.c-bezel.programme-content__image>img"))).get_attribute("src"))

Using XPATH:

options = webdriver.ChromeOptions() 
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--headless')
options.add_argument('--window-size=1920,1080')
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')     
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='o-layout__item']//div[@class='c-bezel programme-content__image']/img"))).get_attribute("src"))

Console output:

https://images.metadata.sky.com/pd-image/251eeec2-acb3-4733-891b-60f10f2cc28c/16-9/640

Note: you have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
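Conceptually, WebDriverWait(driver, 20).until(condition) keeps calling the condition until it returns something truthy or 20 seconds pass, and then raises TimeoutException. The plain-Python sketch below imitates that loop so the idea can run without a browser. The names wait_until and WaitTimeout are hypothetical, and LookupError stands in for NoSuchElementException. The 0.5 s default poll interval matches WebDriverWait's default poll_frequency, but this is not Selenium's actual implementation.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=20.0, poll=0.5, ignored=(LookupError,)):
    """Call condition() until it returns a truthy value, then return that value.

    Exceptions listed in `ignored` are swallowed and retried, similar to how
    WebDriverWait ignores NoSuchElementException while polling.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # element "not there yet": try again on the next poll
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# Example: a fake "page" whose image src only appears on the third check.
calls = {"n": 0}


def fake_image_src():
    calls["n"] += 1
    if calls["n"] < 3:
        raise LookupError("no such element yet")
    return "https://example.com/image.jpg"


print(wait_until(fake_image_src, timeout=5, poll=0.01))
```

This is why a plain find_element call can fail on a page that renders its content with JavaScript, while an explicit wait succeeds once the element shows up.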

References

You can find a couple of detailed discussions of NoSuchElementException in:

【Discussion】:

【Solution 2】:

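Your XPath is correct, but don't use an absolute path: it breaks as soon as the page layout changes. Try this relative XPath instead: //div[@class="c-bezel programme-content__image"]//img.

To get what you're after, use .get_attribute("src") instead of .text: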
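The sketch below illustrates why a position-based path copied from DevTools is fragile. It uses Python's standard-library xml.etree.ElementTree on a made-up, simplified fragment; the markup is hypothetical, not Sky's real page. ElementTree supports only a subset of XPath and evaluates paths relative to the root element, so the "absolute" path here starts at the children of main.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: NOT the real sky.com page.
BEFORE = """
<main>
  <div>
    <div class="c-bezel programme-content__image"><img src="a.jpg"/></div>
  </div>
</main>
"""

# Same content after the site wraps the image block in one extra <div>.
AFTER = """
<main>
  <div>
    <div>
      <div class="c-bezel programme-content__image"><img src="a.jpg"/></div>
    </div>
  </div>
</main>
"""

# Position-based path from <main> down, like a copied absolute XPath.
ABSOLUTE = "./div/div/img"
# Attribute-based path that doesn't care how deeply the block is nested.
RELATIVE = ".//div[@class='c-bezel programme-content__image']//img"


def find_src(markup, path):
    """Return the src of the first match, or None if the path matches nothing."""
    img = ET.fromstring(markup).find(path)
    return None if img is None else img.get("src")


for label, markup in (("before", BEFORE), ("after", AFTER)):
    print(label, find_src(markup, ABSOLUTE), find_src(markup, RELATIVE))
```

One extra wrapper element is enough to make the positional path return nothing. The class-based path still finds the image.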

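from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')  # driver: the Chrome instance from the question
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//div[@class="c-bezel programme-content__image"]//img')))
print(element.get_attribute("src"))
driver.quit()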

Or, better, use a CSS selector, which should be a little faster:

element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.c-bezel.programme-content__image > img')))
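The original .text call came back empty because img is a void element. It has no text content, and the link lives in its src attribute. Here is a tiny illustration using the standard-library xml.etree.ElementTree on hypothetical markup. ElementTree's .text and .get() are only analogous to Selenium's WebElement.text and get_attribute(), not the same API.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like the programme image block.
html = '<div class="c-bezel programme-content__image"><img src="https://example.com/16-9/640"/></div>'

img = ET.fromstring(html).find("img")
# ElementTree reports missing text as None; Selenium's WebElement.text returns "" instead.
print(repr(img.text))
# The URL is an attribute, so it has to be read as one.
print(img.get("src"))
```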

Reference: https://selenium-python.readthedocs.io/locating-elements.html

【Discussion】:

Hi - I didn't know you could write XPaths this way, but I'll make sure to use this approach in future. Thanks.
@gdogg371 You're welcome. FYI: locating-elements-by-xpath and locating-elements-by-css-selectors

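【Solution 3】:

If you're running in headless mode, it's usually a good idea to set a window size. Add this line to your options (Chrome expects the width and height separated by a comma):

chrome_options.add_argument('--window-size=1920,1080')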

【Discussion】:

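Why? What does this option do?
...Actually, I can see it no longer raises an error, though I don't know why... However, it now seems to return a blank string, because no text comes back at all...
You won't see any text because you're pointing at an img tag. You can inspect the DOM manually to confirm the text is "".
driver.find_element_by_xpath(link_xpath).get_attribute('src') will do that.
Honestly, this is something I learned from experience. Some sites need a spoofed window size before their DOM is usable in headless mode. Unfortunately, this isn't obvious and isn't clearly documented.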

【Solution 4】:

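Your XPath seems to be correct. Selenium can't find the element because you forgot to handle the cookie banner. Try it yourself: pause the driver for a few seconds and click to accept all cookies, and then you'll see your element. There are several ways to handle cookies. I was able to locate the element using my own, cleaner XPath, reaching it from its nearest parent.

Hope this helps.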
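Here is a minimal sketch of the cookie step described in this answer. It assumes a Selenium driver object, whose find_elements(by, value) accepts "xpath" as the by-string (the value of By.XPATH) and returns an empty list rather than raising when nothing matches. The consent-button XPath and the helper name accept_cookies are hypothetical placeholders, so inspect the real banner first. Some consent banners are rendered inside an iframe, in which case you would need driver.switch_to.frame(...) before searching.

```python
import time

# Hypothetical locator: check the real banner's markup before relying on this.
CONSENT_XPATH = "//button[contains(., 'Accept')]"


def accept_cookies(driver, timeout=10.0, poll=0.5):
    """Click the first matching consent button; return True if one was clicked.

    find_elements() returns an empty list instead of raising when nothing
    matches, so the loop simply polls until the banner shows up.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        buttons = driver.find_elements("xpath", CONSENT_XPATH)
        if buttons:
            buttons[0].click()
            return True
        time.sleep(poll)
    return False  # no banner appeared, e.g. consent was already given
```

Call accept_cookies(driver) right after driver.get(...) and before looking up the image. Polling for the button replaces the fixed "pause for a few seconds" with a wait that ends as soon as the banner appears.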

【Discussion】: