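Retrieving elements with custom HTML attributes: answers

【Question title】: Retrieving elements with custom HTML attributes
【Posted】: 2019-08-12 05:23:04
【Question description】:

I have the following website: https://www.kvk.nl/handelsregister/publicaties/, and I want to retrieve the login link using Selenium, Scrapy and Python. For the relevant function I have the following code: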

def start_requests(self):
    self.driver = webdriver.Chrome(executable_path=os.path.join(os.getcwd(), "Drivers", "chromedriver.exe"))
    self.driver.get(self.initial_url)
    # access_page_wait is not shown in the question; presumably a WebDriverWait with a 15-second timeout
    test = access_page_wait.until(expected_conditions.visibility_of_element_located((By.CSS_SELECTOR, 'a[data-ui-test-class="linkCard_toegangscode"]')))
    if test.is_displayed():
        print("+1")
    else:
        print("-1")

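However, this does not seem to work: it just waits 15 seconds and then stops. It never reaches +1 or -1.

My question is: how do I point Selenium at the correct element? Using XPath with find_elements_by_xpath("//a[@data-ui-test-class='linkCard_toegangscode']") does not seem to work either.

Should I use a different selection method? If so, which one?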

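【Question comments】:

Did you try my answer?
Consider using github.com/clemfromspace/scrapy-selenium to combine Scrapy and Selenium; it may save you from other problems later on.
Actually, I only want Selenium to log me in and fetch the page behind the authentication, then pass the login authentication headers/session to my Scrapy spider and continue scraping from there. I believe Scrapy is somewhat faster because it does not need a browser.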
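
The hand-off described in the last comment could be sketched as follows, assuming the site keeps the login session in cookies (the thread does not confirm this). The helper name `selenium_cookies_to_dict` and the sample cookie values are hypothetical; the input is shaped like the list of dicts returned by Selenium's `driver.get_cookies()`.

```python
from typing import Dict, Iterable, Mapping


def selenium_cookies_to_dict(cookies: Iterable[Mapping[str, object]]) -> Dict[str, str]:
    """Flatten Selenium's list of cookie dicts into a name -> value mapping."""
    return {str(cookie["name"]): str(cookie["value"]) for cookie in cookies}


# Hypothetical cookies, shaped like the return value of driver.get_cookies().
example_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com"},
    {"name": "csrftoken", "value": "xyz", "domain": "example.com"},
]
print(selenium_cookies_to_dict(example_cookies))
```

In a spider, the resulting dict could then be passed along as `scrapy.Request(url, cookies=...)` after the Selenium login has finished. If the site authenticates through headers or tokens rather than cookies, a different hand-off would be needed.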

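Tags: python-3.x selenium xpath scrapy css-selectors

【Solution 1】:

The element sits inside an iframe, which is why Selenium cannot find it from the top-level document. Switch to the iframe first, then access the element.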

import os

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

# executable_path works in Selenium 3; Selenium 4 expects a Service object instead.
driver = webdriver.Chrome(executable_path=os.path.join(os.getcwd(), "Drivers", "chromedriver.exe"))
driver.get("https://www.kvk.nl/handelsregister/publicaties/")
# Switch into the first iframe on the page, where the link lives.
driver.switch_to.frame(0)
# Wait up to 10 seconds for the link to become visible inside the frame.
test = WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located((By.CSS_SELECTOR, 'a[data-ui-test-class="linkCard_toegangscode"]')))
if test.is_displayed():
    print("+1")
else:
    print("-1")

Try the code above. It should print what you are looking for.

【Comments】:
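
When the frame index is not known in advance, the frame switch above could be generalized along these lines. This is only a sketch: the helper `find_in_any_frame` and the two constants are hypothetical names, the strings are the same values as Selenium's `By.TAG_NAME` and `By.CSS_SELECTOR`, and only the top document and its direct iframes are searched (nested frames are not handled).

```python
from typing import Any, List

# Plain-string locator strategies; these equal selenium's By.TAG_NAME and
# By.CSS_SELECTOR, so the helper itself needs no selenium import.
TAG_NAME = "tag name"
CSS_SELECTOR = "css selector"


def find_in_any_frame(driver: Any, css: str) -> List[Any]:
    """Return elements matching `css` in the top document or a direct iframe.

    On a match the driver is left inside the frame that contains the
    elements, so they stay usable; otherwise it ends at the top document.
    """
    driver.switch_to.default_content()
    found = driver.find_elements(CSS_SELECTOR, css)
    if found:
        return found
    # Each iframe is a separate document, so it has to be entered explicitly.
    for frame in driver.find_elements(TAG_NAME, "iframe"):
        driver.switch_to.frame(frame)
        found = driver.find_elements(CSS_SELECTOR, css)
        if found:
            return found
        driver.switch_to.default_content()
    return []
```

With the driver from the answer, it would be called as `find_in_any_frame(driver, 'a[data-ui-test-class="linkCard_toegangscode"]')`. The helper does not wait for frames to load; for that, Selenium's `expected_conditions.frame_to_be_available_and_switch_to_it` can be combined with `WebDriverWait`.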