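【Question title】: How to get the contents of elements that have the same class
【Posted】: 2020-12-23 06:22:19
【Question】:
I'm trying to extract product information with Selenium. This is the URL of the page: https://www.dell.com/en-us/shop/dell-laptops/sr/laptops/11th-gen-intel-core?appliedRefinements=23775
First I found the parent class of the elements I want to scrape (computer model, CPU, etc.), which are contained in cards.
The cards' parent class is "stack-system ps-stack", but when I try to find the list of elements with that class, the result is empty.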
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
url = "https://www.dell.com/en-us/shop/dell-laptops/sr/laptops/11th-gen-intel-core?appliedRefinements=23775"
classname_main = "stack-system ps-stack"
driver.get(url)
driver.implicitly_wait(50)
products = driver.find_elements(By.CLASS_NAME, classname_main)
print(products)  # prints [] (the problem described above)
I would also like to get the contents of the cards.
【Discussion】:
Tags:
python
selenium
xpath
css-selectors
webdriverwait
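【Solution 1】:
The class_name locator does not accept spaces, i.e. it cannot take several (compound) class names at once. Use a CSS selector instead:
from selenium.webdriver.common.by import By
driver.find_elements(By.CSS_SELECTOR, ".stack-system.ps-stack")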
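To see why the space matters, here is a small pure-Python sketch that needs no browser. Both helper names are hypothetical. naive_class_name_selector assumes the behaviour of recent Python bindings, which turn By.CLASS_NAME into a CSS selector by prefixing a single dot; with "stack-system ps-stack" that yields a descendant selector looking for a <ps-stack> tag inside a .stack-system element, which matches nothing on the page.

```python
def compound_class_selector(class_attr: str, tag: str = "") -> str:
    """Turn a space-separated class attribute into a CSS selector that
    requires *all* of those classes on the same element."""
    classes = class_attr.split()
    if not classes:
        raise ValueError("need at least one class name")
    return tag + "".join("." + c for c in classes)


def naive_class_name_selector(class_attr: str) -> str:
    """Assumed approximation of By.CLASS_NAME with the raw string: prefix one dot.
    With a space inside, the result is a descendant selector, not a compound one."""
    return "." + class_attr


print(compound_class_selector("stack-system ps-stack"))             # .stack-system.ps-stack
print(compound_class_selector("stack-system ps-stack", "article"))  # article.stack-system.ps-stack
print(naive_class_name_selector("stack-system ps-stack"))           # .stack-system ps-stack
```

The compound form (no space between the two dotted classes) is what Solution 1 passes to By.CSS_SELECTOR.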
【Solution 2】:
To extract the product names (e.g. New Inspiron 14 5000 Laptop) using Selenium and Python, you can use either of the following Locator Strategies:
- Using css_selector and get_attribute("innerHTML"):
print([my_elem.get_attribute("innerHTML") for my_elem in driver.find_elements(By.CSS_SELECTOR, "article.stack-system.ps-stack h3 > a")])
- Using xpath and the text attribute:
print([my_elem.text for my_elem in driver.find_elements(By.XPATH, "//article[@class='stack-system ps-stack']//h3/a")])
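Without a browser you cannot query the live Dell page, but the idea of "collect the text of every element that matches a compound class" can be sketched offline with the standard library's html.parser. This is a swapped-in illustration, not Selenium: the markup is hypothetical, modelled only on the locator article.stack-system.ps-stack h3 > a used in this answer, and the parser is not a full CSS engine (it ignores the direct-child > constraint and assumes well-formed markup).

```python
from html.parser import HTMLParser


class CardTitleParser(HTMLParser):
    """Collect the text of <a> links inside <h3> headings that sit inside
    <article> elements carrying every class in `required_classes`."""

    def __init__(self, required_classes):
        super().__init__()
        self.required = set(required_classes)
        self.article_depth = 0  # > 0 while inside a matching <article>
        self.in_h3 = False
        self.in_link = False
        self.titles = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "article":
            # Count nested articles only once we are inside a matching one.
            if self.article_depth or self.required.issubset(classes):
                self.article_depth += 1
        elif tag == "h3" and self.article_depth:
            self.in_h3 = True
        elif tag == "a" and self.in_h3:
            self.in_link = True
            self._buf = []

    def handle_data(self, data):
        if self.in_link:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.titles.append("".join(self._buf).strip())
            self.in_link = False
        elif tag == "h3":
            self.in_h3 = False
        elif tag == "article" and self.article_depth:
            self.article_depth -= 1


# Hypothetical markup: the second card lists its classes in another order plus
# an extra one; the third card lacks "ps-stack" and must be skipped.
SAMPLE_HTML = """
<div class="no-div-lines-layout">
  <article class="stack-system ps-stack">
    <h3><a href="#">New Inspiron 14 5000 Laptop</a></h3>
  </article>
  <article class="ps-stack stack-system featured">
    <h3><a href="#">New Inspiron 15 5000 Laptop</a></h3>
  </article>
  <article class="stack-system">
    <h3><a href="#">Not a product card</a></h3>
  </article>
</div>
"""

parser = CardTitleParser({"stack-system", "ps-stack"})
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.titles)  # ['New Inspiron 14 5000 Laptop', 'New Inspiron 15 5000 Laptop']
```

The class test mirrors the CSS compound selector: every required class must be present, in any order, and extra classes are allowed.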
Ideally, you should wrap the lookup in a WebDriverWait for visibility_of_all_elements_located(), using either of the following Locator Strategies:
- Using CSS_SELECTOR and get_attribute("innerHTML"):
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "article.stack-system.ps-stack h3 > a")))])
- Using XPATH and the text attribute:
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//article[@class='stack-system ps-stack']//h3/a")))])
Console output:
['New Inspiron 14 5000 Laptop', 'New Inspiron 14 5000 2-in-1 Laptop (Dune)', 'New Inspiron 15 5000 Laptop', 'New Inspiron 14 5000 Laptop', 'New Inspiron 14 5000 Laptop', 'New Inspiron 15 5000 Laptop', 'New Inspiron 14 5000 2-in-1 Laptop (Dune)', 'New Inspiron 14 5000 2-in-1 Laptop (Titan Grey)', 'New Inspiron 14 5000 2-in-1 Laptop (Titan Grey)', 'New Inspiron 13 7000 2-in-1 Laptop', 'New Inspiron 15 5000 Laptop', 'New Inspiron 15 7000 2-in-1 Laptop']
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
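One difference between the CSS and XPath locators in this answer: the XPath @class='stack-system ps-stack' compares the whole attribute as one string, so it misses a card whose classes appear in another order or include an extra class, while the CSS .stack-system.ps-stack does not care. A minimal sketch of both matching rules (hypothetical helper names), plus a builder for the common XPath 1.0 idiom that tests each class as a whitespace-delimited token:

```python
def exact_class_match(class_attr: str, expected: str) -> bool:
    # Rule used by //article[@class='stack-system ps-stack']: plain string equality.
    return class_attr == expected


def token_class_match(class_attr: str, *classes: str) -> bool:
    # Rule used by the CSS selector .stack-system.ps-stack: every class present,
    # in any order, extra classes allowed.
    return set(classes) <= set(class_attr.split())


def xpath_has_all_classes(tag: str, *classes: str) -> str:
    """Build an XPath 1.0 expression for <tag> elements carrying every class.

    Padding @class with spaces and searching for ' cls ' keeps 'ps-stack'
    from matching inside a longer class such as 'ps-stack-item'.
    """
    if not classes:
        raise ValueError("need at least one class name")
    padded = "concat(' ', normalize-space(@class), ' ')"
    tests = " and ".join(f"contains({padded}, ' {c} ')" for c in classes)
    return f"//{tag}[{tests}]"


attr = "ps-stack stack-system"  # hypothetical: same classes, different order
print(exact_class_match(attr, "stack-system ps-stack"))     # False
print(token_class_match(attr, "stack-system", "ps-stack"))  # True
print(xpath_has_all_classes("article", "stack-system", "ps-stack") + "//h3/a")
```

The generated string can be passed to By.XPATH in place of the exact-match expression if the class order on the page is not guaranteed.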
【Solution 3】:
Use this CSS selector to get all of the text:
driver.get(url)
print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"div.no-div-lines-layout"))).text)
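Solutions 2 and 3 both rely on WebDriverWait(...).until(...). Unlike the question's driver.implicitly_wait(50), which only controls how long element lookups keep retrying, until() repeatedly evaluates a condition (such as visibility) until it returns something truthy or the timeout expires. The sketch below is a simplified, stdlib-only imitation of that polling loop, not Selenium's code: wait_until and WaitTimeout are hypothetical names, LookupError merely plays the role of NoSuchElementException, and the real method passes the driver into the condition and by default polls every 0.5 s.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=10.0, poll=0.5, ignored=(LookupError,)):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet"; once
    `timeout` seconds have passed without success, WaitTimeout is raised.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # e.g. element not found yet: keep polling
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition still false after {timeout} s")
        time.sleep(poll)


# Usage: pretend the product cards only appear on the third poll.
attempts = {"count": 0}


def cards_visible():
    attempts["count"] += 1
    if attempts["count"] < 3:
        return []  # nothing rendered yet -> falsy, keep waiting
    return ["New Inspiron 14 5000 Laptop", "New Inspiron 15 5000 Laptop"]


titles = wait_until(cards_visible, timeout=2, poll=0.01)
print(titles)
```

Returning the condition's value (rather than True) is what lets the Selenium one-liners above iterate directly over the elements that until() hands back.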