【Question title】: Selenium (python): retrieving both href and text of an anchor
【Posted】: 2022-01-25 20:11:10
【Question description】:

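The following working code shows that I can retrieve the text of a web element but not its href (it returns None). What am I doing wrong? The line that does not work as expected is the second-to-last one: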

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys

# Selenium 4 expects the chromedriver path wrapped in a Service object
driver = webdriver.Chrome(
    service=Service("/Users/bob/Documents/work/AIFA/scraper/scrape_gu/chromedriver")
)
wait = WebDriverWait(driver, 30)

driver.get("https://farmaci.agenziafarmaco.gov.it/bancadatifarmaci/cerca-farmaco")
readunderstood = driver.find_element(By.ID, "conf")
readunderstood.click()
accept = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "/html/body/div[5]/div[3]/div/button"))
)
accept.click()
# end of the initial agreement screens and general preparation
##############################################################
SEARCH_STRING = "AB"  # we can safely assume this does not exist

find_textbox = driver.find_element(By.ID, "search")
find_textbox.clear()  # after the first search the old value will still be there
find_textbox.send_keys(SEARCH_STRING)
find_textbox.send_keys(Keys.ENTER)
# end of the search for a drug action
##############################################################
drugs_list = wait.until(
    EC.presence_of_all_elements_located(
        (By.XPATH, "//*[@id='ul_farm_results']/li[@style='display: list-item;']",)
    )
)
###### this is the part I don't understand
for drug in drugs_list:
    print(drug.get_attribute("href"))  # this should return a link, but returns None
    print(drug.text)  # this correctly prints 3 lines per drug

【Question discussion】:

    Tags: python selenium web-scraping xpath


    【Solution 1】:

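    The href is not on the elements your locator returns; it is on their child a elements.
    So, to make your code work as expected, you only need to adjust the locator so that it selects those a elements directly.
    Please try this: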

    drugs_list = wait.until(
        EC.presence_of_all_elements_located(
            (By.XPATH, "//*[@id='ul_farm_results']/li[@style='display: list-item;']/a")
        )
    )
    
    for drug in drugs_list:
        print(drug.get_attribute("href")) 
        print(drug.text)  
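    The fix works because an HTML attribute belongs to the element that declares it: the li carries a style attribute, while the href sits on its child a. A browser-free sketch of the same idea follows, using Python's standard xml.etree.ElementTree on a hypothetical, simplified version of the result list (the markup, the example.org URLs and the drug names are invented for illustration; the real page may differ). Looking up href on the li yields None, just as get_attribute did, while appending /a to the path reaches the anchors. ElementTree supports only a subset of XPath, which happens to cover this expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page's structure is an assumption.
PAGE = """
<div>
  <ul id="ul_farm_results">
    <li style="display: list-item;">
      <a href="https://example.org/drug/1">DRUG ONE<br/>Company X<br/>10 mg tablets</a>
    </li>
    <li style="display: none;">
      <a href="https://example.org/drug/2">HIDDEN DRUG</a>
    </li>
  </ul>
</div>
"""

# Same selection logic as the question's XPath, relative to the parsed root.
LI_PATH = ".//*[@id='ul_farm_results']/li[@style='display: list-item;']"


def visible_text(element: ET.Element) -> str:
    """Join the element's non-blank text fragments, one per line."""
    return "\n".join(s.strip() for s in element.itertext() if s.strip())


def href_and_text(root: ET.Element) -> list:
    """Return (href, text) for the <a> child of every visible <li>."""
    return [(a.get("href"), visible_text(a)) for a in root.findall(LI_PATH + "/a")]


if __name__ == "__main__":
    root = ET.fromstring(PAGE.strip())
    # The <li> itself has no href attribute, so this prints None.
    print(root.find(LI_PATH).get("href"))
    for href, text in href_and_text(root):
        print(href)
        print(text)
```

    If you need the li's full text rather than only the anchor's, one Selenium option is to keep the question's li locator and look up the child inside the loop with drug.find_element(By.XPATH, "./a"), reading text from the li and href from the anchor.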
    

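    I would also suggest using the visibility_of_element_located expected condition instead of presence_of_all_elements_located. presence_of_all_elements_located only waits until the elements exist in the DOM, even if they are not fully rendered yet, whereas visibility_of_element_located also waits until the element is displayed (visible, with a height and width greater than 0). Note that visibility_of_element_located checks only the first element matching the locator, which is why the full list is then collected with driver.find_elements.
    In that case, your code could look like this: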

    the_xpath = "//*[@id='ul_farm_results']/li[@style='display: list-item;']/a"
    wait.until(EC.visibility_of_element_located((By.XPATH, the_xpath)))
    drugs_list = driver.find_elements(By.XPATH, the_xpath)
    for drug in drugs_list:
        print(drug.get_attribute("href")) 
        print(drug.text)  
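    The difference between the two conditions can be sketched without a browser. This is a simplified, hypothetical model, not Selenium's implementation: wait_until polls a condition until it returns something truthy, which is roughly what WebDriverWait.until does (its default poll interval is 0.5 seconds), and FakeElement, present and first_visible are made-up stand-ins for a DOM element and the two kinds of expected condition.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class WaitTimeout(Exception):
    """Raised when the condition never returns a truthy value in time."""


def wait_until(condition: Callable[[], T], timeout: float = 10.0,
               poll: float = 0.5) -> T:
    """Call `condition` every `poll` seconds until it returns something truthy.

    Simplified model of an explicit wait (an assumption, not Selenium's code).
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


@dataclass
class FakeElement:
    """Hypothetical stand-in for a DOM element (not a Selenium class)."""
    displayed: bool = False
    width: int = 0
    height: int = 0


def present(dom: List[FakeElement]) -> Optional[List[FakeElement]]:
    """Presence: satisfied as soon as any matching element exists."""
    return dom or None


def first_visible(dom: List[FakeElement]) -> Optional[FakeElement]:
    """Visibility: the first match must be displayed with a non-zero size."""
    if dom and dom[0].displayed and dom[0].width > 0 and dom[0].height > 0:
        return dom[0]
    return None


if __name__ == "__main__":
    # Simulate an item that is in the DOM at once but renders on the 3rd poll.
    item = FakeElement()
    dom = [item]
    calls = []

    def item_rendered() -> Optional[FakeElement]:
        calls.append(1)
        if len(calls) == 3:  # pretend the browser finishes rendering now
            item.displayed, item.width, item.height = True, 200, 40
        return first_visible(dom)

    print(present(dom) is dom)  # True: presence is satisfied immediately
    print(wait_until(item_rendered, timeout=2, poll=0.01) is item)  # True
    print(len(calls))  # 3: visibility needed three polls
```

    The point of the model is that a presence check can succeed while the element is still invisible, so code that reads .text right after it may see empty or incomplete values; a visibility check keeps polling until the element is actually shown.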
    

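    【Discussion】:

    • The path to understanding this arcane exercise is full of pitfalls, but thanks to Jedi masters like you I will prevail :). Thanks
    • No, I'm far from being a Jedi master :) I'm glad I could help.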