【Question title】:Retrieve XPath for Google search links
【Posted】:2021-03-31 16:17:43
【Question description】:

I am writing a Python Selenium script that tries to extract LinkedIn profile URLs from a Google search, but I am having trouble narrowing my XPath so that it returns only the search result links on Google.

linkedin_urls = driver.find_elements_by_xpath('//div[@class="yuRUbf"]//a[@href]')
for linkedin_url in linkedin_urls:
    url = linkedin_url.get_attribute("href")
    print(url)

    driver.get(url)
    sleep(5)

The results that linkedin_urls gives me:

https://uk.linkedin.com/in/roxana-andreea-popescu
https://uk.linkedin.com/in/tunjijabitta
https://www.google.com/search?source=hp&ei=bxjhX4uGC4_ykgXl9pu4Bw&q=site%3Alinkedin.com%2Fin%2F+AND+%22Software+Developer%22+AND+%22London%22&oq=site%3Alinkedin.com%2Fin%2F+AND+%22Software+Developer%22+AND+%22London%22&gs_lcp=CgZwc3ktYWIQDFDMZFjhZmCwZ2gAcAB4AIABLogBsAGSAQE0mAEAoAEBqgEHZ3dzLXdpeg&sclient=psy-ab&ved=0ahUKEwjL-dn4huDtAhUPuaQKHWX7BncQ4dUDCA0#
https://www.google.com/search?q=related:https://uk.linkedin.com/in/tunjijabitta&sa=X&ved=2ahUKEwji3qP_huDtAhWAZxUIHTyfAO4QHzABegQIBhAH
https://uk.linkedin.com/in/janomer
https://uk.linkedin.com/in/josephcoker
https://uk.linkedin.com/in/sebemin
https://uk.linkedin.com/in/vicki-marshall-b7433827
https://www.google.com/search?source=hp&ei=bxjhX4uGC4_ykgXl9pu4Bw&q=site%3Alinkedin.com%2Fin%2F+AND+%22Software+Developer%22+AND+%22London%22&oq=site%3Alinkedin.com%2Fin%2F+AND+%22Software+Developer%22+AND+%22London%22&gs_lcp=CgZwc3ktYWIQDFDMZFjhZmCwZ2gAcAB4AIABLogBsAGSAQE0mAEAoAEBqgEHZ3dzLXdpeg&sclient=psy-ab&ved=0ahUKEwjL-dn4huDtAhUPuaQKHWX7BncQ4dUDCA0#
https://www.google.com/search?q=related:https://uk.linkedin.com/in/vicki-marshall-b7433827&sa=X&ved=2ahUKEwji3qP_huDtAhWAZxUIHTyfAO4QHzAFegQIARAH
https://uk.linkedin.com/in/andreibodnar
https://www.google.com/search?q=related:https://uk.linkedin.com/in/andreibodnar&sa=X&ved=2ahUKEwji3qP_huDtAhWAZxUIHTyfAO4QHzAGegQIBxAH
https://uk.linkedin.com/in/dmrlawson
https://uk.linkedin.com/in/jack-gilbert-541a251b
https://www.google.com/search?source=hp&ei=bxjhX4uGC4_ykgXl9pu4Bw&q=site%3Alinkedin.com%2Fin%2F+AND+%22Software+Developer%22+AND+%22London%22&oq=site%3Alinkedin.com%2Fin%2F+AND+%22Software+Developer%22+AND+%22London%22&gs_lcp=CgZwc3ktYWIQDFDMZFjhZmCwZ2gAcAB4AIABLogBsAGSAQE0mAEAoAEBqgEHZ3dzLXdpeg&sclient=psy-ab&ved=0ahUKEwjL-dn4huDtAhUPuaQKHWX7BncQ4dUDCA0#
https://www.google.com/search?q=related:https://uk.linkedin.com/in/jack-gilbert-541a251b&sa=X&ved=2ahUKEwji3qP_huDtAhWAZxUIHTyfAO4QHzAIegQICxAH
https://uk.linkedin.com/in/eren-batu-999068185

I am trying to find a way to narrow the search down to LinkedIn results only.

【Question comments】:

  • Try using contains() with 'linkedin': //a[contains(@href,'linkedin')]

Tags: python selenium google-chrome xpath automation


【Solution 1】:

If you only want the LinkedIn results, use one of the following XPaths.

Using contains():

linkedin_urls = driver.find_elements_by_xpath('//div[@class="yuRUbf"]//a[contains(@href,"https://uk.linkedin.com")]')

Using starts-with():

linkedin_urls = driver.find_elements_by_xpath('//div[@class="yuRUbf"]//a[starts-with(@href,"https://uk.linkedin.com")]')
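The choice between the two functions matters for the output shown in the question: Google's "related:" links embed the LinkedIn URL inside their query string, so a substring test can still match them, while a prefix test cannot. Below is a minimal sketch in plain Python that uses `in` and `str.startswith` as stand-ins for XPath `contains()` and `starts-with()`. The helper names are illustrative and not part of the original answer; the two hrefs are copied from the question's output.

```python
PREFIX = "https://uk.linkedin.com"

# Two hrefs copied from the output in the question.
hrefs = [
    "https://uk.linkedin.com/in/tunjijabitta",
    "https://www.google.com/search?q=related:https://uk.linkedin.com/in/tunjijabitta"
    "&sa=X&ved=2ahUKEwji3qP_huDtAhWAZxUIHTyfAO4QHzABegQIBhAH",
]


def matches_contains(href, needle=PREFIX):
    """Mimic XPath contains(@href, needle): true if needle appears anywhere."""
    return needle in href


def matches_starts_with(href, prefix=PREFIX):
    """Mimic XPath starts-with(@href, prefix): true only for a leading match."""
    return href.startswith(prefix)


contains_hits = [h for h in hrefs if matches_contains(h)]
starts_with_hits = [h for h in hrefs if matches_starts_with(h)]

print(len(contains_hits))     # both hrefs contain the prefix somewhere
print(len(starts_with_hits))  # only the profile link starts with it
```

Under these assumptions, starts-with() is the safer of the two when the goal is to keep only direct profile links.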

【Comments】:

  • Hey @Kunduk, I tried using starts-with(), but I get an error after the script fetches the first LinkedIn URL.

        linkedin_urls = driver.find_elements_by_xpath('//div[@class="yuRUbf"]//a[starts-with(@href,"https://uk.linkedin.com")]')
        print(linkedin_urls)
        sleep(0.5)
        for linkedin_url in linkedin_urls:
            url = linkedin_url.get_attribute("href")
            print(url)
            driver.get(url)
            sleep(5)
            sel = Selector(text=driver.page_source)

  • The error is:

        Traceback (most recent call last):
          File "c:\Users\emman\Documents\Final_Year_Project\LinkedinWebDriver.py", line 50, in
            url = linkedin_url.get_attribute("href")
          File "C:\Users\emman\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\webelement.py", line 139, in get_attribute
            attributeValue = self.parent.execute_script(
          File "C:\Users\emman\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 634, in execute_script
            r
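
The traceback in the comment above is cut off, so the exact exception is not known. One common cause of a failure on the second loop iteration is that `driver.get(url)` navigates away from the results page, after which the remaining elements in `linkedin_urls` no longer belong to the loaded page. A sketch of a workaround, assuming that is the cause here: read every href into a list of strings first, and only then navigate. The function names `collect_hrefs` and `visit_all` are hypothetical.

```python
from time import sleep


def collect_hrefs(elements):
    """Read href values while the results page is still loaded."""
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # skip anchors that have no href value
            hrefs.append(href)
    return hrefs


def visit_all(driver, urls, pause=5):
    """Navigate to each URL; only plain strings are touched after navigation."""
    for url in urls:
        driver.get(url)
        sleep(pause)


# Intended use with the variables from the comment above:
#   urls = collect_hrefs(linkedin_urls)
#   visit_all(driver, urls)
```

Because the loop in `visit_all` holds only strings, leaving the Google results page cannot invalidate anything it still needs.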
【Solution 2】:

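Check the href of each element in linkedin_urls to see whether it mentions LinkedIn.

    url = linkedin_url.get_attribute("href")  # linkedin_url is a WebElement, so read its href string first
    if url and 'linkedin' in url:
        print('linkedin')

Basically, put the driver code you want to run on LinkedIn under the if statement.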
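
A bare substring check is loose: any href that merely mentions 'linkedin' passes, including Google's own links that carry a LinkedIn address as a parameter. As a sketch of a stricter test, the href can be parsed and its hostname compared instead. The helper name `is_linkedin_profile` and the `/in/` path rule are assumptions for illustration, not part of the original answer.

```python
from urllib.parse import urlparse


def is_linkedin_profile(url):
    """Return True if url is on a linkedin.com host and has an /in/ profile path."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    on_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")
    return on_linkedin and parsed.path.startswith("/in/")
```

Comparing the hostname also accepts other LinkedIn subdomains (for example `www.` instead of `uk.`) without hard-coding a country prefix.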

【Comments】:

    【Solution 3】:

    To restrict the search to LinkedIn results, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:

    • Using CSS_SELECTOR:

      print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.yuRUbf a[href^='https://uk.linkedin.com/in']")))])
      
    • Using XPATH:

      print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='yuRUbf']//a[starts-with(@href, 'https://uk.linkedin.com/in')]")))])
      
    • Note: You have to add the following imports:

      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support import expected_conditions as EC
      

    【Comments】:
