【Question Title】: Failing to scrape links from Google Chrome
【Posted】: 2021-08-24 17:10:57
【Question Description】:

My code opens a tab, searches for the topic, and closes it, but it never gives me back the links it is supposed to collect.

from selenium import webdriver

pesquisa = input ("o que você quer pesquisar: ")


def get_results(search_term):
    url = "https://www.startpage.com"
    driver = webdriver.Chrome()
    driver.get(url)
    search_box = driver.find_element_by_id("q")
    search_box.send_keys(search_term)
    search_box.submit()
    try:
        links = driver.find_elements_by_xpath("//ol[@class='web_regular_results']//div//a")
    except:
        links = driver.find_elements_by_xpath("//div//a")
    results = []
    for link in links:
        href = link.get_attribute("href")
        print(href)
        results.append(href)
    driver.close()
    return results
    
get_results(pesquisa)

【Question Discussion】:

    Tags: python html hyperlink scrape


    【Solution 1】:
    from selenium import webdriver
    
    pesquisa = input ("o que você quer pesquisar: ")
    
    
    def get_results(search_term):
        url = "https://www.startpage.com"
        driver = webdriver.Chrome()
        driver.get(url)
        search_box = driver.find_element_by_id("q")
        search_box.send_keys(search_term)
        search_box.submit()
        try:
            links = driver.find_elements_by_xpath("//ol[@class='web_regular_results']//div//a")
        except:
            links = driver.find_elements_by_xpath("//div//a")
            pass            # Here you have to write pass so the following statements run
        results = []
        for link in links:
            href = link.get_attribute("href")
            print(href)
            results.append(href)
        driver.close()
        return results
        
    get_results(pesquisa)
    

    In the try-except, you need `pass` after the exception block so that the following statements run. Without it, they would be skipped.
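
    To illustrate the control flow the answer describes with a minimal, browser-free sketch (the function and data here are made up for this example, not taken from the thread): once the except block has handled the error, execution continues with the statements after the try-except, so the results list is still built and returned.

```python
def collect_links(items):
    """Mirrors the shape of get_results: try the strict path, fall back
    on error, then continue with the loop below either way."""
    try:
        links = [i for i in items if i.startswith("http")]
    except AttributeError:
        links = []          # fallback when an item is not a string
        pass                # execution continues below in both cases
    results = []
    for link in links:
        results.append(link)
    return results

print(collect_links(["https://a.com", "not-a-link", "http://b.com"]))
# prints ['https://a.com', 'http://b.com']
print(collect_links([None]))   # the fallback path
# prints []
```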

    【Discussion】:
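
    A side note on the href-collecting loop (a sketch with hypothetical names, not code from the thread): factoring it into a small helper that accepts any objects exposing a `get_attribute` method makes the logic testable without launching Chrome, which helps separate "the XPath matched nothing" from "the loop is broken".

```python
def extract_hrefs(elements):
    """Collect the href attribute of each element, skipping ones without it."""
    results = []
    for el in elements:
        href = el.get_attribute("href")
        if href:
            results.append(href)
    return results

class FakeLink:
    """Stand-in for a Selenium WebElement in tests (hypothetical helper)."""
    def __init__(self, href):
        self._href = href
    def get_attribute(self, name):
        return self._href if name == "href" else None

print(extract_hrefs([FakeLink("https://example.com"), FakeLink(None)]))
# prints ['https://example.com']
```

    Note also that Selenium 4 removed the `find_element_by_id` / `find_elements_by_xpath` shortcuts in favor of `driver.find_element(By.ID, "q")` and `driver.find_elements(By.XPATH, ...)`, so against a current Selenium install the original script fails before it ever reaches the link loop.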
