Skip selenium Webdriver.get() call inside for loop if it takes too long

[Question title]: Skip selenium Webdriver.get() call inside for loop if it takes too long
[Posted]: 2020-10-26 10:46:46
[Question]:

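Hi everyone, I can't figure out how to add an exception to a for-in-range loop. Right now I am pulling URLs from an Excel sheet and scraping information as I move through the pages of each URL, up to page 200. The problem is that not every URL has 200 pages, so the loop takes a very long time to finish before the program can move on to the next URL. Is there a way to implement an exception in the code below?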

from selenium import webdriver
import pandas as pd
import time

driver = webdriver.Chrome("C:/Users/Acer/Desktop/chromedriver.exe")

companies = []

df = pd.read_excel('C:/Users/Acer/Desktop/urls.xlsx')

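# Each row of the sheet holds one base URL in the 'urls' column
for index, row in df.iterrows():
    base_url = (row['urls'])

    # Visit pages 1..200 of the current base URL
    for i in range(1,201,1):

        url = "{base_url}?curpage={i}".format(base_url=base_url, i=i)
        driver.get(url)
        time.sleep(2)

        # Collect the matching elements on this page
        name = driver.find_elements_by_xpath('//a/div/div/p')

        for names in name:
            print(names.text, url)
            companies.append([names.text, url])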

[Comments]:

With the continue keyword you can skip the current loop iteration and start with the next one. Maybe this helps you solve your problem.
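
As a minimal illustration of the comment above, the sketch below shows how continue abandons the rest of the current iteration and moves on to the next one. The function name keep_non_empty and the sample data are hypothetical and are not part of the question's code.

```python
def keep_non_empty(pages):
    """Return only the entries that contain text, skipping empty ones."""
    kept = []
    for page in pages:
        if not page:
            # Nothing to process here: jump straight to the next iteration.
            continue
        kept.append(page)
    return kept


print(keep_non_empty(["a", "", "b"]))  # prints ['a', 'b']
```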

Tags: python selenium loops

[Solution 1]:

You can set a max timeout on the WebDriver and then watch for the Timeout exception inside the loop:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

MAX_TIMEOUT_SECONDS = 5

driver = webdriver.Chrome("C:/Users/Acer/Desktop/chromedriver.exe")
# driver.get() raises TimeoutException if a page load exceeds this limit
driver.set_page_load_timeout(MAX_TIMEOUT_SECONDS)

# base_url is set by the outer loop over the Excel rows, as in the question
for i in range(1, 201):
    try:
        url = "{base_url}?curpage={i}".format(base_url=base_url, i=i)
        driver.get(url)
    except TimeoutException:
        # skip this page if it takes more than 5 seconds to load
        continue
    ... # process the scraped URL as usual

If a timeout occurs, the current iteration is skipped via continue.
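
For readers who want to see how the answer's try/except fits into the question's nested loops, here is one possible sketch. The function name scrape_names and its parameters are hypothetical. It assumes the driver already had set_page_load_timeout() called on it, and it passes the locator strategy as the string "xpath" to find_elements rather than using the question's find_elements_by_xpath. The ImportError fallback is only there so that the sketch can run without selenium installed.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Stand-in so the sketch runs without selenium installed (assumption).
    class TimeoutException(Exception):
        pass


def scrape_names(driver, base_urls, max_pages=200, xpath="//a/div/div/p"):
    """Collect [text, url] pairs, skipping any page whose load times out."""
    companies = []
    for base_url in base_urls:
        for i in range(1, max_pages + 1):
            url = "{base_url}?curpage={i}".format(base_url=base_url, i=i)
            try:
                driver.get(url)
            except TimeoutException:
                # The page exceeded the driver's page-load timeout: skip it.
                continue
            # "xpath" is the locator strategy string (same value as By.XPATH).
            for element in driver.find_elements("xpath", xpath):
                companies.append([element.text, url])
    return companies
```

With a real driver this could be called as scrape_names(driver, df['urls']) after driver.set_page_load_timeout(5). Keeping only driver.get() inside the try block means an unrelated error raised while processing the page is not silently swallowed.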

[Comments]: