Python Selenium - Adjust pause_time to scroll down in infinite page

【Question title】: Python Selenium - Adjust pause_time to scroll down in infinite page
【Posted】: 2018-09-23 12:19:49
【Question】:

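I am trying to scrape all available links on an infinite-scroll page by scrolling down and collecting the new links as they appear. However, a fixed time.sleep() does not pause the driver for an appropriate amount of time before each successive scroll.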

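Is there a way to adjust the code at the bottom so that it sleeps less during the first iterations (when the page still loads new content quickly) and waits as long as necessary in later iterations (when the page loads new content slowly)?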

Simply using

for i in range(1,20):
    time.sleep(i)

does not save any time in the first iterations, nor does it adapt time.sleep() effectively after many iterations.

Here is the code I am using, based on the suggestions in "How can I scroll a web page using selenium webdriver in python?":

import time

from selenium import webdriver

scroll_pause_time = 5
scraped_links = []

driver = webdriver.Chrome(executable_path=driver_path)
driver.get(url)
links = driver.find_elements_by_xpath(links_filepath)
for link in links:
    if link not in scraped_links:
        scraped_links.append(link)
        print(link)
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(scroll_pause_time)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
    links = driver.find_elements_by_xpath(links_filepath)
    for link in links:
        if link not in scraped_links:
            scraped_links.append(link)
            print(link)

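The code breaks after 20-30 iterations because the time.sleep() pause is too short relative to how fast the page loads new content, so the height check sees no change and the loop exits early.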

【Comments on the question】:

Tags: python selenium scroll sleep sleep-mode

【Answer 1】:

If you don't want to guess how long the page takes to load each time and sleep for some arbitrary number of seconds, you can use Explicit Waits. Example:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    # Wait up to 10 seconds for the element to be present in the DOM
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
except TimeoutException:
    print('TimeoutException')
finally:
    driver.quit()

# do what you want after necessary elements are loaded

This addresses the problem of the time.sleep() pause being too short relative to how fast the page loads new content.
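The same idea can be applied to the scroll loop from the question: instead of sleeping for a fixed time, poll the page height and continue as soon as it grows. Below is a minimal sketch under some assumptions. It uses only the standard library and the driver's execute_script method (the one already used in the question), and the helper names wait_for_taller_page and scroll_to_end, the on_new_content callback, and the default timeout and poll values are all hypothetical. It is a hand-rolled polling loop, not Selenium's own wait class; WebDriverWait(driver, timeout).until(...) with a callable that compares heights should express the same condition.

```python
import time


def wait_for_taller_page(driver, last_height, timeout=30.0, poll=0.5):
    """Poll until document.body.scrollHeight exceeds last_height.

    Returns the new height, or None if the page did not grow
    within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        height = driver.execute_script("return document.body.scrollHeight")
        if height > last_height:
            return height  # new content arrived, no need to wait longer
        if time.monotonic() >= deadline:
            return None  # nothing new: assume the end of the page
        time.sleep(poll)


def scroll_to_end(driver, timeout=30.0, poll=0.5, on_new_content=None):
    """Scroll down until the page stops growing.

    Calls on_new_content(driver) after each successful load and
    returns the number of scrolls that produced new content.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    loads = 0
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        new_height = wait_for_taller_page(driver, last_height, timeout, poll)
        if new_height is None:
            break
        last_height = new_height
        loads += 1
        if on_new_content is not None:
            on_new_content(driver)
    return loads
```

With this shape, fast early iterations cost only about one poll interval each, while slow later iterations can take up to the full timeout. The link-collecting code from the question would go in the on_new_content callback. The timeout should be set above the slowest load you expect, because a timeout is the only signal used here to decide that the page has ended.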

【Comments】: