Nested for-while loop stops iterating after the first run

【Question title】: Nested for-while loop stops iteration after first run
【Posted】: 2019-11-16 22:25:53
【Question description】:

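Platform:
Python version: 3.7.3
Selenium version: 3.141.0
OS: Win7

Problem:
I have a list of URLs in a text file, one URL per line. The URLs are download links. I want to iterate over all of them and download the file behind each URL into a specific folder.

The code I tried is a nested for-while loop. The first iteration runs without any problem, but the second iteration gets stuck in one of the while loops.

There is obviously a better way to do what I am trying to do. I am just a Python beginner, learning the language as best I can.

My URL list: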

https://mega.nz/#!bOgBWKiB!AWs3JSksW0mpZ8Eob0-Qpr5ZAG0N1zhoFBFVstNJfXs
https://mega.nz/#!qPxGAAYJ!BX-hv7jgE4qvBs_uhHPVpsLRm1Yl4JkZ17nI1-U6hvk
https://mega.nz/#!GPoiHaaT!TAKT4sOhIiMUSFFSmlvPOidMcscXzHH_8HgK27LyTRM

The code I tried:

import os
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
from pathlib import Path
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
binary = FirefoxBinary('C:\\Program Files\\Mozilla Firefox\\firefox.exe')
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", "H:\\downloads")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/zip")
driver = webdriver.Firefox(firefox_binary=binary, firefox_profile=fp, executable_path=r'C:\\Program Files\\Python\\Python37\\Lib\\site-packages\\selenium\\webdriver\\firefox\\geckodriver.exe')
driver.set_window_size(1600, 1050)
with open("H:\\downloads\\my_url_list.txt", "r") as f:
    for url in f:
        driver.get(url.strip())
        sleep(5)
        while True:
            # checks whether the element is available on the page; used 'while' instead of 'wait' as I couldn't figure out the wait time.
            try:
                content = driver.find_element_by_css_selector('div.buttons-block:nth-child(1) > div:nth-child(2)')
                break
            except NoSuchElementException:
                continue
        # used 'execute_script' instead of 'click()' due to "scroll into view error"
        driver.execute_script("arguments[0].click();", content)
        sleep(5)
        while True:
            # checks whether 'filename' element is available on the page, the page shows multiple elements depending on interaction.
            if driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[4]/div[1]/div/span[1]"):
                filename = driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[4]/div[1]/div/span[1]").text
                break
            elif driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[5]/div/div/div[1]/div[1]/div[2]/div[3]/div[1]/span[1]"):
                filename = driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[5]/div/div/div[1]/div[1]/div[2]/div[3]/div[1]/span[1]").text
                break
            else:
                sleep(5)
        print(filename)
        dirname = 'H:\\downloads'
        suffix = '.zip'
        file_path = Path(dirname, filename).with_suffix(suffix)
        while True:
            # checks whether the file has downloaded into the folder.
            if os.path.isfile(file_path):
                break

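What happens:

The first iteration succeeds: the file behind the URL is downloaded into the H:\\downloads folder and filename is printed.

In the second iteration the file is downloaded into the folder, but the filename is not printed, and the second while loop runs forever.

Nothing is iterated after the second run, because filename cannot be retrieved in the second iteration and the loop never exits.

The second while loop from the code above: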

while True:  
            # checks whether 'filename' element is available on the page, the page shows multiple elements depending on interaction.  
            if driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[4]/div[1]/div/span[1]"):  
                filename = driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[4]/div[1]/div/span[1]").text  
                break  
            elif driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[5]/div/div/div[1]/div[1]/div[2]/div[3]/div[1]/span[1]"):  
                filename = driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[5]/div/div/div[1]/div[1]/div[2]/div[3]/div[1]/span[1]").text  
                break  
            else:  
                sleep(5)

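Additional images of the two filename XPath options (the reason two different XPaths are used for the filename):

[Image: while loop, first option]

[Image: while loop, second option]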
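
Separately from the filename loop, the last `while True` loop in the script above spins at full speed and never gives up if the download fails. The following is a minimal sketch, not part of the original post, of how such a check could be bounded by a deadline. The helper name `wait_for_file`, its parameters, and the assumption that an unfinished Firefox download leaves a sibling file ending in `.part` are all illustrative assumptions.

```python
import time
from pathlib import Path


def wait_for_file(path, timeout=60.0, poll=0.5):
    """Return True once `path` exists and no '.part' sibling remains.

    Returns False if that has not happened within `timeout` seconds.
    """
    path = Path(path)
    # Assumption: the browser keeps '<name>.part' next to an unfinished download.
    partial = Path(str(path) + ".part")
    deadline = time.monotonic() + timeout
    while True:
        if path.is_file() and not partial.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)  # sleep between checks instead of busy-waiting
```

In the question's script this could replace the final loop, for example by calling `wait_for_file(file_path, timeout=300)` and printing a message or skipping to the next URL when it returns False.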

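【Comments】:

Why not consider this approach: driver.until(ExpectedConditions.or(ExpectedConditions.presenceOfElementLocated(By.xpath("/html/body/div[6]/div[3]/div/div[1]/div[4]/div[1]/div/span[1]")),ExpectedConditions.presenceOfElementLocated(By.xpath("/html/body/div[6]/div[3]/div/div[1]/div[5]/div/div/div[1]/div[1]/div[2]/div[3]/div[1]/span[1]"))));? Give it a try; if it works, I'll post it as an answer.
@Saggio Vecchino, first of all, my apologies: I am using Selenium with Python, and the code you provided probably applies to Java. I did try using a "wait" in my code; the problem is that the wait time usually ends up shorter or longer than needed. If it is too short, I get a NoSuchElementException. I used a "while" loop so that the element is located, and the loop exits, the moment it appears on the page. It works on the first run but breaks on the second.
OK, sorry, I was only giving you the idea and pasted a sample I had at hand so you could do something similar. Sorry again; I have tried to write a complete answer in Python and hope it helps.

Tags: python python-3.x loops selenium nested-loops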

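【Solution 1】:

What you are looking for is an explicit wait; I suggest reading the corresponding page of the Selenium-python documentation. Quoting from that page:

An explicit wait is a code you define to wait for a certain condition to occur before proceeding further in the code. The extreme case of this is time.sleep(), which sets the condition to an exact time period to wait. There are some convenience methods provided that help you write code that will wait only as long as required. WebDriverWait in combination with ExpectedCondition is one way this can be accomplished.

If you want to know more about ExpectedCondition, it is covered in the same documentation.

For your case I suggest writing the condition as a lambda function, since you are waiting for at least one of several elements.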

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
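# Candidate locations of the filename element (absolute XPaths from the question).
xpath1 = "/html/body/div[6]/div[3]/div/div[1]/div[4]/div[1]/div/span[1]"
xpath2 = "/html/body/div[6]/div[3]/div/div[1]/div[5]/div/div/div[1]/div[1]/div[2]/div[3]/div[1]/span[1]"
time_limit = 15  # seconds; you really need to set a timeout
# find_elements returns a list, so an empty (falsy) result keeps the wait polling.
elements = WebDriverWait(driver, time_limit).until(
    lambda d: d.find_elements(By.XPATH, xpath1) or d.find_elements(By.XPATH, xpath2)
)
filename = elements[0].text  # text of the first element that was found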

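This waits up to 15 seconds before throwing a TimeoutException, unless it first finds one of the elements you are waiting for via its XPath. By default, WebDriverWait calls the ExpectedCondition every 500 milliseconds until it returns successfully, so you do not need to handle the logic and the loop yourself, as you tried to do.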
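
To make the polling behaviour concrete, here is a small sketch that imitates it without a browser. It is not Selenium's implementation: `FakeDriver` and `poll_until` are hypothetical names invented for illustration, and the strings "xpath1" and "xpath2" stand in for the real locators. It shows why a lambda built on `find_elements` works as a condition: an empty list is falsy, so polling continues until one of the two lookups returns something.

```python
import time


class FakeDriver:
    """Hypothetical stand-in for a WebDriver.

    "xpath2" only matches once at least three lookups have been made.
    """

    def __init__(self):
        self.calls = 0

    def find_elements(self, by, value):
        self.calls += 1
        return ["<span>"] if value == "xpath2" and self.calls >= 3 else []


def poll_until(driver, condition, timeout, poll=0.5):
    """Call condition(driver) every `poll` seconds until it returns a truthy value."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:  # an empty list is falsy, so polling continues
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


driver = FakeDriver()
found = poll_until(
    driver,
    lambda d: d.find_elements("xpath", "xpath1") or d.find_elements("xpath", "xpath2"),
    timeout=2,
    poll=0.01,
)
print(found, driver.calls)
```

The real WebDriverWait additionally ignores NoSuchElementException raised inside the condition by default, which this sketch does not model.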

To handle the TimeoutException you can, for example, refresh the page.
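
One possible shape for that refresh-and-retry idea is sketched below. The helper `wait_with_refresh` and its `attempts` parameter are assumptions made for illustration, not part of Selenium; `driver.refresh()` is the standard WebDriver call for reloading the page. The import fallback exists only so the sketch can be run without Selenium installed.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:  # fallback so the sketch also runs without Selenium installed
    class TimeoutException(Exception):
        pass


def wait_with_refresh(driver, wait_once, attempts=3):
    """Run wait_once(driver); on TimeoutException, refresh the page and retry."""
    for attempt in range(1, attempts + 1):
        try:
            return wait_once(driver)
        except TimeoutException:
            if attempt == attempts:
                raise  # give up after the last attempt
            driver.refresh()  # reload the page before trying again
```

Here `wait_once` would be a function that wraps the WebDriverWait call from the answer and returns the elements it found, so a timeout on one attempt leads to a reload rather than an endless loop.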
