【Question title】: Using python selenium to click and download files
【Posted】: 2020-10-02 05:47:03
【Question】:

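I am currently trying to automate clicking the different download links on this website: https://www.theice.com/clear-us/risk-management#margin-rates

On this page I first have to click the "Download ICE Risk Model Array Files" heading, which reveals two dropdowns. I then want to click the "Final" links, which download a csv file for each available month and year.

Changing both dropdowns currently works. Because the dropdowns are hidden, I first made them visible, which succeeded, and then changed the year using selenium click.

The problem is that I cannot click the "Final" links in the csv section. Instead, the script clicks the "Intercontinental Exchange" button in the footer and navigates to a new page.

Is there a way to accomplish this task?

Is it also possible to change the download location to the directory where the .py script is located?

This is the current python code; I have removed the headless part for now to see what is happening: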

from bs4 import BeautifulSoup
import requests
from selenium.webdriver.chrome.options import Options
from shutil import which
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time
import os

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--log-level=3")
chrome_path = which("chromedriver")
driver = webdriver.Chrome(executable_path=chrome_path, options=chrome_options)
driver.set_window_size(1366, 768)

driver.get("https://www.theice.com/clear-us/risk-management#margin-rates")

main_button = driver.find_element_by_xpath('//h4[@class="collapsible-section-header"]')
main_button.click()
time.sleep(5)

driver.execute_script("document.getElementById('icus-ice-form-year').style.display = 'block';")
driver.execute_script("document.getElementById('icus-ice-form-month').style.display = 'block';")
time.sleep(1)

dropdown_1 = Select(driver.find_element_by_xpath('//select[@id="icus-ice-form-year"]'))
dropdown_2 = Select(driver.find_element_by_xpath('//select[@id="icus-ice-form-year"]'))
main_table_div = driver.find_element_by_xpath('//div[@id="icus-ice-riskarraytable"]')

main_table = main_table_div.find_element_by_xpath('//table[@class="table table-data"]')

for opt in dropdown_1.options:
    opt.click()
    for opt2 in dropdown_2.options:
        opt2.click()
        time.sleep(3)
        download_links_1 = main_table.find_elements_by_xpath('//td[@class="table-partitioned"]')
        for dow in download_links_1:
            try:
                temp_dow = dow.find_element_by_xpath('//a')
                temp_dow.click()
                time.sleep(4)
            except:
                pass
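
One likely reason the loop above ends up clicking a link outside the table: in XPath, an expression that begins with `//` is evaluated from the document root even when it is called on an element, so `dow.find_element_by_xpath('//a')` returns the first `<a>` on the whole page rather than the one inside the cell. Prefixing the expression with `.` scopes it to the element. The sketch below illustrates the idea; `scoped` and `links_in_cells` are hypothetical helper names, and the locator strategy is passed as the plain string "xpath" (the value of Selenium's `By.XPATH`).

```python
def scoped(xpath):
    """Return xpath rewritten so it is evaluated relative to the context element."""
    # "//a" and "/a" are absolute paths; ".//a" and "./a" start at the context node.
    return "." + xpath if xpath.startswith("/") else xpath


def links_in_cells(cells, xpath="//a"):
    """Collect the links found inside each cell (hypothetical helper).

    `cells` is any iterable of objects exposing Selenium's
    find_elements(by, value) method, for example the <td> WebElements.
    """
    found = []
    for cell in cells:
        # The scoped expression only matches descendants of this cell.
        found.extend(cell.find_elements("xpath", scoped(xpath)))
    return found
```

With the cells from the question this could be called as `links_in_cells(download_links_1)`. The same scoping applies to `main_table.find_elements_by_xpath('//td[@class="table-partitioned"]')`, which as written also searches the whole document.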
            



【Comments on question】:

    Tags: python selenium web-scraping


    【Solution 1】:

    This should switch downloads to the current working directory and fetch all the Early csv files. Change '//*[@id="icus-ice-riskarraytable"]/table/tbody/tr[{0}]/td[2]/a' to '//*[@id="icus-ice-riskarraytable"]/table/tbody/tr[{0}]/td[3]/a' for the Final csv files.

    options = Options()
    currentDirectory = os.getcwd()
    prefs = {
        # Save downloads into the current working directory without prompting.
        "download.default_directory": currentDirectory,
        "download.prompt_for_download": False
    }
    options.add_experimental_option("prefs", prefs)
    # The prefs only take effect if the driver is created with these options.
    driver = webdriver.Chrome(options=options)
    # implicitly_wait is a session-wide setting, so one call is enough.
    driver.implicitly_wait(5)
    driver.get("https://www.theice.com/clear-us/risk-management#margin-rates")
    main_button = driver.find_element_by_xpath('//h4[@class="collapsible-section-header"]')
    main_button.click()
    # Un-hide the two <select> elements so Selenium can interact with them.
    driver.execute_script("document.getElementById('icus-ice-form-year').style.display = 'block';")
    driver.execute_script("document.getElementById('icus-ice-form-month').style.display = 'block';")
    drop1length = len(driver.find_elements_by_xpath('//select[@id="icus-ice-form-year"]/option'))
    # Option indices are 0-based; start at 1 to skip the first option.
    for i in range(1, drop1length):
        # Re-locate the <select> on every pass in case the page re-rendered it.
        drop1 = Select(driver.find_element_by_xpath('//select[@id="icus-ice-form-year"]'))
        drop1.select_by_index(i)
        drop2length = len(driver.find_elements_by_xpath('//select[@id="icus-ice-form-month"]/option'))
        for j in range(1, drop2length):
            drop2 = Select(driver.find_element_by_xpath('//select[@id="icus-ice-form-month"]'))
            drop2.select_by_index(j)
            download_links_length = len(driver.find_elements_by_xpath('//*[@id="icus-ice-riskarraytable"]/table/tbody/tr/td[2]/a'))
            # XPath positions are 1-based, so rows run from 1 to download_links_length.
            for dow in range(1, download_links_length + 1):
                try:
                    element = driver.find_element_by_xpath('//*[@id="icus-ice-riskarraytable"]/table/tbody/tr[{0}]/td[2]/a'.format(dow))
                    # Click through JavaScript so an overlapping element cannot intercept the click.
                    driver.execute_script("arguments[0].click();", element)
                    driver.switch_to.window(driver.window_handles[0])
                except Exception as e:
                    print(e)
    

    Import these:

    import os
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import Select
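
Note that `os.getcwd()` is the directory the script was launched from, which is not necessarily where the .py file lives. If the script's own directory is wanted, as the question asks, the path can be derived from the file location instead. A minimal sketch, where `build_download_prefs` and `make_driver` are hypothetical helpers and the two preference keys are the ones used in the answer above:

```python
import os


def build_download_prefs(script_path):
    """Build Chrome prefs that save downloads next to the given script file."""
    # abspath makes the result independent of the current working directory.
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return {
        "download.default_directory": script_dir,
        "download.prompt_for_download": False,
    }


def make_driver(script_path):
    """Create a Chrome driver whose downloads land in the script's directory."""
    # Imported here so build_download_prefs can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_experimental_option("prefs", build_download_prefs(script_path))
    return webdriver.Chrome(options=options)
```

Inside the script this would be called as `driver = make_driver(__file__)`, assuming chromedriver can be found on the PATH.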
    

    【Comments】:

    • I ran the code. It was able to reveal the hidden dropdowns, then the date changed to December 2019, and then this error appeared: Traceback (most recent call last): File "ice_clear_us2.py", line 43, in <module> drop2.select_by_index(j) File "C:\Users\faisa\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\support\select.py", line 103, in select_by_index raise NoSuchElementException("Could not locate element with index %d" % index) selenium.common.exceptions.NoSuchElementException: Message: Could not locate element with index 1
    • Oh, I had some waits in there before I posted it.
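
The `NoSuchElementException` in the traceback above is consistent with the month dropdown not yet holding the expected options when `select_by_index` runs. One way to guard against that is an explicit wait until the dropdown has at least a minimum number of options. A minimal sketch under that assumption; `option_count` and `wait_for_options` are hypothetical helpers, and the 10 second timeout is an arbitrary choice:

```python
def option_count(driver, select_id):
    """Return how many <option> elements the <select> with this id currently has."""
    xpath = '//select[@id="{0}"]/option'.format(select_id)
    # "xpath" is the string value of Selenium's By.XPATH.
    return len(driver.find_elements("xpath", xpath))


def wait_for_options(driver, select_id, minimum=2, timeout=10):
    """Block until the <select> has at least `minimum` options, or time out."""
    # Imported lazily so option_count can be exercised without Selenium installed.
    from selenium.webdriver.support.ui import WebDriverWait

    WebDriverWait(driver, timeout).until(
        lambda d: option_count(d, select_id) >= minimum
    )
```

Calling `wait_for_options(driver, "icus-ice-form-month")` right after `drop1.select_by_index(i)` would pause until the month list has been repopulated; `until` raises `TimeoutException` if that never happens.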