【Question title】: Not able to get data using Selenium - python
【Posted】: 2020-05-14 10:40:34
【Question】:

I am trying to fetch the historical data table from this website: https://www1.nseindia.com/products/content/derivatives/currency/historical_contract_cd.htm

This is the code I am using:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

# Chrome is assumed here; the original snippet did not show how the driver was created
driver = webdriver.Chrome()

url = 'https://www1.nseindia.com/products/content/derivatives/currency/historical_contract_cd.htm'
driver.get(url)
driver.implicitly_wait(6)
inst = 'FUTCUR'
symbol = 'USDINR'
contYear = '2020'
expiry = '270520'
contract = symbol + ' ' + expiry
startDate = '12-04-2020'
endDate = '11-05-2020'

# select_by_value() returns None, so its result is not assigned to anything
Select(driver.find_element(By.ID, 'instrument')).select_by_value(inst)
Select(driver.find_element(By.ID, 'symbol')).select_by_value(symbol)
Select(driver.find_element(By.ID, 'contractYear')).select_by_value(contYear)
Select(driver.find_element(By.ID, 'contract')).select_by_value(contract)

# Select(driver.find_element(By.ID, 'contract')).select_by_value(opType)
# Select(driver.find_element(By.ID, 'contract')).select_by_value(strike)
# Fill in the date range; the trailing "\n" is meant to submit the form
driver.find_element(By.XPATH, "//input[@class='textboxdata hasDatepicker' and @id='fromDt']").send_keys(startDate)
driver.find_element(By.XPATH, "//input[@class='textboxdata hasDatepicker' and @id='toDt']").send_keys(endDate + "\n")

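In the last line I send a newline to trigger the Get Data button, but no table appears.

Can anyone offer a suggestion?

Alternatively, could you suggest a better way to get this historical data? I am new to Selenium, so there may well be a better approach.

Regards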

【Comments】:

    Tags: python-3.x selenium web-scraping


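    【Solution 1】:

    It seems the site is detecting that the browser is driven by a bot. However, if you open the Network tab in the browser's developer tools and click Get Data manually, you will see a request to the following URL. You can navigate to that URL directly.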

    https://www1.nseindia.com/live_market/dynaContent/live_watch/fx_tracker/FxTradeHistoryNew.jsp?contract=USDINR%20270520&symbol=USDINR&fromDt=12-04-2020&toDt=11-05-2020&instrument=FUTCUR&strikePrice=select&optionType=select&time=1589457333448

    You can also parameterize this URL to run any other search query.
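
As a minimal sketch of that parameterization, the query string can be assembled with the standard library rather than by hand. `build_history_url` is a hypothetical helper name, not part of any library. The parameter names are copied from the URL above. Treating `time` as an epoch-milliseconds timestamp is an assumption based on how the value looks; the answer does not say what it is for.

```python
import time
from urllib.parse import quote, urlencode

# Endpoint copied from the URL captured in the Network tab.
BASE_URL = (
    "https://www1.nseindia.com/live_market/dynaContent/live_watch/"
    "fx_tracker/FxTradeHistoryNew.jsp"
)


def build_history_url(symbol, expiry, start_date, end_date, instrument,
                      timestamp_ms=None):
    """Build the query URL for one contract and date range (hypothetical helper)."""
    if timestamp_ms is None:
        # Assumption: `time` is an epoch-milliseconds value.
        timestamp_ms = int(time.time() * 1000)
    params = {
        "contract": f"{symbol} {expiry}",
        "symbol": symbol,
        "fromDt": start_date,
        "toDt": end_date,
        "instrument": instrument,
        "strikePrice": "select",
        "optionType": "select",
        "time": timestamp_ms,
    }
    # quote_via=quote encodes the space as %20; the default would emit "+".
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = build_history_url("USDINR", "270520", "12-04-2020", "11-05-2020",
                        "FUTCUR", timestamp_ms=1589457333448)
print(url)
```

With the fixed `timestamp_ms` shown, this reproduces the captured URL exactly, which makes it easy to check the helper before changing the symbol or date range.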

    Code

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    from selenium import webdriver
    
    inst = 'FUTCUR'
    symbol = 'USDINR'
    expiry = '270520'
    startDate = '12-04-2020'
    endDate = '11-05-2020'
    
    url="https://www1.nseindia.com/live_market/dynaContent/live_watch/fx_tracker/FxTradeHistoryNew.jsp?contract={0}%20{1}&symbol={0}&fromDt={2}&toDt={3}&instrument={4}&strikePrice=select&optionType=select&time=1589457333448".format(symbol,expiry,startDate,endDate,inst)
    
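    # Open the results URL directly instead of driving the search form
    driver=webdriver.Chrome()
    driver.get(url)
    # Wait up to 10 seconds for the table rows to become visible
    rows=WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,"table>tbody tr")))
    # rows[2:] skips the first two rows, presumably header rows
    for row in rows[2:]:
        # find_elements(By.XPATH, ...) replaces find_elements_by_xpath, removed in recent Selenium 4 releases
        print([td.text for td in row.find_elements(By.XPATH, ".//td")])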
    

    Output: the table cell data, one list per row.

    ['11-May-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '75.7150', '75.9575', '75.6100', '75.9350', '75.9375', '22,73,052', '1442585', '10,93,914.24']
    ['08-May-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '75.7500', '75.8525', '75.3000', '75.7350', '75.7350', '23,32,272', '2082334', '15,75,416.91']
    ['06-May-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '75.9000', '76.1000', '75.7825', '76.0325', '76.0325', '22,55,414', '1610773', '12,23,210.60']
    ['05-May-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '75.7500', '76.0700', '75.6525', '75.9425', '75.9425', '23,31,177', '1640933', '12,44,886.32']
    ['04-May-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '75.7525', '76.0200', '75.7525', '75.8475', '75.8475', '22,96,509', '1676336', '12,72,472.35']
    ['30-Apr-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '75.5500', '75.5500', '75.0825', '75.2750', '75.2750', '23,18,407', '2543021', '19,14,672.14']
    ['29-Apr-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '76.1575', '76.3000', '75.8075', '75.9250', '75.9225', '25,65,081', '1882266', '14,30,376.21']
    ['28-Apr-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '76.4975', '76.7400', '76.0650', '76.1200', '76.1200', '26,39,745', '2084993', '15,94,244.06']
    ['27-Apr-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '76.6000', '76.7125', '76.4050', '76.5375', '76.5375', '25,90,069', '1356043', '10,37,735.82']
    ['24-Apr-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '76.6000', '76.8775', '76.4500', '76.6800', '76.6800', '21,89,351', '800447', '6,14,078.45']
    ['23-Apr-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '76.9500', '76.9500', '76.3125', '76.5125', '76.5125', '19,56,117', '698749', '5,34,773.33']
    ['22-Apr-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '77.4725', '77.4725', '76.9050', '76.9550', '76.9550', '19,09,133', '448019', '3,45,608.92']
    ['21-Apr-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '77.0975', '77.4950', '77.0100', '77.4625', '77.4600', '18,23,316', '517761', '4,00,170.48']
    ['20-Apr-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '76.7425', '77.1300', '76.7400', '77.0325', '77.0325', '16,47,582', '410349', '3,15,927.53']
    ['17-Apr-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '77.1375', '77.1950', '76.7400', '76.8950', '76.8950', '15,40,884', '395013', '3,03,905.97']
    ['16-Apr-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '77.2000', '77.5000', '76.9150', '77.4125', '77.4125', '14,75,982', '402968', '3,11,477.92']
    ['15-Apr-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '76.6525', '77.1225', '76.4975', '77.0950', '77.0950', '13,22,890', '425692', '3,26,801.09']
    ['13-Apr-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '77.0925', '77.2000', '76.7000', '76.8025', '76.8050', '11,73,618', '323331', '2,48,706.21']
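
Every cell in the printed rows is a string, including the comma-grouped figures such as '22,73,052'. The following is a rough sketch of converting the cells to typed values using only the standard library. `parse_cell` is a hypothetical helper, and the two sample rows are copied from the output above. The conversion rules are assumptions drawn from how these rows look: '-' means no value, dates follow the `DD-Mon-YYYY` pattern, and commas are only digit separators.

```python
from datetime import datetime

# Two rows copied from the printed output.
rows = [
    ['11-May-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '75.7150', '75.9575',
     '75.6100', '75.9350', '75.9375', '22,73,052', '1442585', '10,93,914.24'],
    ['08-May-2020', 'FUTCUR', 'USDINR 270520', '-', '-', '75.7500', '75.8525',
     '75.3000', '75.7350', '75.7350', '23,32,272', '2082334', '15,75,416.91'],
]


def parse_cell(text):
    """Convert one cell to a date, int, float, or None; keep other text as is."""
    if text == '-':
        return None
    try:
        return datetime.strptime(text, '%d-%b-%Y').date()
    except ValueError:
        pass
    # Drop the digit-group commas before trying numeric conversion.
    number = text.replace(',', '')
    for cast in (int, float):
        try:
            return cast(number)
        except ValueError:
            continue
    return text


parsed = [[parse_cell(cell) for cell in row] for row in rows]
print(parsed[0])
```

A list like `parsed` can be passed to `pd.DataFrame(parsed, columns=[...])` once the column names are known. Take the names from the page's header rows rather than guessing them.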
    

    To load the data into pandas, you can grab the page HTML first and then use pd.read_html() to load it into a DataFrame.

    from io import StringIO

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    from selenium import webdriver
    import pandas as pd

    inst = 'FUTCUR'
    symbol = 'USDINR'
    expiry = '270520'
    startDate = '12-04-2020'
    endDate = '11-05-2020'
    
    url="https://www1.nseindia.com/live_market/dynaContent/live_watch/fx_tracker/FxTradeHistoryNew.jsp?contract={0}%20{1}&symbol={0}&fromDt={2}&toDt={3}&instrument={4}&strikePrice=select&optionType=select&time=1589457333448".format(symbol,expiry,startDate,endDate,inst)
    
    driver=webdriver.Chrome()
    driver.get(url)
    # Wait until the table that follows #csvContentDiv is visible
    WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"#csvContentDiv+table")))
    htmldata=driver.page_source
    # Recent pandas versions expect a file-like object rather than a raw HTML string
    df=pd.read_html(StringIO(htmldata))[0]
    print(df)
    # Export to CSV
    df.to_csv("testdata.csv",index=False)
    

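    【Comments】:

    • Thank you very much, but I need the data as a DataFrame, with all the column names and so on. I am working on it; please help if you have any suggestions.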