【Question title】: Scraping from dropdown menus with Python
【Posted】: 2017-01-31 10:35:35
【Question】:

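I am new to Python and am trying to retrieve data from this site using Python 3.6.0.

There are two dropdown menus, and the options in the second depend on the selection made in the first.

The first is 'Organizasyon Adi' and the second is 'UEVCB Adi'.

In the page source, all the options look like this: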

<option value="0" selected="selected">TÜMÜ</option> #this is default value when we open the page
<option value="10374">1461 TRABZON ELEKTRİK ÜRETİM A.Ş</option>
<option value="9426">2M ELEKTRİK ÜRETİM SANAYİ VE TİCARET ANONİM ŞİRKETİ</option>

These are the options of the first dropdown; there are nearly 800 of them.
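
As a rough illustration of how such option elements map to value/label pairs, the sketch below parses the sample markup with Python's standard-library `html.parser`. This is a substitution for the BeautifulSoup used later in the post, chosen so the example has no third-party dependencies. The names `OptionCollector` and `parse_options` are hypothetical, and skipping value "0" assumes that value always marks the default TÜMÜ entry.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect (value, label) pairs from <option> elements."""

    def __init__(self):
        super().__init__()
        self.options = []    # finished (value, label) pairs
        self._value = None   # value attribute of the currently open <option>
        self._text = []      # text fragments seen inside it

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._value = dict(attrs).get("value")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <option> with a value
        if self._value is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._value is not None:
            self.options.append((self._value, "".join(self._text).strip()))
            self._value = None


def parse_options(html):
    """Return every (value, label) pair found in the given markup."""
    parser = OptionCollector()
    parser.feed(html)
    parser.close()
    return parser.options


sample = """
<option value="0" selected="selected">TÜMÜ</option>
<option value="10374">1461 TRABZON ELEKTRİK ÜRETİM A.Ş</option>
<option value="9426">2M ELEKTRİK ÜRETİM SANAYİ VE TİCARET ANONİM ŞİRKETİ</option>
"""

# Assumption: value "0" is the default "TÜMÜ" entry and is not a real organization
organisations = [(v, label) for v, label in parse_options(sample) if v != "0"]
print(organisations)
```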

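The options of the second dropdown are not visible in the page source until the second dropdown is clicked. (Both dropdowns open a search box when clicked.)

The second dropdown lists the units of the selected organization.

Once an option is selected in both dropdowns, the page generates a data table. We are trying to get the data for all units.

I could not scrape the data for all units with a single program, so I decided to scrape them one at a time.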
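
One possible way to structure a single program that covers all units is to loop over organization/unit pairs and record failures instead of stopping at the first one. This is only a sketch: `scrape_all` and the `scrape_one` callback are hypothetical names, and `scrape_one` stands in for whatever Selenium code selects a pair and reads the table.

```python
def scrape_all(pairs, scrape_one):
    """Call scrape_one(organisation, unit) for every pair.

    Returns (results, failures): results maps each pair to its scraped data,
    failures maps each failed pair to the exception it raised.
    """
    results, failures = {}, {}
    for organisation, unit in pairs:
        try:
            results[(organisation, unit)] = scrape_one(organisation, unit)
        except Exception as exc:
            # Record the problem and continue with the next pair
            failures[(organisation, unit)] = exc
    return results, failures
```

The failed pairs can then be retried separately, which keeps one slow or broken unit from discarding the data already collected.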

This is the code I am using:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.action_chains import ActionChains
import time
from bs4 import BeautifulSoup
import pandas as pd 

url = 'https://seffaflik.epias.com.tr/transparency/uretim/planlama/kgup.xhtml' #
driver = webdriver.Chrome()
driver.get(url)
time.sleep(3)
organisation = driver.find_element_by_xpath(".//*[@id='j_idt102:distributionId_label']")
organisation.click()
dropdown1 =  driver.find_element_by_xpath(".//*[@id='j_idt102:distributionId_filter']")
dropdown1.send_keys('1461')
dropdown1.send_keys(u'\ue007')
unit = driver.find_element_by_id('j_idt102:uevcb_label')
dropdown2 = driver.find_element_by_xpath(".//*[@id='j_idt102:uevcb_filter']")
dropdown2.send_keys('SAMA')
dropdown2.send_keys(u'\ue007')
apply= driver.find_element_by_xpath("//*[@id='j_idt102:goster']")
apply.click()
time.sleep(5)

soup = BeautifulSoup(driver.page_source)

table = soup.find_all('table')[0]
rows = table.find_all('tr')[1:]

data = {
    '01.Date' : [],
    '02.Hour' : [],
    '03.NaturalGas' : [],
    '04.Wind' : [],
    '05.Lignite' : [],
    '06.Hard_Coal' : [],
    '07.ImportedCoal' : [],
    '08.Geothermal' : [],
    '09.Hydro_Dam' : [],
    '10.Naphta' : [],
    '11.Biomass' : [],
    '12.River' : [],
    '13.Other' : []
}

for row in rows:
    cols = row.find_all('td')
    data['01.Date'].append( cols[0].get_text() )
    data['02.Hour'].append( cols[1].get_text() )
    data['03.NaturalGas'].append( cols[3].get_text() )
    data['04.Wind'].append( cols[4].get_text() )
    data['05.Lignite'].append( cols[5].get_text() )
    data['06.Hard_Coal'].append( cols[6].get_text() )
    data['07.ImportedCoal'].append( cols[7].get_text() )
    data['08.Geothermal'].append( cols[8].get_text() )
    data['09.Hydro_Dam'].append( cols[9].get_text() )
    data['10.Naphta'].append( cols[10].get_text() )
    data['11.Biomass'].append( cols[11].get_text() )
    data['12.River'].append( cols[12].get_text() )
    data['13.Other'].append( cols[13].get_text() )

df = pd.DataFrame( data )
writer = pd.ExcelWriter('//192.168.0.102/Data/kgup.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
time.sleep(5)
driver.close()

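With this code, we can select from the first dropdown using the search box and the Enter key.

When it gets to the second dropdown, it raises ImportError: sys.meta_path is None, Python is likely shutting down

How should I handle this?

Thanks.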

【Question comments】:

    Tags: python python-3.x selenium drop-down-menu web-scraping


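    【Solution 1】:

    Your code seems prone to StaleElementReferenceException as well as the "Element is not clickable at point..." exception. Try the code below for the web-scraping part and let me know the result: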

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.select import Select
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    from bs4 import BeautifulSoup
    import pandas as pd 
    
    url = 'https://seffaflik.epias.com.tr/transparency/uretim/planlama/kgup.xhtml' #
    driver = webdriver.Chrome()
    driver.get(url)
    wait = WebDriverWait(driver, 20)
    driver.maximize_window()
    
    wait.until_not(EC.visibility_of_element_located((By.ID, 'j_idt15')))  # wait until the modal has disappeared
    wait.until(EC.element_to_be_clickable((By.ID, 'j_idt102:distributionId_label'))).click()  # open the organization drop-down
    wait.until(EC.element_to_be_clickable((By.ID, 'j_idt102:distributionId_filter'))).send_keys('1461' + Keys.ENTER)  # filter, then press Enter to select the organization
    wait.until_not(EC.visibility_of_element_located((By.ID, 'j_idt179_modal')))  # wait until the modal has disappeared
    wait.until(EC.element_to_be_clickable((By.ID, 'j_idt102:uevcb_label'))).click()  # open the unit drop-down
    wait.until(EC.element_to_be_clickable((By.ID, 'j_idt102:uevcb_filter'))).send_keys('SAMA' + Keys.ENTER)  # filter, then press Enter to select the unit
    wait.until(EC.element_to_be_clickable((By.ID, 'j_idt102:goster'))).click()  # click Apply
    wait.until_not(EC.visibility_of_element_located((By.ID, 'j_idt15')))  # wait until the modal has disappeared
    
    soup = BeautifulSoup(driver.page_source, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
    ....
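
The `ImportError: sys.meta_path is None` message from the question means an import was attempted while the interpreter was already shutting down. One plausible cause, not confirmed in this thread, is that an earlier exception ended the script before `driver.close()` ran, so browser cleanup happened during shutdown. A minimal sketch of guarding against that follows. `run_with_driver`, `driver_factory`, and `task` are hypothetical names, and the driver is only assumed to have a `quit()` method, as Selenium drivers do.

```python
def run_with_driver(driver_factory, task):
    """Create a driver, run task(driver), and always quit the driver."""
    driver = driver_factory()
    try:
        return task(driver)
    finally:
        # Runs whether task() returned or raised, so the browser session
        # is closed explicitly instead of during interpreter shutdown
        driver.quit()
```

With Selenium this might be called as `run_with_driver(webdriver.Chrome, scrape)`, where `scrape` is a function of your own that takes the driver and performs the steps above.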
    

    【Comments】:
