Scraping from dropdown menus with Python

【Question title】: Scraping from dropdown menus with Python
【Posted】: 2017-01-31 10:35:35
【Question】:

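I'm new to Python and am trying to retrieve data from this site using Python 3.6.0.

There are two dropdown menus, and the options in the second depend on the selection made in the first.

First: 'Organizasyon Adi'. Second: 'UEVCB Adi'.

In the page source, all the options look like this: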

<option value="0" selected="selected">TÜMÜ</option> #this is default value when we open the page
<option value="10374">1461 TRABZON ELEKTRİK ÜRETİM A.Ş</option>
<option value="9426">2M ELEKTRİK ÜRETİM SANAYİ VE TİCARET ANONİM ŞİRKETİ</option>

These are options of the first dropdown, which has nearly 800 of them.
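Since each option carries both a numeric value and a label, the list can be turned into a lookup table once the markup is available. The following is a minimal sketch that uses only the standard library and the three sample lines from the question. The names `OptionCollector` and `parse_options` are made up for illustration, and treating value "0" as an entry to skip is an assumption based on the comment that it is the default.

```python
from html.parser import HTMLParser

# Sample markup copied from the question; the real page has close to 800 options.
SAMPLE = """
<option value="0" selected="selected">TÜMÜ</option>
<option value="10374">1461 TRABZON ELEKTRİK ÜRETİM A.Ş</option>
<option value="9426">2M ELEKTRİK ÜRETİM SANAYİ VE TİCARET ANONİM ŞİRKETİ</option>
"""


class OptionCollector(HTMLParser):
    """Collect {value: label} for every <option> element that has a value."""

    def __init__(self):
        super().__init__()
        self.options = {}
        self._current = None  # value attribute of the <option> being read

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            value = dict(attrs).get("value")
            if value is not None:
                self._current = value
                self.options[value] = ""

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._current is not None:
            self.options[self._current] += data

    def handle_endtag(self, tag):
        if tag == "option" and self._current is not None:
            self.options[self._current] = self.options[self._current].strip()
            self._current = None


def parse_options(html, skip_default=True):
    """Return option value -> label; optionally drop the default entry "0"."""
    parser = OptionCollector()
    parser.feed(html)
    parser.close()
    if skip_default:
        parser.options.pop("0", None)  # assumption: "0" is only the default
    return parser.options


organisations = parse_options(SAMPLE)
for value, label in organisations.items():
    print(value, label)
```

With a live browser session, the same function could be fed `driver.page_source`, although that would pick up the `<option>` elements of every select on the page, not only the first dropdown.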

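Without inspecting the page, the options of the second dropdown are not visible until that dropdown is clicked. (Both dropdowns open a search box when clicked.)

The second dropdown lists the units of the selected organization.

Once an option is selected in both dropdowns, the page generates a data table, and we are trying to get that data for all units.

I couldn't scrape the data for all units with one program, so I decided to scrape them one at a time.

Using this code: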

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.action_chains import ActionChains
import time
from bs4 import BeautifulSoup
import pandas as pd 

url = 'https://seffaflik.epias.com.tr/transparency/uretim/planlama/kgup.xhtml' #
driver = webdriver.Chrome()
driver.get(url)
time.sleep(3)
organisation = driver.find_element_by_xpath(".//*[@id='j_idt102:distributionId_label']")
organisation.click()
dropdown1 =  driver.find_element_by_xpath(".//*[@id='j_idt102:distributionId_filter']")
dropdown1.send_keys('1461')
dropdown1.send_keys(u'\ue007')
unit = driver.find_element_by_id('j_idt102:uevcb_label')
dropdown2 = driver.find_element_by_xpath(".//*[@id='j_idt102:uevcb_filter']")
dropdown2.send_keys('SAMA')
dropdown2.send_keys(u'\ue007')
apply= driver.find_element_by_xpath("//*[@id='j_idt102:goster']")
apply.click()
time.sleep(5)

soup = BeautifulSoup(driver.page_source)

table = soup.find_all('table')[0]
rows = table.find_all('tr')[1:]

data = {
    '01.Date' : [],
    '02.Hour' : [],
    '03.NaturalGas' : [],
    '04.Wind' : [],
    '05.Lignite' : [],
    '06.Hard_Coal' : [],
    '07.ImportedCoal' : [],
    '08.Geothermal' : [],
    '09.Hydro_Dam' : [],
    '10.Naphta' : [],
    '11.Biomass' : [],
    '12.River' : [],
    '13.Other' : []
}

for row in rows:
    cols = row.find_all('td')
    data['01.Date'].append( cols[0].get_text() )
    data['02.Hour'].append( cols[1].get_text() )
    data['03.NaturalGas'].append( cols[3].get_text() )
    data['04.Wind'].append( cols[4].get_text() )
    data['05.Lignite'].append( cols[5].get_text() )
    data['06.Hard_Coal'].append( cols[6].get_text() )
    data['07.ImportedCoal'].append( cols[7].get_text() )
    data['08.Geothermal'].append( cols[8].get_text() )
    data['09.Hydro_Dam'].append( cols[9].get_text() )
    data['10.Naphta'].append( cols[10].get_text() )
    data['11.Biomass'].append( cols[11].get_text() )
    data['12.River'].append( cols[12].get_text() )
    data['13.Other'].append( cols[13].get_text() )

df = pd.DataFrame( data )
writer = pd.ExcelWriter('//192.168.0.102/Data/kgup.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
time.sleep(5)
driver.close()
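The row loop in the script reads thirteen cells by position and skips index 2. As a sketch only, the same mapping can be written as a single table of label to cell index, which also makes it easy to skip rows that have fewer cells than expected (reading `cols[13]` from a shorter row would raise `IndexError`). The function name `rows_to_records` and the placeholder cell values are made up for illustration.

```python
# Column labels from the script, mapped to the <td> index each one is read from.
# Index 2 is left out, exactly as in the original loop.
COLUMNS = {
    '01.Date': 0,
    '02.Hour': 1,
    '03.NaturalGas': 3,
    '04.Wind': 4,
    '05.Lignite': 5,
    '06.Hard_Coal': 6,
    '07.ImportedCoal': 7,
    '08.Geothermal': 8,
    '09.Hydro_Dam': 9,
    '10.Naphta': 10,
    '11.Biomass': 11,
    '12.River': 12,
    '13.Other': 13,
}


def rows_to_records(rows):
    """Turn rows of cell text into one dict per row, skipping short rows."""
    needed = max(COLUMNS.values()) + 1  # 14 cells are required per row
    records = []
    for cells in rows:
        if len(cells) < needed:
            continue  # e.g. a header, footer, or "no data" row
        records.append({name: cells[i].strip() for name, i in COLUMNS.items()})
    return records


# Placeholder input, not real data: one full row and one row that is too short.
sample_rows = [[f"cell{i}" for i in range(14)], ["short row"]]
records = rows_to_records(sample_rows)
print(records[0]['01.Date'], records[0]['13.Other'])
```

In the script, the input could be built as `[[td.get_text() for td in tr.find_all('td')] for tr in rows]`, and the resulting list of dicts can be passed straight to `pd.DataFrame(records)`.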

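This script can select from the first dropdown using the search box and the Enter key.

When it gets to the second dropdown, it raises ImportError: sys.meta_path is None, Python is likely shutting down

How should I handle this?

Thanks.

【Comments】:

Tags: python python-3.x selenium drop-down-menu web-scraping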

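【Answer 1】:

Your code seems prone to StaleElementReferenceException as well as to the exception Element is not clickable at point.... Try the code below for the web-scraping part and let me know the result: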

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
import pandas as pd 

url = 'https://seffaflik.epias.com.tr/transparency/uretim/planlama/kgup.xhtml' #
driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 20)
driver.maximize_window()

wait.until_not(EC.visibility_of_element_located((By.ID,'j_idt15'))) # wait until modal disappeared
wait.until(EC.element_to_be_clickable((By.ID,'j_idt102:distributionId_label'))).click() # organization drop-down
wait.until(EC.element_to_be_clickable((By.ID, 'j_idt102:distributionId_filter'))).send_keys('1461' + u'\ue007') # select required
wait.until_not(EC.visibility_of_element_located((By.ID,'j_idt179_modal'))) # wait until modal disappeared
wait.until(EC.element_to_be_clickable((By.ID,'j_idt102:uevcb_label'))).click() # unit drop-down
wait.until(EC.element_to_be_clickable((By.ID, 'j_idt102:uevcb_filter'))).send_keys('SAMA' + u'\ue007') # select unit
wait.until(EC.element_to_be_clickable((By.ID,'j_idt102:goster'))).click() # click Apply
wait.until_not(EC.visibility_of_element_located((By.ID,'j_idt15'))) # wait until modal disappeared

soup = BeautifulSoup(driver.page_source, 'html.parser') # name the parser explicitly to avoid the bs4 "no parser specified" warning
....
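The explicit waits in this answer replace the fixed `time.sleep` calls from the question: `WebDriverWait.until` calls a condition repeatedly until it returns something truthy or the timeout expires. The idea can be sketched without a browser as below. This is an illustration of the polling pattern only, not Selenium's implementation; `wait_until`, `WaitTimeout`, and the choice of `LookupError` as the ignored exception are hypothetical stand-ins.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=20.0, poll=0.5, ignored=(LookupError,)):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet", in the
    same spirit as a wait that tolerates an element not being found yet.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# Demo condition: reports "not ready" twice, then succeeds on the third call.
attempts = []


def ready_on_third_call():
    attempts.append(1)
    return "ready" if len(attempts) >= 3 else False


outcome = wait_until(ready_on_third_call, timeout=2.0, poll=0.01)
print(outcome, "after", len(attempts), "attempts")
```

The advantage over a fixed sleep is that the script continues as soon as the page is ready and fails with a clear timeout when it is not, instead of acting on an element that has not been rendered yet.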

【Comments】: