【Posted】: 2021-07-10 04:07:24
【Problem Description】:
I am new to web scraping. I am trying to pull information about water utilities from this site. I can currently step through each region via the drop-down menu and reach the first page of results, but I am unable to navigate through all of a region's result pages before moving on to the next region. The page-navigation bar is a list with no "Next" button, and I have been trying to iterate over that list with a range; when I take the len, I do not get the correct range for the list. As it stands, I only ever get to the first page of each region. Even after looking for answers to similar questions, I am still struggling to figure out what I am doing wrong or what I should consider. Any help would be greatly appreciated.
Thanks!
Here is my current code (no scraping yet; I am focusing on navigating the pages):
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, WebDriverException
url = 'https://database.ib-net.org/search_utilities?type=2'
browser = webdriver.Firefox()
browser.get(url)
time.sleep(3)
print("Retrieving the site...")
# All regions available
regions = ['Africa', 'East Asia and Pacific', 'Europe and Central Asia', 'Latin America (including USA and Canada)', 'Middle East and Northern Africa', 'South Asia']
for region in regions:
    # Select the region from the drop-down menu
    selectOption = Select(browser.find_element_by_id('MainContent_ddRegion'))
    print("Now constructing output for: " + region)
    # Select table and wait for data to populate
    selectOption.select_by_visible_text(region)
    time.sleep(4)
    list_of_table_pages = browser.find_element_by_xpath('//*[@id="MainContent_gvUtilities"]/tbody/tr[52]/td/ul')
    no_pages = len(list_of_table_pages.find_elements_by_xpath("//li"))
    print(("No of table pages to be scraped are: %d") % no_pages)
    print("Outputing data into " + region + ".csv...")
    all_table_data = []
    # starts the range count from 1 instead of 0
    for page in range(1, no_pages):
        try:
            # Navigate to the next page once done
            table_page = str(page)
            WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="MainContent_gvUtilities"]/tbody/tr[52]/td/ul/li[' + table_page + ']/a'))).click()
            print("Navigating to next table page...")
        except (TimeoutException, WebDriverException):
            print("Last page reached, moving to the next region...")
            break
    print("No more pages to scrape under %s. Moving to the next region..." % region)
browser.close()
browser.quit()
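Two things in the loop above look like likely culprits, and both can be sketched without a browser. First, in Selenium, calling find_elements_by_xpath("//li") on an element still searches the entire document, because a path starting with "//" is absolute; prefixing it with a dot (".//li") scopes the search to the pager element, which should give the real page count. Second, range(1, no_pages) excludes no_pages, and the first pager link is the page already displayed. A minimal stdlib-only illustration of both points (the pager markup below is a hypothetical stand-in for the site's actual table):

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mirroring the table's pager row.
html = """
<table id="MainContent_gvUtilities">
  <tr><td>utility row</td></tr>
  <tr><td><ul class="pagination">
    <li>1</li><li>2</li><li>3</li><li>4</li>
  </ul></td></tr>
</table>
"""
root = ET.fromstring(html)
pager = root.find(".//ul")

# Scoped search: only <li> elements inside the pager are counted,
# not every <li> in the document.
no_pages = len(pager.findall(".//li"))
print(no_pages)  # 4

# range() excludes its stop value, and page 1 is already displayed,
# so the clicks should run from the second link through the last one.
pages_to_click = list(range(2, no_pages + 1))
print(pages_to_click)  # [2, 3, 4]
```

In the Selenium code this would correspond to `list_of_table_pages.find_elements_by_xpath(".//li")` and `range(2, no_pages + 1)`; note that the pager element may also need to be re-located after each click, since the page reload can make the old reference stale.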
【Comments】:
Tags: python selenium web-scraping