【Posted】: 2020-01-23 02:54:02
【Problem description】:
I am trying to scrape a table from a .jsp page (details below). The table loads only after the input fields are filled in (a train number and a journey station).
For a trial run, the train number can be 56913 and the journey station SBC (once entered, it changes automatically to "KSR Bengaluru").
With the script below I can bring up the table, but I cannot extract it (printing the result gives an empty list). I need the complete table. Can anyone tell me how to extract it?
I am very new to web scraping, so if I have made some basic mistakes, please nudge me in the right direction.
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.firefox.options import Options
from selenium.webdriver import Firefox
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from bs4 import BeautifulSoup
import soupsieve as sv
import requests
# Activate the following line if you do not want to see the Firefox window.
# Better deactivate it for debugging.
# os.environ['MOZ_HEADLESS'] = '1'
url = 'https://enquiry.indianrail.gov.in/ntes/trainOnMapBh.jsp'
opts = Options()
driver = Firefox(firefox_binary=r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe", options=opts)
driver.get(url)
WebDriverWait(driver, 20)  # note: without .until(...) this does not actually wait
train_field = driver.find_element_by_id("trnSrchTxt")
train_field.send_keys("56913")
time.sleep(2)
actions = ActionChains(driver)
actions.send_keys('SBC', Keys.ENTER)
actions.perform()
WebDriverWait(driver, 1)  # note: without .until(...) this does not actually wait
result_table = driver.find_elements_by_id("mapTrnSch")
print(result_table)
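One thing worth noting about the script above: find_elements_by_id returns a list of Selenium WebElement objects, not HTML, so printing it will only ever show object reprs (or [] if the element has not rendered yet). A common way to get at the table contents is to read the element's outerHTML attribute and parse that with BeautifulSoup. The snippet below is just a sketch of that hand-off: the inline string stands in for what element.get_attribute("outerHTML") would return, and its structure is made up, not the real page's markup.

```python
from bs4 import BeautifulSoup

# Stand-in for: driver.find_element_by_id("mapTrnSch").get_attribute("outerHTML")
# (hypothetical miniature of the table; the real markup will differ)
outer_html = '<table id="mapTrnSch"><tr><td>1</td><td>KSR Bengaluru</td></tr></table>'

soup = BeautifulSoup(outer_html, "html.parser")
# Collect the text of every cell in the table with the expected id
cells = [td.text for td in soup.find("table", id="mapTrnSch").find_all("td")]
print(cells)  # ['1', 'KSR Bengaluru']
```

The same parsing works whether the HTML comes from get_attribute("outerHTML") on a single element or from dumping the whole page, as the updated script below does.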
Update: in addition to @MadRay's answer, the code below also fetches the data (I am not sure how robust it is).
import os
import time
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver import Firefox
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
import re
os.environ['MOZ_HEADLESS'] = '1'
opts = Options()
driver = Firefox(firefox_binary=r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe", options=opts)
driver.get('https://enquiry.indianrail.gov.in/ntes/trainOnMapBh.jsp')
WebDriverWait(driver, 20)  # note: without .until(...) this does not actually wait
train_field = driver.find_element_by_id("trnSrchTxt")
train_field.send_keys("11302")
time.sleep(2)
actions = ActionChains(driver)
actions.send_keys('SBC', Keys.ENTER)
actions.perform()
time.sleep(2)
res = driver.execute_script("return document.documentElement.outerHTML")
driver.quit()
soup = BeautifulSoup(res, 'lxml')
table_rows = soup.find_all('table')[3].find_all('tr')
rows=[]
for tr in table_rows:
    td = tr.find_all('td')
    rows.append([i.text for i in td])
delaydata = rows[3:]
import pandas as pd
df = pd.DataFrame(delaydata, columns = ['StopNo','Station',1,'SchArr','SchDep','ETA_ATA','Arr_Delay','ETD_ATD','DepDelay','Distance','PF'])
df
【Comments】:
Tags: python selenium selenium-webdriver web-scraping