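【Posted】: 2020-12-22 21:38:17
【Question】:
I am trying to scrape individual at-bat data from a single URL. Here is an example (https://baseballsavant.mlb.com/savant-player/willson-contreras-575929?stats=gamelogs-r-hitting-statcast&season=2020)
The data seems to be hidden, or I am unable to retrieve it with this approach: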
import time

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome('/Users/gru/Documents/chromedriver')
driver.get('https://baseballsavant.mlb.com/savant-player/willson-contreras-575929?stats=gamelogs-r-hitting-statcast&season=2020')
html_page = driver.page_source
time.sleep(15)
soup = BeautifulSoup(html_page, 'lxml')
# Collect the text of every <td> cell, one list per <tr> row
for j in soup.find_all('tr'):
    drounders = []
    for h in j.find_all('td'):
        drounders.append(h.get_text())
    print(drounders)
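One thing worth noting about the snippet above: `driver.page_source` is read before `time.sleep(15)` runs, so the wait cannot affect the HTML that gets parsed. A minimal sketch of one way to reorder this, assuming the rows are rendered by JavaScript after the page loads (not confirmed in this thread): wait until at least one table cell exists, then read the page source. The selector `table tr td` and the helper names `fetch_rendered_html`, `rows_from_html` and `RowCollector` are hypothetical. The parsing step uses the standard-library `HTMLParser` in place of BeautifulSoup so the sketch runs without extra packages.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Keep empty cells as "" so columns stay aligned
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells (e.g. header rows)
                self.rows.append(self._row)
            self._row = None


def rows_from_html(html):
    """Return a list of rows, each a list of cell strings."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_rendered_html(url, timeout=20):
    """Load the page and wait for table cells before reading the HTML.

    Selenium is imported here so the parsing helpers work without it.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Selenium can locate a Chrome driver
    try:
        driver.get(url)
        # Assumption: the data appears as <td> cells inside a <table>
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "table tr td"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

Usage would look like `rows = rows_from_html(fetch_rendered_html(url))`. If the wait times out, the table is probably not in the rendered DOM under that selector, and the selector would need adjusting after inspecting the page.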
These are the first few expected rows:
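| # | Game Date | Bat Team | Fld Team | Pitcher | Result | EV (MPH) | LA (°) | Dist (ft) | Direction | Pitch (MPH) | Pitch Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2020-08-12 | | | Carrasco, Carlos | strikeout | | | | | | |
| 2 | 2020-08-12 | | | Carrasco, Carlos | strikeout | | | | | | |
| 3 | 2020-08-12 | | | Carrasco, Carlos | force_out | | | | Opposite | | |
| 4 | 2020-08-11 | | | Allen, Logan | force_out | 107.8 | -25 | 5 | Pull | 94.0 | 4-Seam Fastball |
| 5 | 2020-08-11 | | | Allen, Logan | strikeout | | | | | 77.3 | Curveball |
| 6 | 2020-08-11 | | | Hill, Cam | sac_fly | 100.5 | 42 | 345 | Straightaway | 91.6 | 4-Seam Fastball |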
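Once the cells are scraped, the expected layout above could be produced by pairing each row with the column names. A rough sketch, assuming each scraped row lists its cells in the header order shown above and that missing values come back as empty strings; the helper names `rows_to_records` and `records_to_csv` are made up for illustration.

```python
import csv
import io

# Column names taken from the expected output in the question
COLUMNS = [
    "Game Date", "Bat Team", "Fld Team", "Pitcher", "Result", "EV (MPH)",
    "LA (°)", "Dist (ft)", "Direction", "Pitch (MPH)", "Pitch Type",
]


def rows_to_records(rows, columns=COLUMNS):
    """Turn lists of cell strings into dicts keyed by column name."""
    records = []
    for cells in rows:
        # Pad short rows with "" and drop any cells beyond the known columns
        padded = list(cells[:len(columns)]) + [""] * (len(columns) - len(cells))
        records.append(dict(zip(columns, padded)))
    return records


def records_to_csv(records, columns=COLUMNS):
    """Serialize the records as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The `csv` module quotes values that contain commas, so pitcher names such as `Hill, Cam` stay in a single field.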
【Discussion】:
-
You should take a look at Scrapy. It automates a lot of things and makes web scraping much easier.
Tags: python selenium web-scraping