【Posted】: 2021-02-24 05:34:13
【Question】:
I am following the book "Practical Web Scraping for Data Science: Best Practices and Examples with Python" by Seppe vanden Broucke and Bart Baesens.
The following code is supposed to fetch data from Wikipedia, namely the list of Game of Thrones episodes:
import requests
from bs4 import BeautifulSoup
url = 'https://en.wikipedia.org/w/index.php' + \
      '?title=List_of_Game_of_Thrones_episodes&oldid=802553687'
r = requests.get(url)
html_contents = r.text
html_soup = BeautifulSoup(html_contents, 'html.parser')
# We'll use a list to store our episode list
episodes = []
ep_tables = html_soup.find_all('table', class_='wikiepisodetable')
for table in ep_tables:
    headers = []
    rows = table.find_all('tr')
    for header in table.find('tr').find_all('th'):
        headers.append(header.text)
        for row in table.find_all('tr')[1:]:
            values = []
            for col in row.find_all(['th','td']):
                values.append(col.text)
                if values:
                    episode_dict = {headers[i]: values[i] for i in
                                    range(len(values))}
                    episodes.append(episode_dict)
                    for episode in episodes:
                        print(episode)
But when I run the code, it shows the following error:
{'No.overall': '1'}
IndexError                                Traceback (most recent call last)
<ipython-input-8-d2e64c7e0540> in <module>
     20                 if values:
     21                     episode_dict = {headers[i]: values[i] for i in
---> 22                                     range(len(values))}
     23                     episodes.append(episode_dict)
     24                     for episode in episodes:
<ipython-input-8-d2e64c7e0540> in <dictcomp>(.0)
     19                 values.append(col.text)
     20                 if values:
---> 21                     episode_dict = {headers[i]: values[i] for i in
     22                                     range(len(values))}
     23                     episodes.append(episode_dict)
IndexError: list index out of range
Can anyone tell me why this happens?
【Discussion】:
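The printed `{'No.overall': '1'}` suggests that `headers` held only one entry when the dictionary was built, so `headers[i]` fails as soon as `values` grows to two items. One thing worth checking (an assumption, since it depends on how the loops are indented in the notebook) is whether the row loop runs before the header loop has finished collecting every `th` cell. The sketch below isolates just the dictionary-building step on plain lists, with no network access. The sample lists and the helper names `row_to_dict_indexed` and `row_to_dict_zipped` are hypothetical and are not taken from the book.

```python
# Hypothetical stand-ins for what the scraper collects; not real page data.
headers_partial = ['No.overall']  # header list with only one entry so far
headers_full = ['No.overall', 'No. inseason', 'Title']
values = ['1', '1', '"Winter Is Coming"']


def row_to_dict_indexed(headers, values):
    """Same indexing pattern as in the question: fails if values is longer."""
    return {headers[i]: values[i] for i in range(len(values))}


def row_to_dict_zipped(headers, values):
    """zip() stops at the shorter sequence, so it never indexes past the end."""
    return dict(zip(headers, values))


# With a partial header list, indexing raises once i reaches len(headers).
try:
    row_to_dict_indexed(headers_partial, values)
    raised = False
except IndexError:
    raised = True

print(raised)                                              # True
print(row_to_dict_indexed(headers_partial, values[:1]))    # {'No.overall': '1'}
print(row_to_dict_zipped(headers_partial, values))         # {'No.overall': '1'}
print(row_to_dict_indexed(headers_full, values))
```

Note that `zip` only hides the mismatch by dropping the extra values. If the header list is incomplete because of loop nesting, the more useful fix is probably to let the header loop finish before the row loop starts, and to print `len(headers)` and `len(values)` for the failing row to confirm.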
Tags: python web-scraping beautifulsoup web-crawler