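【Posted】: 2018-12-28 21:43:59
【Question】:
I'm trying to build a web scraper to get data from the following site (later I'd like to do the same for several airlines on that site): https://www.flightradar24.com/data/airlines/kl-klm/routes
This is the code I have so far: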
from bs4 import BeautifulSoup
import requests

airlines = ['kl-klm']
for a in airlines:
    url = 'https://www.flightradar24.com/data/airlines/' + a + '/routes'
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    print(soup)
This gives me the source of the whole page, but I want to extract one specific piece of text inside a script tag, namely:
var arrRoutes=[{"airport1":{"country":"Denmark","iata":"AAL","icao":"EKYT","lat":57.092781,"lon":9.849164,"name":"Aalborg Airport"},"airport2":{"country":"Netherlands","iata":"AMS","icao":"EHAM","lat":52.308609,"lon":4.763889,"name":"Amsterdam Schiphol Airport"}},{"airport1":{"country":"United Kingdom","iata":"ABZ","icao":"EGPD","lat":57.201939,"lon":-2.19777,"name":"Aberdeen International Airport"},"airport2":{"country":"Netherlands","iata":"AMS","icao":"EHAM","lat":52.308609,"lon":4.763889,"name":"Amsterdam Schiphol Airport"}}...
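...and so on, all the way to the end of the list.
How can I extract this information so that I can find the total number of inbound and outbound flights for each airport? For example, the total number of times Amsterdam Schiphol Airport appears as airport 1 or 2?
Is there a way to first extract the string from the HTML and then convert it into a Python list of dictionaries? Or does it make more sense to count each element directly in the string?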
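To make the first option concrete, here is a minimal sketch of one way it could work, assuming the script text contains a literal `var arrRoutes=` assignment followed by valid JSON. `SAMPLE_HTML`, `extract_routes`, and `count_airports` are hypothetical names used only for illustration, and the sample is a shortened stand-in for `page.text`, not the real page.

```python
import json
import re
from collections import Counter

# Hypothetical, shortened stand-in for page.text (assumption: the real page
# embeds the same "var arrRoutes=[...]" assignment inside a <script> tag).
SAMPLE_HTML = """
<html><body><script>
var arrRoutes=[{"airport1":{"iata":"AAL","name":"Aalborg Airport"},"airport2":{"iata":"AMS","name":"Amsterdam Schiphol Airport"}},{"airport1":{"iata":"ABZ","name":"Aberdeen International Airport"},"airport2":{"iata":"AMS","name":"Amsterdam Schiphol Airport"}}], other = 1;
</script></body></html>
"""


def extract_routes(html):
    """Return the arrRoutes array as a list of dicts, or [] if it is absent."""
    marker = re.search(r"var\s+arrRoutes\s*=\s*", html)
    if marker is None:
        return []
    # raw_decode parses a single JSON value starting at the given index and
    # ignores whatever JavaScript follows the closing bracket.
    routes, _end = json.JSONDecoder().raw_decode(html, marker.end())
    return routes


def count_airports(routes):
    """Count how often each airport name appears as airport1 or airport2."""
    counts = Counter()
    for route in routes:
        for key in ("airport1", "airport2"):
            counts[route[key]["name"]] += 1
    return counts


routes = extract_routes(SAMPLE_HTML)
counts = count_airports(routes)
print(counts.most_common())
```

With the code from the question, `extract_routes(page.text)` would take the place of the sample string, provided the served HTML really contains that assignment. Parsing into dictionaries first keeps the counting step independent of how the string happens to be formatted.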
【Comments】:
Tags: python python-3.x web-scraping beautifulsoup python-requests