【Posted】: 2020-01-07 18:11:12
【Problem description】:
I am trying to get data that sits inside a (large) script tag in an HTML page. Using Beautifulsoup I can get to the right script, but I cannot pull out the data I want.
What I am looking for inside this tag is the list labelled "Beleidsdekkingsgraad", namely:
["Beleidsdekkingsgraad","107,6","107,6","109,1","109,8","110,1","111,5","112,5","113,3","113,3","114,3","115,7","116,3","116,9","117,5","117,8","118,1","118,3","118,4","118,6","118,8","118,9","118,9","118,9","118,5","118,1","117,8","117,6","117,5","117,1","116,7","116,2"] and, more specifically, the last item in that list (116,2).
What I have done so far:
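import requests
from bs4 import BeautifulSoup
base = 'https://e.infogr.am/pob_dekkingsgraadgrafiek?src=embed#async_embed'
response = requests.get(base)
soup = BeautifulSoup(response.text, 'html.parser')
all_scripts = soup.find_all('script')
# Hard-coded character slice of the fourth script's text
all_scripts[3].get_text()[1907:2179]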
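However, this is not satisfactory, because the slice indices have to be changed every time a new number is added.
I am looking for a simple way to extract the list from the script tag and then to grab the last number in the extracted list (i.e. 116,2).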
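One possible direction, as a minimal sketch rather than a confirmed solution: search the script text for the array that starts with the label, then parse that array as JSON instead of slicing by character position. The helper name `extract_series` and the `sample` string are hypothetical; the sketch assumes the list appears in the script as a flat JSON array of double-quoted strings with no nested brackets, which matches the excerpt above but has not been verified against the live page.

```python
import json
import re


def extract_series(script_text, label):
    """Return the flat JSON array that starts with `label`, or None if absent."""
    # Match '["<label>", ...]' up to the first closing bracket.
    # Assumption: the array holds only strings, with no nested brackets.
    pattern = r'\[\s*"' + re.escape(label) + r'"[^\[\]]*\]'
    match = re.search(pattern, script_text)
    if match is None:
        return None
    return json.loads(match.group(0))


# Hypothetical stand-in for the text of the relevant <script> tag.
sample = 'window.infographicData={"data":[["Beleidsdekkingsgraad","107,6","116,7","116,2"]]};'

series = extract_series(sample, "Beleidsdekkingsgraad")
last_raw = series[-1]                           # the last item, still a string
last_value = float(last_raw.replace(",", "."))  # convert the decimal comma
print(last_raw, last_value)
```

With the page itself, the same helper could be applied to the text of each script returned by `soup.find_all('script')` until it returns something other than `None`, so neither the script position nor the character offsets would need to be hard-coded.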
【Discussion】:
Tags: python-3.x web-scraping beautifulsoup