【Posted】: 2021-04-24 04:20:43
【Question】:
I am trying to use BeautifulSoup to extract data from JSON embedded in an HTML page, as shown below.
<script type="application/ld+json">{
"@context": "http://schema.org",
"@type": "Movie",
"url": "/title/tt1825683/",
"name": "Black Panther",
"image": "https://m.media-amazon.com/images/M/MV5BMTg1MTY2MjYzNV5BMl5BanBnXkFtZTgwMTc4NTMwNDI@._V1_.jpg",
"genre": [
"Action",
"Adventure",
"Sci-Fi"
],
"contentRating": "PG-13",
"actor": [
{
"@type": "Person",
"url": "/name/nm1569276/",
"name": "Chadwick Boseman"
},
{
"@type": "Person",
"url": "/name/nm0430107/",
"name": "Michael B. Jordan"
},
{
"@type": "Person",
"url": "/name/nm2143282/",
"name": "Lupita Nyong\u0027o"
},
{
"@type": "Person",
"url": "/name/nm1775091/",
"name": "Danai Gurira"
}
],
"director": {
"@type": "Person",
"url": "/name/nm3363032/",
"name": "Ryan Coogler"
}
}</script>
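I have gotten as far as extracting the whole JSON block, but how can I get specific attributes out of the data?
from bs4 import BeautifulSoup
# `url` must hold the page's HTML text here, not the address of the page
soup_url = BeautifulSoup(url, 'html.parser')
url_info = soup_url.find_all("script", type="application/ld+json")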
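【Comments】:
- Loop over the elements returned by find_all, get the text of each one, and call json.loads() on it. That returns a dictionary, which you can then access like any other Python dictionary.
- What problems did you run into?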
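The approach suggested in the comments could look roughly like the sketch below. It is only an illustration under a few assumptions: the `html` string is a shortened, hypothetical stand-in for the real page, the variable names are made up for this example, and the page is assumed to contain strictly valid JSON (json.loads rejects things like trailing commas).

```python
import json
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the downloaded page (assumption:
# the real page embeds valid JSON inside the ld+json <script> element).
html = """
<html><head>
<script type="application/ld+json">{
  "@context": "http://schema.org",
  "@type": "Movie",
  "name": "Black Panther",
  "genre": ["Action", "Adventure", "Sci-Fi"],
  "actor": [
    {"@type": "Person", "url": "/name/nm1569276/", "name": "Chadwick Boseman"},
    {"@type": "Person", "url": "/name/nm0430107/", "name": "Michael B. Jordan"}
  ],
  "director": {"@type": "Person", "url": "/name/nm3363032/", "name": "Ryan Coogler"}
}</script>
</head><body></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Parse every ld+json block on the page into a Python dictionary.
movies = []
for tag in soup.find_all("script", type="application/ld+json"):
    # tag.string is the raw text between <script> and </script>
    movies.append(json.loads(tag.string))

# From here on it is ordinary dict and list access.
movie = movies[0]
title = movie["name"]
genres = movie["genre"]
actor_names = [person["name"] for person in movie["actor"]]
director_name = movie["director"]["name"]

print(title, genres, actor_names, director_name)
```

If a key may be missing on some pages, movie.get("contentRating") returns None instead of raising KeyError, which is often more convenient when scraping many pages.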
Tags: python json web-scraping beautifulsoup