【Posted】: 2018-06-10 08:58:30
【Question】:
I want to use BeautifulSoup in Python to extract the "SNG_TITLE" and "ART_NAME" values from the code inside a "script" tag. (The whole script is too long to paste.)
<script>window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":[{"SNG_ID":"126884459","PRODUCT_TRACK_ID":"360276641","UPLOAD_ID":0,"SNG_TITLE":"Heathens","ART_ID":"647650","PROVIDER_ID":"3","ART_NAME":"Twenty One Pilots","ARTISTS":[{"ART_ID":"647650","ROLE_ID":"0","ARTISTS_SONGS_ORDER":"1","ART_NAME":"Twenty One Pilots","ART_PICTURE":"259dcf52853363d79753ec301377645d","SMARTRADIO":"1","RANK":"487762","LOCALES":[],"__TYPE__":"artist"}],"ALB_ID":"13371165","ALB_TITLE":"Heathens","TYPE":0,"MD5_ORIGIN":"5cea723b83af1ff0a62d65d334b978d4","VIDEO":false,"DURATION":"195","ALB_PICTURE":"3dfc8c9e406cf1bba8ce0695a44a9b7e","ART_PICTURE":"259dcf52853363d79753ec301377645d","RANK_SNG":"967143","SMARTRADIO":"1","FILESIZE_AAC_64":0,"FILESIZE_MP3_64":"0","FILESIZE_MP3_128":"3135946","FILESIZE_MP3_256":0,"FILESIZE_MP3_320":"7839868","FILESIZE_FLAC":"21777150","FILESIZE":"3135946","GAIN":"-12","MEDIA_VERSION":"4","DISK_NUMBER":"1","TRACK_NUMBER":"1","VERSION":"","EXPLICIT_LYRICS":"0","RIGHTS":{"STREAM_ADS_AVAILABLE":true,"STREAM_ADS":"2000-01-01","STREAM_SUB_AVAILABLE":true,"STREAM_SUB":"2000-01-01"},"ISRC":"USAT21601930","DATE_ADD":1497886149,"HIERARCHICAL_TITLE":"","SNG_CONTRIBUTORS":{"mainartist":["Twenty One Pilots"],"engineer":["Adam Hawkins"],"mixer":["Adam Hawkins"],"masterer":["Chris Gehringer"],"drums":["Josh Dun"],"producer":["Mike Elizondo","Tyler Joseph"],"programmer":["Mike Elizondo","Tyler Joseph"],"vocals":["Tyler Joseph"],"writer":["Tyler Joseph"]},"LYRICS_ID":30553991,"__TYPE__":"song"},{"SNG_ID":"99976952","PRODUCT_TRACK_ID":"171067651","UPLOAD_ID":0,"SNG_TITLE":"Stressed Out","ART_ID":"647650","PROVIDER_ID":"3","ART_NAME":"Twenty One Pilots","ARTISTS":[{"ART_ID":"647650","ROLE_ID":"0","ARTISTS_SONGS_ORDER":"1","ART_NAME":"Twenty One Pilots", ...</script>
The idea of the code is to print the user name, plus every song title and artist name that can be found on a given page.
import requests
from bs4 import BeautifulSoup
base_url = 'https://www.deezer.com/en/profile/1589856782/loved'
r = requests.get(base_url)
soup = BeautifulSoup(r.text, 'html.parser')
user_name = soup.find(class_='user-name')
print(user_name.text)
This prints the user name.
for script in soup.find_all('script'):
    print(script.contents)
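If I understand correctly, the script I need holds a dictionary, so I just need to find it and get its contents. The problem is that I don't know how to find exactly this "script". It has no attributes or anything else that makes it unique. So I tried a loop that finds all the scripts on the page and prints their contents, but I don't know how to go further from there.
How can I find only this particular "script" on the page? Is there another way to access these values?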
【Comments】:
-
Are you trying to extract the script element that contains "window.__DZR_APP_STATE__"?
-
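Count the scripts in the page source. As long as they keep the same position, you can use an index to get the right one, e.g. the third script:
soup.find_all('script')[2]
-
By the way: a script's content is a plain string, so you can check it with the standard string functions. Note that script.contents is a list, so test script.string instead (it is None for an empty script tag), e.g.
if script.string and "loved" in script.string: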
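Putting the comments together, here is a minimal sketch of one way to do it, not a definitive solution. It assumes the object assigned to window.__DZR_APP_STATE__ is valid JSON and that the tracks sit under TAB → loved → data, as in the excerpt above. SAMPLE_HTML is a shortened, hypothetical stand-in for the real page, and extract_loved_tracks is a name made up for this example.

```python
import json
from bs4 import BeautifulSoup

MARKER = "window.__DZR_APP_STATE__"

# Hypothetical, shortened stand-in for the real page (assumption: same key layout).
SAMPLE_HTML = """
<html><body>
<script src="app.js"></script>
<script>window.__DZR_APP_STATE__ = {"TAB":{"loved":{"data":[
{"SNG_TITLE":"Heathens","ART_NAME":"Twenty One Pilots"},
{"SNG_TITLE":"Stressed Out","ART_NAME":"Twenty One Pilots"}]}}}</script>
</body></html>
"""

def extract_loved_tracks(html):
    """Return (title, artist) pairs from the script that holds the app state."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        # .string is None for empty scripts such as <script src=...>
        if script.string is None:
            continue
        text = str(script.string)
        if MARKER not in text:
            continue
        # Decode the first JSON object that follows the marker;
        # raw_decode ignores anything after the object (e.g. a trailing ";").
        start = text.index("{", text.index(MARKER))
        state, _ = json.JSONDecoder().raw_decode(text, start)
        tracks = state["TAB"]["loved"]["data"]
        return [(t["SNG_TITLE"], t["ART_NAME"]) for t in tracks]
    return []

for title, artist in extract_loved_tracks(SAMPLE_HTML):
    print(title, "-", artist)
```

With the code from the question, you would pass r.text instead of SAMPLE_HTML. Searching for the marker text is less fragile than a fixed index such as [2], which breaks as soon as the page adds or removes a script.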
Tags: python beautifulsoup deezer