Trying to web-scrape SoundCloud with BeautifulSoup: answers

【Question title】: Trying to webscrape soundcloud with beautifulsoup
【Posted】: 2021-10-11 07:25:20
【Question】:

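I am trying to scrape SoundCloud and other music platforms for data, but I am stuck on SoundCloud: I get None, an AttributeError, or [] every time. When I scrape a regular (non-music) website with the same approach, I do get data. What am I doing wrong? Please help.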

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://soundcloud.com/jujubucks').text
soup = BeautifulSoup(html_text,'lxml')
song = soup.find('li', class_='soundList__item')
print(song)

This code returns:

None or AttributeError.

【Comments】:

The songs are probably fetched dynamically via JavaScript. Take a look at selenium for handling the JavaScript.
Add a User-Agent to your request.
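
The User-Agent suggestion above could look roughly like the sketch below. The header string, `make_session`, and `fetch_html` are illustrative assumptions, not something from the thread, and a User-Agent alone may not help if the track list is rendered by JavaScript.

```python
import requests

# Hypothetical header value: any realistic browser User-Agent string would do.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}


def make_session():
    """Return a requests.Session that sends HEADERS with every request."""
    session = requests.Session()
    session.headers.update(HEADERS)
    return session


def fetch_html(url, session=None, timeout=10):
    """Download a page and return its HTML, raising on HTTP error codes."""
    session = session or make_session()
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Calling `fetch_html('https://soundcloud.com/jujubucks')` would then replace the bare `requests.get(...).text` call in the question.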

Tags: python html web-scraping beautifulsoup data-mining

【Solution 1】:

Look at the raw output (the variable soup in your code).

This code extracts the raw song titles:

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://soundcloud.com/jujubucks').text
soup = BeautifulSoup(html_text, 'lxml')
song = soup.find_all('h2', itemprop='name')
print(song)

An example of one item from the list that the code above outputs:

<h2 itemprop="name"><a href="/jujubucks/squad-too-deep-ft-cool-prince" itemprop="url">Squad Too Deep Ft. Cool Prince (Outro)</a>

However, you cannot scrape all the data from this site without selenium or scrapy, because it uses dynamically loaded content.
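
As a minimal sketch of turning those `h2` elements into usable data, the snippet below pulls the title and absolute URL out of each match. It runs on a static sample modelled on the item shown above (with the closing `</h2>` assumed) so that it works offline. `extract_tracks`, `SAMPLE_HTML`, and `BASE_URL` are hypothetical names, and `html.parser` is used only to avoid depending on lxml.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://soundcloud.com"

# Static sample modelled on the item shown above; the closing </h2> is assumed.
SAMPLE_HTML = """
<article>
  <h2 itemprop="name"><a href="/jujubucks/squad-too-deep-ft-cool-prince" itemprop="url">Squad Too Deep Ft. Cool Prince (Outro)</a></h2>
</article>
"""


def extract_tracks(html_text, base_url=BASE_URL):
    """Return a list of {'title', 'url'} dicts, one per <h2 itemprop="name">."""
    soup = BeautifulSoup(html_text, "html.parser")
    tracks = []
    for heading in soup.find_all("h2", itemprop="name"):
        link = heading.find("a", itemprop="url")
        if link is None or not link.get("href"):
            continue  # skip headings that carry no track link
        tracks.append({
            "title": link.get_text(strip=True),
            "url": urljoin(base_url, link["href"]),  # make the href absolute
        })
    return tracks


tracks = extract_tracks(SAMPLE_HTML)

# The selector from the question finds nothing in this markup, so find()
# returns None; accessing an attribute on that result raises AttributeError.
missing = BeautifulSoup(SAMPLE_HTML, "html.parser").find("li", class_="soundList__item")
```

Checking for `None` before touching the result, as the loop does with `link`, avoids the AttributeError described in the question.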

【Comments】:

Thanks, it worked. But since I also want other data, such as play counts and release dates, I think I will have to use selenium.