[Posted]: 2021-01-16 09:26:09
[Question]:
I have a list of links that I am iterating over, shown below:
https://www.loc.gov/item/2015669100/
https://www.loc.gov/item/2015669101/
https://www.loc.gov/item/2015669102/
https://www.loc.gov/item/2015669103/
https://www.loc.gov/item/2015669104/
https://www.loc.gov/item/2015669105/
https://www.loc.gov/item/2015669106/
https://www.loc.gov/item/2015669107/
https://www.loc.gov/item/2015669108/
https://www.loc.gov/item/2015669109/
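If you open these links, you will see that each page has a video and a downloadable XML file. My task is to download the audio from the video, together with the XML file, from each page.
My question is: how do I get the audio from these files?
Here is my code so far.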
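from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

# Paginated listing of the collection; {} is filled with the page number.
base_html = "https://www.loc.gov/collections/civil-rights-history-project/?sp={}"

for i in range(1, 8):
    html = base_html.format(i)
    # Send a browser-like User-Agent with the request.
    req = Request(html, headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(urlopen(req).read(), 'html.parser')
    # Each result on the listing page sits in a div with class "item-description".
    pages = soup.find_all('div', attrs={'class': 'item-description'})
    for div in pages:
        # Link to the individual item page.
        crawl_p = div.find('a')['href']
        # some logic here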
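One possible shape for the "some logic here" step is sketched below. It is not a confirmed solution. It assumes that each item page exposes direct links to its media and XML files in `href` or `src` attributes, which has not been checked against the real pages. The names `MEDIA_EXTENSIONS`, `MediaLinkParser`, and `find_media_links` are hypothetical, and so is the sample markup. The sketch uses the standard library `html.parser` only to stay dependency-free; the same filtering could be done with BeautifulSoup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed file extensions; adjust to whatever the item pages actually link to.
MEDIA_EXTENSIONS = (".mp3", ".mp4", ".xml")


class MediaLinkParser(HTMLParser):
    """Collect href/src attribute values that point at media or XML files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("href", "src") or not value:
                continue
            # Resolve relative links against the page URL.
            absolute = urljoin(self.base_url, value)
            # Compare only the path, so query strings do not hide the extension.
            if urlparse(absolute).path.lower().endswith(MEDIA_EXTENSIONS):
                if absolute not in self.links:
                    self.links.append(absolute)


def find_media_links(html, base_url):
    """Return absolute URLs of candidate media/XML files found in html."""
    parser = MediaLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Made-up markup for illustration only; the real page structure may differ.
sample_html = """
<a href="/static/example/interview.xml">Transcript</a>
<video><source src="https://example.org/media/interview.mp4"></video>
<a href="/about/">About</a>
"""
links = find_media_links(sample_html, "https://www.loc.gov/item/2015669100/")
print(links)
```

To try this on live pages, the HTML string could come from `urlopen(req).read().decode()` as in the loop above, and each returned URL could then be saved with `urllib.request.urlretrieve`. If a page offers only a video file, separating the audio track would need a separate tool such as ffmpeg, which this sketch does not cover.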
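[Comments]:
-
Welcome to Stack Overflow! Please take a minute to read How do I ask a good question? Where is your research effort? Have you tried searching Google for a solution? If so, what did you try to implement, and what went wrong?
Tags: python python-3.x audio beautifulsoup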