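Posted: 2019-06-27 00:37:38
Question:
I have the following code, which successfully extracts the links, titles, etc. of podcast episodes. How would I go about pulling only the first one it comes across (i.e. the newest episode), then stopping immediately and returning that result? Any suggestions would be greatly appreciated.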
def get_playable_podcast(soup):
    """
    @param: parsed html page
    """
    subjects = []
    for content in soup.find_all('item'):
        try:
            link = content.find('enclosure')
            link = link.get('url')
            print "\n\nLink: ", link
            title = content.find('title')
            title = title.get_text()
            desc = content.find('itunes:subtitle')
            desc = desc.get_text()
            thumbnail = content.find('itunes:image')
            thumbnail = thumbnail.get('href')
        except AttributeError:
            continue
        item = {
            'url': link,
            'title': title,
            'desc': desc,
            'thumbnail': thumbnail
        }
        subjects.append(item)
    return subjects

def compile_playable_podcast(playable_podcast):
    """
    @param: list containing dicts of key/value pairs for playable podcasts
    """
    items = []
    for podcast in playable_podcast:
        items.append({
            'label': podcast['title'],
            'thumbnail': podcast['thumbnail'],
            'path': podcast['url'],
            'info': podcast['desc'],
            'is_playable': True,
        })
    return items
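A minimal sketch of one way to stop after the first episode, assuming the newest episode is the first <item> in the feed (common for podcast RSS feeds, but not guaranteed): turn the loop into a generator and take its first value with next(). The names iter_playable_podcast, get_latest_podcast and SAMPLE_FEED are hypothetical, and the sketch assumes Python 3 with BeautifulSoup 4 and its built-in 'html.parser'. A generator only runs until its next yield, so next() ends the per-item extraction as soon as one complete episode is produced; find_all() itself still locates every <item> up front.

```python
from bs4 import BeautifulSoup

# Hypothetical two-episode feed used only for illustration (newest first).
SAMPLE_FEED = """
<rss><channel>
<item>
  <title>Episode 2</title>
  <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg"/>
  <itunes:subtitle>Newest episode</itunes:subtitle>
  <itunes:image href="https://example.com/ep2.jpg"/>
</item>
<item>
  <title>Episode 1</title>
  <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg"/>
  <itunes:subtitle>Older episode</itunes:subtitle>
  <itunes:image href="https://example.com/ep1.jpg"/>
</item>
</channel></rss>
"""


def iter_playable_podcast(soup):
    """Yield one dict per complete <item>, lazily, skipping incomplete ones."""
    for content in soup.find_all('item'):
        try:
            item = {
                'url': content.find('enclosure').get('url'),
                'title': content.find('title').get_text(),
                'desc': content.find('itunes:subtitle').get_text(),
                'thumbnail': content.find('itunes:image').get('href'),
            }
        except AttributeError:
            # find() returned None for a missing child tag; try the next item.
            continue
        yield item


def get_latest_podcast(soup):
    """Return the first complete episode (assumed newest), or None if none exist."""
    return next(iter_playable_podcast(soup), None)


soup = BeautifulSoup(SAMPLE_FEED, 'html.parser')
latest = get_latest_podcast(soup)
print(latest['title'])  # prints: Episode 2
```

Because compile_playable_podcast() expects a list, the single result can be wrapped before passing it on, e.g. compile_playable_podcast([latest] if latest else []). An even smaller change to the original code is to break right after subjects.append(item), which keeps the list return type so compile_playable_podcast() works unchanged.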
Comments:
- If you only want the first element, use soup.find() instead of soup.find_all().
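Following that comment, a minimal sketch using soup.find('item'), which returns only the first matching tag (or None when there is none). One behavioral difference from the original loop: find() does not skip an incomplete first item, so this version returns None in that case instead of moving on to the next episode. The function name get_first_podcast and the sample feed are hypothetical; Python 3 and BeautifulSoup 4 with 'html.parser' are assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample feed for illustration; the newest episode comes first.
SAMPLE_FEED = """
<rss><channel>
<item>
  <title>Episode 2</title>
  <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg"/>
  <itunes:subtitle>Newest episode</itunes:subtitle>
  <itunes:image href="https://example.com/ep2.jpg"/>
</item>
<item>
  <title>Episode 1</title>
  <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg"/>
  <itunes:subtitle>Older episode</itunes:subtitle>
  <itunes:image href="https://example.com/ep1.jpg"/>
</item>
</channel></rss>
"""


def get_first_podcast(soup):
    """Parse only the first <item>; return None if it is missing or incomplete."""
    content = soup.find('item')  # first <item> in document order, or None
    if content is None:
        return None
    try:
        return {
            'url': content.find('enclosure').get('url'),
            'title': content.find('title').get_text(),
            'desc': content.find('itunes:subtitle').get_text(),
            'thumbnail': content.find('itunes:image').get('href'),
        }
    except AttributeError:
        # A required child tag is absent, so find() returned None.
        return None


soup = BeautifulSoup(SAMPLE_FEED, 'html.parser')
first = get_first_podcast(soup)
print(first['url'])  # prints: https://example.com/ep2.mp3
```

The result is a single dict rather than a list, so pass [first] to compile_playable_podcast() when it is not None.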
Tags: python parsing beautifulsoup urllib2