【Posted】:2014-09-04 14:29:42
【Question】:
I'm trying to get the string inside an <a> tag, but the <a> tag is nested inside an <h1> tag.
<h1 class="branded-page-header-title">
<span class="qualified-channel-title ellipsized"><span class="qualified-channel-title-wrapper"><span dir="ltr" class="qualified-channel-title-text" ><a dir="ltr" href="/user/viralvideoslmao" class="spf-link branded-page-header-title-link yt-uix-sessionlink" title="ViralVideos" data-sessionlink="ei=lXIIVM-_CvKQigahpIHgDA" >ViralVideos</a></span></span></span>
</h1>
In this case, the text I want is "ViralVideos". At the moment I have this:
import requests
from bs4 import BeautifulSoup

def get_yt_links():
    url = "https://youtube.com"
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for code in soup.findAll('a'):
        href = "http://youtube.com" + code.get('href')
        if "channel/U" in href:
            get_user(href)
            print(href)

def get_user(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for user in soup.findAll('h1', {'class': 'branded-page-header-title'}).a:
        print(user.string)
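A minimal sketch of one way to fix `get_user`, assuming BeautifulSoup 4 with the built-in `html.parser`. `soup.findAll(...)` returns a `ResultSet` (a list of tags), so `.a` cannot be used on it directly; each `<h1>` has to be searched for its nested `<a>` separately. The helper names `get_channel_titles` and `get_channel_links` are hypothetical, and the sample string is the HTML from the question rather than a live YouTube page. `get_channel_links` also skips `<a>` tags that have no `href`, because `code.get('href')` returns `None` for them and `"http://youtube.com" + None` raises a `TypeError`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def get_channel_titles(html):
    """Hypothetical helper: text of each <a> inside <h1 class="branded-page-header-title">."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # find_all() returns a ResultSet (a list of Tag objects) with no .a attribute,
    # so search each <h1> for its nested <a> separately.
    for h1 in soup.find_all("h1", class_="branded-page-header-title"):
        link = h1.find("a")  # recursive search through the nested <span>s
        if link is not None and link.string is not None:
            titles.append(link.string.strip())
    return titles


def get_channel_links(html, base="http://youtube.com"):
    """Hypothetical helper: absolute URLs of links whose path contains 'channel/U'."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True keeps only <a> tags that actually have an href attribute,
    # avoiding "str + None" when code.get('href') would return None.
    for a in soup.find_all("a", href=True):
        href = urljoin(base, a["href"])
        if "channel/U" in href:
            links.append(href)
    return links


# Sample taken from the HTML in the question (not fetched from YouTube).
sample = """<h1 class="branded-page-header-title">
<span class="qualified-channel-title ellipsized"><span class="qualified-channel-title-wrapper"><span dir="ltr" class="qualified-channel-title-text" ><a dir="ltr" href="/user/viralvideoslmao" class="spf-link branded-page-header-title-link yt-uix-sessionlink" title="ViralVideos" data-sessionlink="ei=lXIIVM-_CvKQigahpIHgDA" >ViralVideos</a></span></span></span>
</h1>"""

print(get_channel_titles(sample))  # ['ViralVideos']
```

With the question's code, the loop in `get_user` could become `for user in get_channel_titles(plain_text): print(user)`. A CSS selector such as `soup.select("h1.branded-page-header-title a")` is an equivalent way to reach the same `<a>` tags.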
Thanks in advance.
【Discussion】:
Tags: html python-3.x tags beautifulsoup web-crawler