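【Posted】: 2021-02-14 05:31:07
【Question】:
First, I searched Google, but none of the answers I found worked. I'm trying to get all the links from a news webpage. I've listed the relevant elements below; my only problem is extracting the links.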
<section class="featured-category"><article class="post-box">
<div class="post-thumbnail video-play">
<figure class="image-wrapper"><a href="https://news.abs-cbn.com/ancx/culture/music/10/30/20/the-smokey-mountainthirty-years-after">
<img data-src="https://sa.kapamilya.com/absnews/abscbnnews/media/ancx/culture/2020/84/1sm_medium_thumbnail.jpg" width="188" height="125" alt="The Smokey Mountain—thirty years after" class="mp4-animations lazy img-responsive loaded" src="https://sa.kapamilya.com/absnews/abscbnnews/media/ancx/culture/2020/84/1sm_medium_thumbnail.jpg" data-was-processed="true">
</a></figure>
<div class="item-category bottom-left">
<div class="label-text">ANCX</div>
</div>
</div>
<div class="post-content">
<h2 class="post-title"><a href="https://news.abs-cbn.com/ancx/culture/music/10/30/20/the-smokey-mountainthirty-years-after">The Smokey Mountain—thirty years after</a></h2>
</div>
</article>
<article class="post-box">
<div class="post-thumbnail video-play">
<figure class="image-wrapper">
<a href="/news/11/01/20/typhoon-rolly-batters-southern-luzon">
<img data-src="https://sa.kapamilya.com/absnews/abscbnnews/media/2020/news/11/01/20201101-south-luzon-rolly-lucenapolice_medium_thumbnail.jpg" width="188" height="125" alt="Typhoon Rolly batters Southern Luzon" class="mp4-animations lazy img-responsive loaded" src="https://sa.kapamilya.com/absnews/abscbnnews/media/2020/news/11/01/20201101-south-luzon-rolly-lucenapolice_medium_thumbnail.jpg" data-was-processed="true">
</a>
</figure>
<div class="item-category bottom-left">
<div class="label-text">News</div>
</div>
</div>
<div class="post-content">
<h2 class="post-title"><a href="news/11/01/20/typhoon-rolly-batters-southern-luzon">Typhoon Rolly batters Southern Luzon</a></h2>
</div>
</article>
<article class="post-box">
<div class="post-thumbnail video-play">
<figure class="image-wrapper">
<a href="/business/11/01/20/typhoon-rolly-knocks-out-power-in-bicol-parts-of-calabarzon">
<img data-src="https://sa.kapamilya.com/absnews/abscbnnews/media/2020/news/11/01/20201101-typhoon-rolly-cagsawa-amiraflor_medium_thumbnail.jpg" width="188" height="125" alt="Typhoon Rolly knocks out power in Bicol, parts of Calabarzon" class="mp4-animations lazy img-responsive loaded" src="https://sa.kapamilya.com/absnews/abscbnnews/media/2020/news/11/01/20201101-typhoon-rolly-cagsawa-amiraflor_medium_thumbnail.jpg" data-was-processed="true">
</a>
</figure>
<div class="item-category bottom-left">
<div class="label-text">Business</div>
</div>
</div>
<div class="post-content">
<h2 class="post-title"><a href="business/11/01/20/typhoon-rolly-knocks-out-power-in-bicol-parts-of-calabarzon">Typhoon Rolly knocks out power in Bicol, parts of Calabarzon</a></h2>
</div>
</article>
<article class="post-box">
<div class="post-thumbnail video-play">
<figure class="image-wrapper">
<a href="/news/11/01/20/ph-virus-tally-now-at-383113-as-2396-new-cases-confirmed">
<img data-src="https://sa.kapamilya.com/absnews/abscbnnews/media/2020/news/07/11/coronavirus-covid-generic_medium_thumbnail.jpg" width="188" height="125" alt="PH virus tally now at 383,113 as 2,396 new cases confirmed" class="mp4-animations lazy img-responsive loaded" src="https://sa.kapamilya.com/absnews/abscbnnews/media/2020/news/07/11/coronavirus-covid-generic_medium_thumbnail.jpg" data-was-processed="true">
</a>
</figure>
<div class="item-category bottom-left">
<div class="label-text">News</div>
</div>
</div>
<div class="post-content">
<h2 class="post-title"><a href="news/11/01/20/ph-virus-tally-now-at-383113-as-2396-new-cases-confirmed">PH virus tally now at 383,113 as 2,396 new cases confirmed</a></h2>
</div>
</article>
<article class="post-box">
<div class="post-thumbnail video-play">
<figure class="image-wrapper">
<a href="/sports/11/01/20/ahead-of-resumption-of-games-pba-players-test-negative-for-covid-19">
<img data-src="https://sa.kapamilya.com/absnews/abscbnnews/media/2020/news/10/11/pba_medium_thumbnail.jpg" width="188" height="125" alt="Ahead of resumption of games, PBA players test negative for COVID-19" class="mp4-animations lazy img-responsive loaded" src="https://sa.kapamilya.com/absnews/abscbnnews/media/2020/news/10/11/pba_medium_thumbnail.jpg" data-was-processed="true">
</a>
</figure>
<div class="item-category bottom-left">
<div class="label-text">Sports</div>
</div>
</div>
<div class="post-content">
<h2 class="post-title"><a href="sports/11/01/20/ahead-of-resumption-of-games-pba-players-test-negative-for-covid-19">Ahead of resumption of games, PBA players test negative for COVID-19</a></h2>
</div>
</article>
<article class="post-box">
<div class="post-thumbnail video-play">
<figure class="image-wrapper">
<a href="/sports/11/01/20/sportsman-turned-spy-why-sean-connery-chose-james-bond-over-manchester-united">
<img data-src="https://sa.kapamilya.com/absnews/abscbnnews/media/2020/afp/11/01/20201101-seanconnery-ronaldinho-afp_medium_thumbnail.jpg" width="188" height="125" alt="Sportsman turned ‘spy’: Why Sean Connery chose James Bond over Manchester United" class="mp4-animations lazy img-responsive loaded" src="https://sa.kapamilya.com/absnews/abscbnnews/media/2020/afp/11/01/20201101-seanconnery-ronaldinho-afp_medium_thumbnail.jpg" data-was-processed="true">
</a>
</figure>
<div class="item-category bottom-left">
<div class="label-text">Sports</div>
</div>
</div>
<div class="post-content">
<h2 class="post-title"><a href="sports/11/01/20/sportsman-turned-spy-why-sean-connery-chose-james-bond-over-manchester-united">Sportsman turned ‘spy’: Why Sean Connery chose James Bond over Manchester United</a></h2>
</div>
</article>
</section>
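What I tried:
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html holds the page markup shown above
# find() returns only the FIRST <div class="post-content">
content = soup.find('div', {'class': "post-content"})
for letter in content.find_all("a"):
    print(letter.text)  # prints the link text, not the href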
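One likely reason the attempt above reaches only a single headline is that `soup.find()` stops at the first matching `<div class="post-content">`, while `soup.find_all()` returns every match. The sketch below shows the difference. It assumes a hypothetical, trimmed copy of two articles from the markup above held in a string; it still prints link text, not hrefs.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of two articles from the markup above.
html = """
<article class="post-box"><div class="post-content">
<h2 class="post-title"><a href="news/11/01/20/typhoon-rolly-batters-southern-luzon">Typhoon Rolly batters Southern Luzon</a></h2>
</div></article>
<article class="post-box"><div class="post-content">
<h2 class="post-title"><a href="sports/11/01/20/ahead-of-resumption-of-games-pba-players-test-negative-for-covid-19">Ahead of resumption of games, PBA players test negative for COVID-19</a></h2>
</div></article>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first matching <div>, so only one headline is reached.
first_only = [a.text for a in soup.find("div", class_="post-content").find_all("a")]

# find_all() returns every matching <div>; loop over each one.
all_titles = [
    a.text
    for div in soup.find_all("div", class_="post-content")
    for a in div.find_all("a")
]

print(first_only)
print(all_titles)
```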
Please help. I really don't know how to get the links, i.e. the "href" values, since I only started using BeautifulSoup today.
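A minimal sketch of reading the "href" values: BeautifulSoup exposes a tag's attributes by subscript, so `a["href"]` returns the link target. The markup above mixes absolute URLs, root-relative paths (`/news/...`) and path-relative ones (`news/...`), so the sketch resolves each with `urllib.parse.urljoin` and drops duplicates, since each article's thumbnail and headline point at the same story. `BASE_URL` is an assumption (that the page was loaded from the site root); if the real page lives under a sub-path, path-relative links would resolve differently. The `html` string is a hypothetical, trimmed excerpt.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: the page was fetched from the site root, so relative hrefs
# resolve against this base URL.
BASE_URL = "https://news.abs-cbn.com/"

# Hypothetical, trimmed excerpt of the markup above.
html = """
<h2 class="post-title"><a href="https://news.abs-cbn.com/ancx/culture/music/10/30/20/the-smokey-mountainthirty-years-after">The Smokey Mountain</a></h2>
<h2 class="post-title"><a href="news/11/01/20/typhoon-rolly-batters-southern-luzon">Typhoon Rolly batters Southern Luzon</a></h2>
<figure class="image-wrapper"><a href="/news/11/01/20/typhoon-rolly-batters-southern-luzon"><img src="x.jpg"></a></figure>
"""

soup = BeautifulSoup(html, "html.parser")

links = []
for a in soup.select("a[href]"):  # every <a> that actually has an href
    # a["href"] reads the attribute; urljoin turns relative paths into absolute URLs
    url = urljoin(BASE_URL, a["href"])
    if url not in links:  # thumbnail and headline point at the same article
        links.append(url)

for url in links:
    print(url)
```

To keep only headline links, the selector can be narrowed to `"h2.post-title a[href]"`.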
【Discussion】:
Tags: python html web-scraping beautifulsoup