Scraping news titles from news websites: answers

【Question title】: Scraping the news titles from news websites
【Posted】: 2020-11-20 10:51:47
【Question】:

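I have been trying to scrape news titles from news websites. While looking into this I came across two Python libraries, newspaper and beautifulsoup4. Using Beautiful Soup, I have been able to get all the links to news articles from one particular news website. With the code below, I can extract the title of a news article from a single link.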

from newspaper import Article

url = "https://www.ndtv.com/india-news/tamil-nadu-government-reverses-decision-to-reopen-schools-from-november-16-for-classes-9-12-news-agency-pti-2324199"
article = Article(url)
article.download()  # fetch the page HTML
article.parse()     # extract title, text and other fields
print(article.title)

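I want to combine the code for the two libraries, newspaper and beautifulsoup4, so that every link I get as output from Beautiful Soup is passed as the url to newspaper's Article, and I get the titles for all of the links. Below is the Beautiful Soup code I use to extract all the news article links.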

from bs4 import BeautifulSoup
from bs4.dammit import EncodingDetector
import requests

parser = 'html.parser'  # or 'lxml' (preferred) or 'html5lib', if installed
resp = requests.get("https://www.ndtv.com/coronavirus?pfrom=home-mainnavgation")
http_encoding = resp.encoding if 'charset' in resp.headers.get('content-type', '').lower() else None
html_encoding = EncodingDetector.find_declared_encoding(resp.content, is_html=True)
encoding = html_encoding or http_encoding
soup = BeautifulSoup(resp.content, parser, from_encoding=encoding)

for link in soup.find_all('a', href=True):
    print(link['href'])
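
A note on the output of the loop above: the href values from a page like this usually include relative paths, duplicates, and non-article entries such as javascript: or mailto: links, none of which newspaper can download as given. The following is a minimal sketch of one way to clean them up using only the standard library. The function name clean_links and the sample hrefs are hypothetical and are not part of the original question.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def clean_links(hrefs, base_url):
    """Resolve hrefs against base_url, keep http(s) only, drop duplicates."""
    seen = set()
    cleaned = []
    for href in hrefs:
        # Resolve relative paths such as "/india-news/..." against the page URL
        absolute = urljoin(base_url, href.strip())
        # Drop the "#fragment" part so the same page is not listed twice
        absolute, _fragment = urldefrag(absolute)
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skips javascript:, mailto:, tel: and similar
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned


# Hypothetical hrefs, standing in for the link['href'] values printed above
sample = [
    "/india-news/some-story-123",
    "https://www.ndtv.com/india-news/some-story-123#comments",
    "javascript:void(0)",
    "mailto:feedback@example.com",
]
print(clean_links(sample, "https://www.ndtv.com/coronavirus"))
```

This only normalizes URLs. It does not decide whether a link points at an article rather than a section or navigation page, which would need a site-specific rule.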

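【Question comments】:

So, what is your question or error?
What is the expected output?
I recently started putting together a detailed, publicly shared usage document for Newspaper3k. It is available here: github.com/johnbumgarner/newspaper3_usage_overview. You may find it useful when working with newspaper.
@Lifeiscomplex Hi, could you help me with this library? I can't figure out what I'm doing wrong: *.com/questions/65110807/…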

Tags: python web-scraping beautifulsoup newspaper3k

【Solution 1】:

Do you mean something like this?

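from newspaper import Article

# Collect every href that Beautiful Soup found (soup comes from the question's code)
links = [link['href'] for link in soup.find_all('a', href=True)]

# Download and parse each link with newspaper, then print its title
for link in links:
    article = Article(link)
    article.download()
    article.parse()
    print(article.title)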
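
One practical caveat with a loop like this: a single link that fails to download or parse stops the whole run. Below is a hedged sketch of a more tolerant variant that skips failing links and collects (url, title) pairs. The names fetch_titles, get_title and fake_title are hypothetical; the optional get_title parameter is an assumption added here so the logic can be tried without network access, and by default the function uses the same Article calls as above.

```python
def fetch_titles(urls, get_title=None):
    """Return (url, title) pairs, skipping links that fail or have no title."""
    if get_title is None:
        # Imported lazily so the function can be exercised without newspaper
        from newspaper import Article

        def get_title(url):
            article = Article(url)
            article.download()
            article.parse()
            return article.title

    results = []
    for url in urls:
        try:
            title = get_title(url)
        except Exception as exc:  # deliberately broad: download and parse errors vary
            print(f"skipped {url}: {exc}")
            continue
        if title:
            results.append((url, title))
    return results


# Offline demonstration with a stand-in title function (hypothetical data)
def fake_title(url):
    if "broken" in url:
        raise ValueError("download failed")
    return "Title for " + url.rsplit("/", 1)[-1]


print(fetch_titles(["https://example.com/a", "https://example.com/broken"], fake_title))
```

With real pages, calling fetch_titles(links) and leaving get_title unset would use newspaper for each link.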

【Comments】: