python newspaper module - get all the images from an article

【Question title】: python newspaper module - get all the images from an article
【Posted】: 2018-11-15 09:03:57
【Question】:

Using Python's newspaper module, I can get the top image from an article like this:

from newspaper import Article
first_article = Article(url="http://www.lemonde.fr/...", language='fr')
first_article.download()
first_article.parse()
print(first_article.top_image)

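But I need to get all the images in the article. Their GitHub documentation lists 'All image extraction from html' as a feature, but I can't figure out how to use it. I also don't want to manually download the HTML file, save it to disk, and then feed the file to the module to get the images.

How can I do this?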

【Comments】:

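newspaper.readthedocs.io/en/latest/#features What you are seeing, 'all image extraction from html', is listed under features; they don't have it yet.
@zimdero, what do you mean? A feature is something that exists. Top image extraction is also a feature, and it is described in the documentation.
I mean it will come in the future, but right now they don't have a feature for getting all the images.
@zimdero, I edited my comment.
Maybe they implemented the top_image feature but all_image is incomplete, I don't know. I also searched for an answer to this question and didn't find anything. You could try @Bear Brown's code example; maybe it will help you.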

Tags: python django web-scraping python-newspaper

【Solution 1】:

You may have already solved this, but you can get the image URLs that Newspaper found by reading the article.images attribute after parsing.

from newspaper import Article

article = Article(url="http://www.lemonde.fr/", language='fr')
article.download()
article.parse()
top_image = article.top_image
all_images = article.images
for image in all_images:
    print(image)

# Example output:
# https://img.lemde.fr/2020/09/22/0/3/4485/2990/220/146/30/0/a79897c_115736902-000-8pt8nc.jpg
# https://img.lemde.fr/2020/09/22/0/0/5315/3543/192/0/75/0/7b90c88_645792534-pns-3418491.jpg
# https://img.lemde.fr/2020/09/09/200/0/1500/999/180/0/95/0/d8099d2_51464-3185927.jpg
# https://img.lemde.fr/2020/09/22/0/4/4248/2832/664/442/60/0/557e6ee_5375150-01-06.jpg
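
The collection in article.images can include duplicates or URLs that are not ordinary picture files (tracking pixels, for instance), depending on the page. The following is a minimal sketch of one way to tidy it up using only the standard library. The helper name filter_image_urls and the default extension list are assumptions made for illustration; they are not part of newspaper.

```python
from urllib.parse import urlparse
import posixpath


def filter_image_urls(urls, extensions=(".jpg", ".jpeg", ".png", ".gif", ".webp")):
    """Return a sorted, de-duplicated list of URLs whose path ends in one
    of the given image extensions (compared case-insensitively).

    Hypothetical helper, not part of newspaper.
    """
    kept = set()
    for url in urls:
        # Look only at the path so query strings such as "?w=300" are ignored.
        path = urlparse(url).path
        ext = posixpath.splitext(path)[1].lower()
        if ext in extensions:
            kept.add(url)
    return sorted(kept)


if __name__ == "__main__":
    # Sample input; with a parsed Article you would pass article.images instead.
    sample = [
        "https://img.lemde.fr/2020/09/09/200/0/1500/999/180/0/95/0/d8099d2_51464-3185927.jpg",
        "https://img.lemde.fr/2020/09/09/200/0/1500/999/180/0/95/0/d8099d2_51464-3185927.jpg",
        "https://example.com/pixel?id=1",
    ]
    for image_url in filter_image_urls(sample):
        print(image_url)
```

With a downloaded and parsed article, the call would be filter_image_urls(article.images). Note that filtering by extension drops any image served from a URL without a file suffix, so it may be too strict for some sites.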

【Comments】: