How do I retrieve images located in a table from a website with my Python code? (Answers)

[Question title]: How do I retrieve images located in a table from a website with my Python code?
[Posted]: 2017-08-21 07:24:08
[Question description]:

I have the following Python code running in my Jupyter notebook:

from lxml.html import parse
tree = parse('http://www.imdb.com/chart/top')
movies = tree.findall('.//table[@class="chart full-width"]//td[@class="titleColumn"]//a')

movies[0].text_content()

The code above gives me the following output:

'The Shawshank Redemption'

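Basically, this is the content of the first row of the column named "titleColumn" on that web page. The same table has another column named "posterColumn", which contains a thumbnail image.

Now I would like my code to retrieve those images and show them in the output as well.

Do I need another package to achieve this? Can the images be displayed in Jupyter Notebook?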

[Question discussion]:

There is a very similar question using BeautifulSoup.
Thanks. I missed that one. I will take a look and see where I can go from there.

Tags: python image pandas jupyter-notebook

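[Solution 1]:

To get the images, you need to select the posterColumn cells. From each one you can extract the img src entry and download the JPG image. Each file can then be saved under the movie title, taking care to remove any characters that are invalid in a filename, such as the colon: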

from lxml.html import parse
import requests
import string

# Characters allowed in the saved filenames; anything else is dropped
valid_chars = "-_.() " + string.ascii_letters + string.digits
tree = parse('http://www.imdb.com/chart/top')
movies = tree.findall('.//table[@class="chart full-width"]//td[@class="titleColumn"]//a')
posters = tree.findall('.//table[@class="chart full-width"]//td[@class="posterColumn"]//a')

for p, m in zip(posters, movies):
    # iterlinks() yields every link under the poster's <a> element
    # (its href and the nested <img> src); only the src is wanted
    for element, attribute, link, pos in p.iterlinks():
        if attribute == 'src':
            print("{:50} {}".format(m.text_content(), link))
            poster_jpg = requests.get(link, stream=True)
            valid_filename = ''.join(c for c in m.text_content() if c in valid_chars)

            # Save the image as "<title>.jpg", writing it chunk by chunk
            with open('{}.jpg'.format(valid_filename), 'wb') as f_jpg:
                for chunk in poster_jpg:
                    f_jpg.write(chunk)
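
To see what the filename clean-up in the loop above does, here is a small standalone sketch. The helper name safe_filename and its extension parameter are made up for illustration; the whitelist is the same as valid_chars. The second title is only an illustrative input, chosen to show that accented letters are dropped as well, which may or may not be what you want.

```python
import string

# Same whitelist as valid_chars in the answer: characters kept in a filename.
VALID_CHARS = "-_.() " + string.ascii_letters + string.digits


def safe_filename(title, extension=".jpg"):
    """Drop every character not in VALID_CHARS, then append the extension.

    Hypothetical helper for illustration; not part of lxml or requests.
    """
    cleaned = "".join(c for c in title if c in VALID_CHARS)
    return cleaned + extension


print(safe_filename("The Godfather: Part II"))   # The Godfather Part II.jpg
print(safe_filename("Léon: The Professional"))   # Lon The Professional.jpg
```

If losing accented letters is a problem, one option would be to widen the whitelist, but that depends on which filesystem the files end up on.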

So at present, the download script prints the following:

The Shawshank Redemption                           https://images-na.ssl-images-amazon.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_UY67_CR0,0,45,67_AL_.jpg
The Godfather                                      https://images-na.ssl-images-amazon.com/images/M/MV5BZTRmNjQ1ZDYtNDgzMy00OGE0LWE4N2YtNTkzNWQ5ZDhlNGJmL2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyNjU0OTQ0OTY@._V1_UY67_CR1,0,45,67_AL_.jpg
The Godfather: Part II                             https://images-na.ssl-images-amazon.com/images/M/MV5BMjZiNzIxNTQtNDc5Zi00YWY1LThkMTctMDgzYjY4YjI1YmQyL2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyNjU0OTQ0OTY@._V1_UY67_CR1,0,45,67_AL_.jpg
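
As for showing the posters inside Jupyter Notebook, that should be possible without any extra package beyond IPython, which Jupyter already depends on. Below is a minimal sketch, not a definitive implementation: posters_to_html is a hypothetical helper, and the sample title/URL pair uses a placeholder address rather than a real poster link. In practice the pairs would be collected from m.text_content() and link in the download loop.

```python
from html import escape


def posters_to_html(pairs, height=67):
    """Build an HTML snippet with one captioned <img> per (title, url) pair.

    Hypothetical helper for illustration; titles and URLs are HTML-escaped.
    """
    figures = []
    for title, url in pairs:
        figures.append(
            '<figure style="display:inline-block;margin:4px;text-align:center">'
            '<img src="{src}" alt="{alt}" height="{h}">'
            '<figcaption>{alt}</figcaption></figure>'.format(
                src=escape(url, quote=True),
                alt=escape(title, quote=True),
                h=height,
            )
        )
    return "\n".join(figures)


# Placeholder data (assumption): replace with the (title, link) pairs you scraped.
sample = [("The Shawshank Redemption", "https://example.com/shawshank.jpg")]
html_snippet = posters_to_html(sample)

try:
    # In a notebook cell this renders the images inline.
    from IPython.display import HTML, display
    display(HTML(html_snippet))
except ImportError:
    # Outside Jupyter/IPython, just show the generated markup.
    print(html_snippet)
```

For a single poster, IPython.display.Image(url=link) or IPython.display.Image(filename='The Shawshank Redemption.jpg') as the last expression of a cell should also render the image.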

[Discussion]:

Awesome! Thank you very much for your time!