如何扫描网页并获取图像和 youtube 嵌入？

【问题标题】：How to scan a webpage and get images and youtube embeds?如何扫描网页并获取图像和 youtube 嵌入？
【发布时间】：2010-09-21 06:34:28
【问题描述】：

我正在构建一个网络应用程序，我需要在其中获取给定 URL 上嵌入的所有图像和任何 Flash 视频（例如 youtube）。我正在使用 Python。

我用谷歌搜索过，但没有找到任何关于这个的好信息（可能是因为我不知道这被称为搜索什么），有没有人有这方面的经验并且知道如何做到这一点？

如果有可用的代码示例，我很乐意查看。

谢谢！

【问题讨论】：

标签： python web-applications screen-scraping

【解决方案1】：

BeautifulSoup 是一个很棒的屏幕抓取库。使用 urllib2 获取页面，并使用 BeautifulSoup 将其解析。这是他们文档中的代码示例：

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen("http://www.icc-ccs.org/prc/piracyreport.php")
soup = BeautifulSoup(page)
for incident in soup('td', width="90%"):
    where, linebreak, what = incident.contents[:3]
    print where.strip()
    print what.strip()
    print

【讨论】：

我是新人，你将如何筛选页面并获取视频网址？