Extracting Scraped Web Content from iframe: Answers

【Question title】: Extracting Scraped Web Content from iframe
【Posted】: 2021-11-05 03:58:26
【Question】:

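I am trying to scrape the table at https://coronavirus.health.ny.gov/zip-code-vaccination-data

I have read "Python BeautifulSoup - Scrape Web Content Inside Iframes" and got as far as the code below, but I can't figure out how to extract the information from soup.

Any help is greatly appreciated.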

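import requests
from bs4 import BeautifulSoup

# Reuse one session so cookies persist across both requests.
s = requests.Session()
r = s.get("https://coronavirus.health.ny.gov/zip-code-vaccination-data")

# Parse the main page (static HTML only; scripts are not executed).
soup = BeautifulSoup(r.content, "html.parser")
# Hard-coded, protocol-relative iframe URL.
iframe_src = '//static-assets.ny.gov/load_global_footer/ajax?iframe=true'

# Prepend the scheme and fetch the iframe document.
r = s.get(f"https:{iframe_src}")

# Parse the iframe document.
soup = BeautifulSoup(r.content, "html.parser")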

【Comments on the question】:

Tags: python html web-scraping iframe beautifulsoup

【Solution 1】:

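The site you are trying to scrape generates the <iframe> dynamically with JavaScript, so the <iframe> you want is not in the HTML that requests downloads. You have two options: automate a real browser with a tool such as Selenium or Puppeteer, or assign the <iframe> URL to a variable directly, since it does not look likely to change in the near future. This is the URL of your <iframe>: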

https://public.tableau.com/views/Vaccination_Rate_Public/NYSVaccinationRate?:embed=y&:showVizHome=n&:tabs=n&:toolbar=n&:device=desktop&showShareOptions=false&:apiID=host0#navType=1&navSrc=Parse
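
To check which iframes are actually present in the static HTML (and so reachable with requests alone), a minimal sketch along these lines may help. It uses only the standard library. The names IframeCollector and find_iframe_urls, and the sample markup, are hypothetical and used for illustration only; in practice the HTML would be r.text from the question's code. An iframe injected by JavaScript will not appear in this list, which is when a browser-automation tool or a hard-coded URL becomes necessary.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeCollector(HTMLParser):
    """Collect the src attribute of every <iframe> found in static HTML."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_iframe_urls(html, base_url):
    """Return absolute URLs for all iframes present in the raw HTML."""
    collector = IframeCollector()
    collector.feed(html)
    # urljoin resolves protocol-relative ("//host/path") and relative paths
    # against the URL of the page that contained them.
    return [urljoin(base_url, src) for src in collector.sources]


# Hypothetical sample markup standing in for the downloaded page (assumption).
sample_html = """
<html><body>
  <div id="viz"></div>
  <iframe src="//static-assets.ny.gov/load_global_footer/ajax?iframe=true"></iframe>
</body></html>
"""

page_url = "https://coronavirus.health.ny.gov/zip-code-vaccination-data"
iframe_urls = find_iframe_urls(sample_html, page_url)
print(iframe_urls)
```

If the iframe you need is missing from the result, the page is building it in the browser, and requests plus an HTML parser will not see it.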

【Comments】: