(Python) Get HTML from website opened in user's browser: Answers

【Question title】: (Python) Get HTML from website opened in user's browser
【Posted】: 2021-09-08 05:03:48
【Question description】:

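I want to get data from a regular Chrome session rather than a Selenium session, because the data is already there and recreating the same scenario in Selenium takes too long to be practical. Is there a way to read the HTML of a tab that is currently open?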

【Question discussion】:

Tags: python html google-chrome browser

【Solution 1】:

I suggest using urllib.request for this:

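from urllib.request import urlopen

# Fetch the page over HTTP and decode the raw bytes into text
link = "https://stackoverflow.com/questions/68120200/python-get-html-from-website-opened-in-users-browser"
openedpage = urlopen(link)
content = openedpage.read()
code = content.decode("utf-8")  # decode the bytes that were read, not the bytes type itself
print(code)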

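For example, this prints the HTML source of this question's page. I hope this is what you are trying to achieve. If you want to extract actual data rather than the whole source, you can use the same library: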

from urllib.request import urlopen

link = "https://stackoverflow.com/questions/68120200/python-get-html-from-website-opened-in-users-browser"
openedpage = urlopen(link)
content = openedpage.read()
code = content.decode("utf-8")
# Index of the opening <title> tag (str.find returns -1 if it is missing)
title = code.find("<title>")
# The title text starts right after the opening tag
title_start = title + len("<title>")
# The title text ends where the closing tag begins
title_end = code.find("</title>")
full_title = code[title_start:title_end]
print(full_title)

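Basically, to get any part of the source, find the start and end indices of the tags that surround it, then slice the string between them as in the example.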
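
The index-and-slice approach described above can be wrapped in a small helper. This is only a minimal sketch: the function name extract_between and the sample string are made up for illustration, and it matches literal tag strings only, so a tag written with attributes (for example <h1 class="x">) would not be found.

```python
from typing import Optional


def extract_between(source: str, start_tag: str, end_tag: str) -> Optional[str]:
    """Return the text between the first start_tag and the next end_tag, or None."""
    start = source.find(start_tag)
    if start == -1:
        return None  # opening tag not present
    start += len(start_tag)  # move past the opening tag itself
    end = source.find(end_tag, start)  # search only after the opening tag
    if end == -1:
        return None  # closing tag not present
    return source[start:end]


# Hypothetical sample document, used instead of a live HTTP request
sample = "<html><head><title>Example page</title></head><body><h1>Hi</h1></body></html>"
print(extract_between(sample, "<title>", "</title>"))
print(extract_between(sample, "<h1>", "</h1>"))
print(extract_between(sample, "<p>", "</p>"))
```

Returning None when a tag is missing avoids the silent wrong slice that a -1 index from str.find would otherwise produce. For anything beyond simple cases, a real HTML parser such as the standard library's html.parser is likely a safer choice.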

【Discussion】:

For security reasons I need an actual browser to open the page, and Selenium/Playwright take too long to be practical.