Is it possible to scrape HTML from Inspect Element in Python? Answers

[Question title]: Is it possible to scrape HTML from Inspect Element in Python?
[Posted]: 2021-03-01 04:44:13
[Question description]:

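I am trying to scrape a website that tries to block scraping. Viewing the source through Chrome, requests, or requests_html does not show the correct source code.

Here is an example: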

from requests_html import HTMLSession

session = HTMLSession()
content = session.get('website')
content.html.render()
print(content.html.html)

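It returns this page: "It looks like your browser has JavaScript disabled or does not support it."

That happens even though JavaScript is enabled. The same thing happens with View Source in my actual browser.

However, in my actual browser, when I open Inspect Element, I can see the source code just fine. Is there a way to extract the HTML source from Inspect Element?

Thanks!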

[Question comments]:

Sure. It is actually bustabit.com/play, a gambling site. Out of curiosity, I want to collect about one month of numbers and see what the odds are.

Tags: python screen-scraping

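[Solution 1]:

The problem you are facing is that the page is rendered by JavaScript on the front end. In that case you need a browser engine with JavaScript enabled; once the page has rendered, you can read the HTML source easily.

Here is working code showing how I would do it (using Selenium):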

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Default Chrome options; add arguments here if you need them
chrome_options = Options()
# `options=` replaces the deprecated `chrome_options=` keyword.
# This call is what launches the browser.
driver = webdriver.Chrome(options=chrome_options)

# Ensure that the full URL path is given
URL = 'https://proper_url'

# Load the page in the browser launched above.
driver.get(URL)

# Now you can easily read the source HTML
HTML = driver.page_source

You will have to work out the details of installing and setting up Selenium and the webdriver yourself. Here's a good starting point.
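One possible reason for still getting the fallback page is reading the HTML before the page's scripts have finished. Below is a minimal sketch, not a guaranteed fix for this site, that waits for a rendered element and then serializes the live DOM, which is what Inspect Element displays. The function name `read_live_dom`, the `ready_css` selector, and the timeout values are assumptions for illustration; Selenium's own `WebDriverWait` can replace the hand-written polling loop.

```python
import time


def read_live_dom(driver, url, ready_css, timeout=15.0, poll=0.5):
    """Open url, wait until ready_css matches an element, return the live DOM as HTML.

    driver is an already-created Selenium WebDriver. ready_css is a
    hypothetical CSS selector for an element that only exists after the
    page's JavaScript has run; pick one by looking at Inspect Element.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    # Poll until the rendered element shows up or the time budget is spent.
    while not driver.find_elements("css selector", ready_css):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{ready_css!r} did not appear within {timeout}s")
        time.sleep(poll)
    # Serialize the DOM as it exists right now, after scripts have modified it.
    return driver.execute_script("return document.documentElement.outerHTML")
```

With the `driver` and `URL` from the answer above, a call could look like `HTML = read_live_dom(driver, URL, '#some-rendered-element')`, where the selector is a placeholder you would replace. Call `driver.quit()` when you are done. If the site detects automated browsers, waiting alone may not be enough.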

[Comments]:

Hi, this does not work. Viewing the source through my actual browser does not work either; only Inspect Element does.