Why is the source code of a website different from the code when inspected in browser? (For webscraping)

【Question title】: Why is the source code of a website different from the code when inspected in browser? (For webscraping)
【Posted】: 2020-01-20 00:34:27
【Question】:

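I am working on my first programming project, so please excuse any incorrect terminology.

My goal: I am trying to scrape my local library's website. The end goal is to be able to automatically renew the books I have borrowed through the website.

Progress: I have successfully used Python, Selenium, and WebDriver to log in to the library's website and navigate to the "Checkouts" page, which lists the items I have borrowed. I then use Beautiful Soup to extract the HTML of the "Checkouts" page.

Problem: When I inspect the "Checkouts" page (right-click and select "Inspect" in Chrome), the HTML looks different from what I see when I view the page source (right-click and select "View page source" in Chrome).

The HTML shown by "View page source" matches what my Python code extracts, but it is missing all the information I want to scrape. The HTML shown when I inspect the page does contain that information.

Questions:

Why is the HTML different when I "Inspect" the site versus "View page source" in Chrome? (Could this be "minification"?)
How can I extract the HTML that contains the information I want to scrape?

I have included my code below.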

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import requests

driver = webdriver.Chrome()
driver.get('https://www.torontopubliclibrary.ca/signin')

# Login details
username = 'abcdefghi'
password = 'abcd'

# Type Username
submitUsername = driver.find_element_by_id('userId')
submitUsername.send_keys(username)

# Type Password
submitPassword = driver.find_element_by_id('password')
submitPassword.send_keys(password)

# Press Enter
enter = driver.find_element_by_xpath('//*[@id="form_signin"]/div[2]/input[3]')
enter.click()

# Open Checkouts page
driver.get('https://account.torontopubliclibrary.ca/checkouts')

driver = requests.get('https://account.torontopubliclibrary.ca/checkouts').text
soup = BeautifulSoup(driver, 'html.parser')

【Comments】:

Does this answer your question? How can a scraped HTML be different from the source code?
Two things come to mind: 1) not waiting for everything to load, and 2) using a non-HTML5 parser.
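
To illustrate the first point in the comment above, here is a minimal sketch of waiting until JavaScript-rendered content appears before reading the HTML. It relies only on Selenium's `driver.page_source` attribute. The helper name `wait_for_rendered_html` and the marker string are hypothetical, and the real page's markup may call for a different marker.

```python
import time


def wait_for_rendered_html(driver, marker, timeout=15.0, poll_interval=0.5):
    """Return driver.page_source once it contains `marker`.

    `driver` is any object exposing a Selenium-style `page_source`
    attribute. `marker` is a hypothetical substring (for example a CSS
    class name) that only appears after the page's JavaScript has run.
    """
    deadline = time.monotonic() + timeout
    while True:
        # page_source is read from the browser session, so it reflects
        # the page as the browser currently holds it.
        html = driver.page_source
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} did not appear within {timeout} s")
        time.sleep(poll_interval)
```

With the logged-in session from the question, this could be called as `html = wait_for_rendered_html(driver, 'checkout-item')` and the result passed to `BeautifulSoup(html, 'html.parser')`, where `'checkout-item'` is an assumed class name. Selenium's own `WebDriverWait` combined with `expected_conditions.presence_of_element_located` is the more usual way to express the same wait.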

Tags: html python-3.x selenium-webdriver web-scraping beautifulsoup

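【Solution 1】:

Regarding your first question: "Why is the HTML different when I 'Inspect' the site versus 'View page source' in Chrome? (Could this be 'minification'?)"

"View page source" shows the HTML text response that the server sent when you requested the page.

After this initial response arrives, modern pages typically execute a lot of JavaScript, which modifies the initial HTML received from the server. When you "Inspect" the page, every change made by JavaScript is reflected in what you see, and that view stays in sync with what you see as a user in the browser. "View page source", by contrast, always shows the initial response from the server, with none of the later changes. Minification only compresses the text the server sends, so it does not account for this difference.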
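
The difference described above can be reproduced with a small, self-contained sketch. The two HTML strings are made up for illustration (they are not the library site's real markup), and the class name `checkout-item` and the file name `app.js` are hypothetical. The first string stands in for the server's response, the second for the DOM after the page's script has filled in the list.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute includes a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Parse `html` without running any JavaScript and count matching tags."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# What "View page source" shows: the response before any script has run.
server_response = """
<html><body>
  <ul id="checkouts"></ul>
  <script src="app.js"></script>
</body></html>
"""

# What "Inspect" shows: the DOM after the script has populated the list.
rendered_dom = """
<html><body>
  <ul id="checkouts">
    <li class="checkout-item">Book A</li>
    <li class="checkout-item">Book B</li>
  </ul>
  <script src="app.js"></script>
</body></html>
"""

print(count_class(server_response, "checkout-item"))  # 0
print(count_class(rendered_dom, "checkout-item"))     # 2
```

In the question's setup, `requests.get(...).text` corresponds to the first string: it is the raw server response, and it comes from a separate HTTP session that does not share the browser's login. The counterpart of the second string is what the Selenium session exposes as `driver.page_source` once the page has finished rendering, so that is the HTML to hand to Beautiful Soup.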

【Comments】: