requests.get() returns different HTML than the one in my browser

【Question title】: requests.get() returns different HTML than the one on my browser
【Posted】: 2020-08-13 15:49:47
【Question description】:

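I'm trying to get links from this website, but I noticed that the links I get from parsing differ from the ones shown in my browser. No links are missing: both the browser and my parsed result show 14 hyperlinks (for the series). However, my browser shows some links that my result doesn't have, and my result shows some links that my browser doesn't have.

For example, my result shows a link like

"https://4anime.to/anime/one-piece-nematsu-tokubetsu-kikaku-mugiwara-no-luffy-oyabun-torimonochou"

But when I search for the word "torimonochou" in my browser, I can't find any matches.

I searched for the links in the page source (right-click the page and select View Page Source), so I shouldn't be missing anything. I also passed my browser headers to requests.get(), so I should be getting the same HTML.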

Code:

import bs4
import requests

# Send a browser-like User-Agent so the server sees the script as Firefox
head = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0'}

searchResObj = requests.get("https://4anime.to/?s=one+piece", headers=head)
soupObj = bs4.BeautifulSoup(searchResObj.text, features="html.parser")

I tried various ways of parsing the links. This is just a simplified version that grabs every link on the page, so I'm not missing any.

all_a = soupObj.select("a")

# Print the href of every anchor tag on the page
for link in all_a:
    print(link.get("href"))

I also looked at the HTML that my script received. The hyperlinks are indeed different from the ones shown in my browser.

print(searchResObj.text)
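To pin down exactly which links differ, one option is to compare the two documents as sets of links rather than by eye. The following is only a minimal sketch: it uses the standard library's html.parser instead of BeautifulSoup so that it has no dependencies, and the names LinkCollector, extract_links and diff_links are hypothetical helpers, not part of requests or bs4.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, similar to soupObj.select("a")."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return the set of href values found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return set(collector.links)


def diff_links(script_html, browser_html):
    """Return (only_in_script, only_in_browser) as sorted lists."""
    script_links = extract_links(script_html)
    browser_links = extract_links(browser_html)
    return (
        sorted(script_links - browser_links),
        sorted(browser_links - script_links),
    )
```

Here script_html would be searchResObj.text, and browser_html would be the text of a file saved from View Page Source (the file name is up to you). If both returned lists are empty, the two documents contain the same links and the difference lies elsewhere.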

So what could be causing this?

【Question discussion】:

Could be. But try viewing the page source and look at the part that has the links. Pretty sure there isn't any JS.

Tags: python html beautifulsoup python-requests

【Solution 1】:

Running this script prints 14 links, which are also shown in the browser (maybe you got a captcha page?):

import requests
from bs4 import BeautifulSoup


searchResObj = requests.get("https://4anime.to/?s=one+piece")
soupObj = BeautifulSoup(searchResObj.text, features="html.parser")

for a in soupObj.select('#headerDIV_95 > a'):
    print(a['href'])

Prints:

https://4anime.to/anime/one-piece-nenmatsu-tokubetsu-kikaku-mugiwara-no-luffy-oyabun-torimonochou
https://4anime.to/anime/one-piece-straw-hat-theater
https://4anime.to/anime/one-piece-movie-14-stampede
https://4anime.to/anime/one-piece-yume-no-soccer-ou
https://4anime.to/anime/one-piece-mezase-kaizoku-yakyuu-ou
https://4anime.to/anime/one-piece-umi-no-heso-no-daibouken-hen
https://4anime.to/anime/one-piece-film-gold
https://4anime.to/anime/one-piece-heart-of-gold
https://4anime.to/anime/one-piece-episode-of-sorajima
https://4anime.to/anime/one-piece-episode-of-sabo
https://4anime.to/anime/one-piece-episode-of-nami
https://4anime.to/anime/one-piece-episode-of-merry
https://4anime.to/anime/one-piece-episode-of-luffy
https://4anime.to/anime/one-piece-episode-of-east-blue
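To test the captcha guess, a rough heuristic can be applied to the response before parsing it. This is a sketch under stated assumptions: the marker strings and the status codes below are illustrative guesses, not values confirmed for this site, and looks_like_challenge is a hypothetical helper.

```python
# Marker strings are assumptions for illustration; a real challenge or
# block page may use different wording.
ASSUMED_MARKERS = ("captcha", "checking your browser", "access denied")


def looks_like_challenge(status_code, body, markers=ASSUMED_MARKERS):
    """Heuristic: True if the response resembles a block or captcha page."""
    # Status codes often (not always) used when a request is blocked
    # or rate limited.
    if status_code in (403, 429, 503):
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in markers)
```

It could be called as looks_like_challenge(searchResObj.status_code, searchResObj.text). A False result does not prove the page is the same one the browser sees; it only rules out the most obvious block pages.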

Edit: screenshot of "View source":

【Discussion】:

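Yes, like I said, there aren't any missing links. The problem is that I get different links when I run the code. Try viewing the page source in your browser and look for the first link you printed, https://4anime.to/anime/one-piece-nenmatsu-tokubetsu-kikaku-mugiwara-no-luffy-oyabun-torimonochou. You won't find a match, yet the code you wrote does print the link.
@RedwanHossainArnob But I see the link in both the source code and the script output.
That's not possible, I double-checked. The source code? I assume you're viewing it from the browser? I still can't find the first link you printed in my browser. Try visiting the page itself instead of the source, and you'll see there is no "one-piece-nematsu-tokubetsu-kikaku-mugiwara-no-luffy-oyabun-torimonochou".
@RedwanHossainArnob Please see the screenshot in my answer.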
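That's weird. I tried again with a VPN and adblock and still can't find it. So I looked at line 933, where you found the link in the source. Instead of the link you got, I got 4anime.to/anime/one-piece">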