Python3 not getting the full text of a webpage

【Question title】: Python3 Not getting full text on webpage
【Posted】: 2020-09-06 03:05:56
【Question description】:

I have a webpage from which I want to retrieve an email address:

url = 'https://www.westminster.ac.uk/about-us/our-people/directory/ramachandran-natasha-1'

I have tried using BeautifulSoup with both requests and urllib, but neither works: when I print(page_source), the output does not contain the email.

import requests

page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
page_source = page.text

and

import urllib.request

page = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
infile = urllib.request.urlopen(page).read()
page_source = infile.decode('ISO-8859-1')

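I have tried both with and without headers. If I do this using Selenium with driver.get(url), it works, but I can't use Selenium because it is too slow.

I have looked at several similar posts that propose the solutions above, but they don't work for me.

Is there a fast way to retrieve the email from that page?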

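【Comments】:

The email you are trying to scrape is loaded dynamically. Try using Selenium.
I said in the question that I don't want to use Selenium. Thanks anyway.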

Tags: python-3.x web-scraping

【Solution 1】:

from bs4 import BeautifulSoup
import requests

url = "https://www.westminster.ac.uk/about-us/our-people/directory/ramachandran-natasha-1"
page_data = requests.get(url)
soup = BeautifulSoup(page_data.content, "html.parser")

email_id = []

# Walk each profile block and collect the email element if it has one.
for job_tag in soup.find_all("div", class_="masthead-profile__result-set"):
    # A multi-class string passed to class_ must match the class attribute exactly.
    email = job_tag.find("div", class_="masthead-profile__result-value--email email")
    if email is not None:
        email_id.append(email)

This finds the element, but if you simply print job_tag in the code above, you can clearly see that the email is protected by the website.
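
A quick way to confirm this without a browser is to search the raw response body for anything that looks like an address. The sketch below is only a diagnostic. The name find_emails, the regular expression, and both HTML snippets are made up for illustration and are not taken from the site.

```python
import re

# Deliberately loose pattern: good enough for a presence check, not for validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(page_source):
    """Return the unique email-like strings in page_source, in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(page_source):
        if match not in seen:
            seen.append(match)
    return seen


# Hypothetical snippets standing in for a real response body.
static_html = '<div class="email"><a href="mailto:someone@example.org">someone@example.org</a></div>'
protected_html = '<div class="email"><a href="/protected">[email protected]</a></div>'

print(find_emails(static_html))     # ['someone@example.org']
print(find_emails(protected_html))  # []
```

If find_emails(page_source) comes back empty for the real page, the address is not in the HTML that requests or urllib receives, so no choice of parser or headers will recover it from that response alone.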

【Discussion】:

So is there any other way to get it?
I'm not sure... but I don't think so. Some websites protect certain data so that people can't scrape it.
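
For what it's worth, the thread never says which protection this site uses, so the following is a guess. One widespread scheme is Cloudflare-style email obfuscation, where the page shows a placeholder and carries the real address as a hex string in a data-cfemail attribute. In that scheme the first byte is an XOR key and the remaining bytes are the address XORed with it. The helper names below are hypothetical, and encode_cfemail exists only to build sample input for the demo.

```python
def decode_cfemail(encoded):
    """Decode a hex string laid out as: one key byte, then the address XORed with that key."""
    data = bytes.fromhex(encoded)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


def encode_cfemail(address, key=0x5A):
    """Inverse of decode_cfemail, used here only to produce test input (assumed layout)."""
    raw = address.encode("utf-8")
    return bytes([key] + [b ^ key for b in raw]).hex()


# Round trip on a made-up address; nothing here comes from the real page.
sample = encode_cfemail("someone@example.org")
print(decode_cfemail(sample))  # someone@example.org
```

If the printed job_tag contains a data-cfemail attribute, passing its value to decode_cfemail may recover the address without Selenium. If the attribute is absent, the site is using some other mechanism and this sketch does not apply.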