【Posted】: 2022-01-10 08:27:48
【Question】:
I have a page that I must log in to before I can reach the page I want to scrape with BeautifulSoup. My code currently looks like this:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
# "loginpage" is a placeholder for the URL of the page where I have to log in.
driver.get("loginpage")
driver.find_element(By.ID, "username").send_keys("username")
driver.find_element(By.ID, "password").send_keys("password")
driver.find_element(By.XPATH, "//button[@onclick=\"return validateFields();\"]").click()
# "contentpage" is a placeholder for the URL of the page I want to scrape.
driver.get("contentpage")
html = driver.page_source
soup = BeautifulSoup(html, features="lxml")
statuses = soup.find_all("span")
for status in statuses:
    print(status)
However, I think the HTML comes from the wrong page: BeautifulSoup returns NoneType even though I can see in the browser that the element should be there.
【Comments】:
Tags: python selenium selenium-webdriver web-scraping beautifulsoup
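One way to test the "wrong page" suspicion is to inspect what Selenium actually handed to BeautifulSoup. The sketch below is only a diagnostic aid, not a confirmed fix. The helper names `describe_page` and `wait_for_login_to_finish` are hypothetical, and it assumes the login form uses the `username` and `password` ids from the code above. It also assumes the login page navigates away (or removes the form) once the login succeeds. Note that `find_all` returns an empty list when nothing matches, while `find` returns `None`, so a `NoneType` result usually means a single-element lookup found nothing in the parsed HTML.

```python
from bs4 import BeautifulSoup


def describe_page(html, login_field_ids=("username", "password")):
    """Summarise a page so it is easy to tell which page the HTML came from."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    # If every login field is still present, we are probably still on the login page.
    looks_like_login = all(
        soup.find(id=field_id) is not None for field_id in login_field_ids
    )
    return {
        "title": title,
        "looks_like_login": looks_like_login,
        "span_count": len(soup.find_all("span")),
    }


def wait_for_login_to_finish(driver, timeout=10):
    """Block until the login form is gone (assumption: it disappears on success)."""
    # Imported here so describe_page can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    WebDriverWait(driver, timeout).until(
        EC.invisibility_of_element_located((By.ID, "username"))
    )
```

A possible use: call `wait_for_login_to_finish(driver)` right after the `click()`, then after `driver.get("contentpage")` print `driver.current_url` and `describe_page(driver.page_source)`. If `looks_like_login` is true, the session was probably not established before the second `get`, and the site redirected back to the login form.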