【Posted】:2021-11-03 06:48:06
【Problem description】:
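I'm trying to extract data from https://www.realestate.com.au/. First, I build my URL based on the type of property I'm looking for, then I open the URL with the Selenium webdriver, but the page is blank! Any idea why this happens? Is it because this website doesn't permit web scraping? Is there any way to scrape this website?
Here is my code: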
from selenium import webdriver
from bs4 import BeautifulSoup
import time
PostCode = "2153"
propertyType = "house"
minBedrooms = "3"
maxBedrooms = "4"
page = "1"
url = "https://www.realestate.com.au/sold/property-{p}-with-{mib}-bedrooms-in-{po}/list-{pa}?maxBeds={mab}&includeSurrounding=false".format(p = propertyType, mib = minBedrooms, po = PostCode, pa = page, mab = maxBedrooms)
print(url)
# url should be "https://www.realestate.com.au/sold/property-house-with-3-bedrooms-in-2153/list-1?maxBeds=4&includeSurrounding=false"
driver = webdriver.Edge("./msedgedriver.exe") # edit the address to where your driver is located
driver.get(url)
time.sleep(3)
src = driver.page_source
soup = BeautifulSoup(src, 'html.parser')
print(soup)
【Discussion】:
-
Does driver.get(url) not show any data in the UI? Have you tried the Chrome driver as well?
-
Check their robots.txt; they prohibit automated access to their site.
-
Thanks for the reply, @cruisepandey. I don't think a different driver would solve this. As Rustam pointed out, they strictly prohibit any automated access :(
Tags: python selenium web-scraping
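-
For anyone who wants to check the robots.txt point themselves, here is a minimal sketch using the standard library's urllib.robotparser. The rules in SAMPLE_ROBOTS_TXT, the "my-bot" user agent, the example.com URLs, and the is_allowed helper are all made up for illustration; they are not copied from realestate.com.au.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# It is NOT the actual file served by realestate.com.au.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /sold/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "my-bot" and the example.com URLs are placeholder values.
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-bot", "https://www.example.com/sold/list-1"))  # False
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-bot", "https://www.example.com/about"))  # True
```

To test against the live site instead, you would call set_url() with the site's robots.txt address and then read() on the parser before calling can_fetch(). Note that this only reports what the file declares; it does not by itself tell you why the page rendered blank in Selenium.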