【发布时间】:2020-12-24 21:39:10
【问题描述】:
我正在尝试获取此页面上帖子的链接,但它们显然是通过单击每个帖子图像生成的。我在 Python 3.8 中使用 Selenium 和 beautifulsoup4。 知道如何在 selenium 继续下一页时获取链接吗?
点击图片后,它会打开一个新标签页,其中包含以下类型的缩短网址:https://www.goplaceit.com/propiedad/6198212
将我发送到 url 类型:
我的代码:
from bs4 import BeautifulSoup
from selenium import webdriver
import time
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import winsound
from timeit import default_timer as timer
from selenium.webdriver.common.keys import Keys
start = timer()
PROXY = "PROXY" # IP:PORT or HOST:PORT
path_to_extension = r"extension"
options = Options()
#options.add_argument("--incognito")
options.add_argument('load-extension=' + path_to_extension)
#options.add_argument('--disable-java')
options.headless = False
prefs = {"profile.default_content_setting_values.notifications" : 2}
prefs2 = {"profile.managed_default_content_settings.images": 2}
prefs.update(prefs2)
prefs3 = {"profile.default_content_settings.cookies": 2}
prefs.update(prefs3)
options.add_experimental_option("prefs",prefs)
options.add_argument("--start-maximized")
options.add_argument('--proxy-server=%s' % PROXY)
driver = webdriver.Chrome('chromedriver.exe', options=options)
driver.get('https://www.goplaceit.com/cl/')
WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="root"]/nav/div/div[2]/div[1]/button'))).click()
correo = driver.find_element(By.XPATH, '//*[@id="email"]')
correo.send_keys("Mail")
contraseña = driver.find_element(By.XPATH, '//*[@id="password"]')
contraseña.send_keys("password")
contraseña.send_keys(Keys.ENTER)
time.sleep(7)
elem.driver.find_element(By.XPATH, '//*[@id="gpi-main-landing-search-input"]/div/input')
elem.click()
elem.send_keys("keywords")
WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="gpi-main-landing-search-input"]/div/div[1]/ul/li[1]/a/div/div[1]'))).click()
buscador.send_keys(Keys.ENTER)
WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="root"]/div/div/div[1]/div[2]/div/div[1]/div/div[1]/button'))).click()
WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="custom-checkbox"]'))).click()
page_number = 0
max_page_number = 30
while page_number<=max_page_number:
WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, '//button[contains(text(),"paginator-btn-right")]'))).click()
【问题讨论】:
标签: python-3.x selenium-webdriver web-scraping