【Question title】: Page won't load after .click() even though it displays in the browser
【Posted】: 2020-01-28 05:45:32
【Question】:

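I am trying to scrape this page: "https://www.seloger.com/list_beta.htm?tri=initial&enterprise=0&idtypebien=2,1&idtt=2,5&naturebien=1,2,4&cp=75"

When I call .click() on the first element of the search results, the new page loads correctly in the browser, but driver.find_element cannot find the body of that page or any of its children. If I load the same page directly with driver.get, the element is found without any problem. driver.current_url still returns the first page, as if .click() had not loaded anything, even though the page rendered successfully in the browser.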

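from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities  # needed for DesiredCapabilities() below
from selenium.webdriver.firefox.options import Options  # needed for Options() below
from selenium.webdriver.support.expected_conditions import visibility_of_element_located
from selenium.webdriver.support.ui import WebDriverWait
import pyautogui
import time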


def cssconvert(tag):
    return '.'+tag.replace(' ', '.')


binary = r'C:\Program Files\Mozilla Firefox\firefox.exe'
options = Options()
options.set_headless(headless=False)
options.binary = binary
cap = DesiredCapabilities().FIREFOX
cap["marionette"] = True #optional
driver = webdriver.Firefox(firefox_options=options, capabilities=cap, executable_path="C:\\Users\\chrys\\Desktop\\DataScientist\\Selenium\\geckodriver_true\\geckodriver.exe")
driver.get("https://www.seloger.com/list_beta.htm?tri=initial&enterprise=0&idtypebien=2,1&idtt=2,5&naturebien=1,2,4&cp=75")

#time.sleep(2)
select = 'block__ShadowedBlock-sc-10w6hsj-0 ListContent__SmartClassifiedExtended-sc-1viyr2k-2 iddbNe classified__ClassifiedContainer-sc-1wmlctl-0 haLWMI Card__CardContainer-sc-7insep-7 jZkbME'
driver.find_element_by_css_selector(cssconvert(select)).click()
driver.find_element_by_id('js-descriptifBien')

    241             raise exception_class(message, screen, stacktrace, alert_text)
--> 242         raise exception_class(message, screen, stacktrace)
    243 
    244     def _value_or_default(self, obj, key, default):

NoSuchElementException: Message: Unable to locate element: [id="js-descriptifBien"]

If I now copy and paste the URL of the loaded page:

driver.get('https://www.seloger.com/annonces/viagers/appartement/paris-11eme-75/belleville-saint-maur/145504325.htm?projects=2,5&types=2,1&natures=1,2,4&places=[{cp:75}]&qsVersion=1.0&bd=ListToDetail')
driver.find_element_by_id('js-descriptifBien').text

it works. Since my goal is to scrape every item in the search results, I would like to know how to handle this.

【Comments】:

    Tags: javascript python selenium selenium-webdriver css-selectors


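    【Solution 1】:

    You can read more about selectors here and here.

    The problem is that clicking a list item opens it in a new tab. The driver keeps pointing at the original window, which is why driver.current_url still shows the search page. You have to switch to the new tab before doing anything else. Here is an example of how to navigate between windows: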

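    current_window = driver.current_window_handle
    handles_before = driver.window_handles  # snapshot taken before the click
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".iddbNe"))).click()
    # new_window_is_opened must be called with the handles that existed before the click
    wait.until(EC.new_window_is_opened(handles_before))
    driver.switch_to.window(driver.window_handles[1])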
    

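Indexing window_handles[1] assumes that exactly two windows are open. A slightly more defensive sketch compares the handles before and after the click. Note that find_new_handle is a hypothetical helper name, not part of Selenium; it only works on plain lists of handle strings such as the ones driver.window_handles returns:

```python
def find_new_handle(handles_before, handles_after):
    """Return the single handle present after the click but not before it.

    Hypothetical helper (not part of Selenium). Both arguments are lists of
    window-handle strings, e.g. two snapshots of driver.window_handles.
    """
    new_handles = [h for h in handles_after if h not in handles_before]
    if len(new_handles) != 1:
        raise RuntimeError(
            f"expected exactly one new window, found {len(new_handles)}"
        )
    return new_handles[0]


# Assumed usage with a live driver (not executed here):
#   before = driver.window_handles
#   element.click()
#   wait.until(EC.new_window_is_opened(before))
#   driver.switch_to.window(find_new_handle(before, driver.window_handles))
```

Raising on anything other than one new handle is a design choice: it makes an unexpected popup or a click that opened nothing fail loudly instead of silently scraping the wrong window.
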
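    With Selenium, one solution is to collect all the links in the list and then navigate to them. For that you can use the .iddbNe a[name=classified-link] CSS selector. Once you have collected the links from all the pages you need, you can navigate to each of them and collect the data.
    Here is an example of how to do this for the first page: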

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    # ...
    
    wait = WebDriverWait(driver, 20)
    
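    # you can add a loop here to go through all the pages you need and extend the list
    elements = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".iddbNe a[name=classified-link]")))
    # read the hrefs first: driver.get() needs a URL string, and the elements go stale after navigation
    links = [element.get_attribute("href") for element in elements]
    for link in links:
        driver.get(link)
        # get your data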
    

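To extend this beyond the first page, the search URL for each page has to be built somehow. The sketch below is one way to do that, under a loud assumption: it treats the LISTING-LISTpg query parameter (which appears in the referer header of the requests example in this answer) as the page selector of the search URL. That has not been verified against the site, and page_url is a hypothetical helper name:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: this parameter selects the result page. It is taken from the
# referer header shown in this answer, not from any documented API.
PAGE_PARAM = "LISTING-LISTpg"


def page_url(base_url, page):
    """Return base_url with the page parameter set to `page` (hypothetical helper)."""
    parts = urlsplit(base_url)
    # Drop any existing page parameter so it is never duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != PAGE_PARAM
    ]
    query.append((PAGE_PARAM, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Assumed usage with a live driver (not executed here):
#   links = []
#   for page in range(1, last_page + 1):
#       driver.get(page_url(search_url, page))
#       elements = wait.until(EC.presence_of_all_elements_located(
#           (By.CSS_SELECTOR, ".iddbNe a[name=classified-link]")))
#       links.extend(e.get_attribute("href") for e in elements)
```
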
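    A better solution is to use requests. The code below is a simple example that returns 100 results in JSON format. The response also lets you find the total number of results, which you can use to loop and collect all the information you need: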

    import requests
    
    headers = {
        'sec-fetch-mode': 'cors',
        'origin': 'https://www.seloger.com',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'ru,en-US;q=0.9,en;q=0.8,tr;q=0.7',
        'pragma': 'no-cache',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/77.0.3865.90 Safari/537.36',
        'content-type': 'application/json',
        'accept': 'application/json',
        'cache-control': 'no-cache',
        'authority': 'www.seloger.com',
        'referer': 'https://www.seloger.com/list_beta.htm?projects=2%2C5&types=2%2C1&natures=1%2C2%2C4&places=%5B%7Bcp'
                   '%3A75%7D%5D&qsVersion=1.0&LISTING-LISTpg=2',
        'sec-fetch-site': 'same-origin',
        'dnt': '1',
    }
    
    params = (
        ('from', '0'),
        ('size', '100'),
        ('isSeo', 'false'),
    )
    
    data = '{"idPublication":null,"projects":[2,5],"types":[2,1],"natures":[1,2,4],"places":[{"label":"Paris",' \
           '"cities":null,"districts":null,"countries":null,"divisions":null,"subDivisions":["75"],"fakeCities":null}],' \
           '"searchAreas":null,"isochronePoints":null,"proximities":null,"withGeoloc":null,"price":null,' \
           '"groundSurface":null,"surface":null,"bedrooms":[],"rooms":[],"bedroom":null,"room":null,"sort":null,' \
           '"floor":null,"lastFloor":null,"hearth":null,"guardian":null,"view":null,"balcony":null,"pool":null,' \
           '"lift":null,"terrace":null,"cellar":null,"south":null,"parking":null,"box":null,"parquet":null,"locker":null,' \
           '"furnished":null,"disabledAccess":null,"alarm":null,"toilet":null,"bathtub":null,"shower":null,"hall":null,' \
           '"livingRoom":null,"diningRoom":null,"kitchen":null,"heating":null,"unobscured":null,"picture":null,' \
           '"exclusiveness":null,"priceChange":null,"privateSeller":null,"video":null,"vv":null,"enterprise":null,' \
           '"garden":null,"basement":null,"groundFloor":null,"houseboat":null} '
    
    response = requests.post('https://www.seloger.com/list/api/externaldata', headers=headers, params=params, data=data)
    print(response.text)
    

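Looping over all results comes down to stepping the from parameter. The sketch below generates one params tuple per request, assuming that from is a zero-based offset and size is the page size, which is how the example above appears to use them. The name of the field holding the total in the JSON response is not given here, so the total is passed in as a plain integer; page_params is a hypothetical helper name:

```python
def page_params(total, size=100):
    """Yield one query-parameter tuple per request needed to cover `total` results.

    Hypothetical helper. Assumes 'from' is a zero-based offset and 'size'
    is the number of results returned per request.
    """
    for start in range(0, total, size):
        yield (
            ("from", str(start)),
            ("size", str(size)),
            ("isSeo", "false"),
        )


# Assumed usage (not executed here); `total` must be read from the first response:
#   for params in page_params(total):
#       response = requests.post(url, headers=headers, params=params, data=data)
```
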
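    【Discussion】:

    • Thank you, it works very well. Could you tell me how to display the request? I don't understand how you knew which data to send. Compared with navigating through Selenium, it looks like a much lighter solution.
    • I used the Network tab in the developer tools to find the request that contains the required data.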