【Question title】: Trouble targeting elements on website (selenium webdriver)
【Posted】: 2015-03-27 20:03:48
【Question】:

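I am trying to target the properties on a real estate website. Ideally, I'd like to extract the listing URL, title, location, and email for each listing. The properties are all listed like this: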

<div class="propertyList">
        <div id="propertyList74495-sale" class="deal_on_market propertyListItem" data-property-id="74495-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=74495-sale" data-listing-id="148815"></div>

           <table>
             <tbody>
                 <tr>
                   <td class="thumbnail">
                <a target="_top" href="http://svncommercialadvisors.com/properties/?propertyId=74495-sale"></a>
            </td>
            <td class="addressInfo">
                <a target="_top" href="http://svncommercialadvisors.com/properties/?propertyId=74495-sale">

                    Engelberg Antik's

                </a>
                <p class="propertiesListCityStateZip">
                    <img src="/images/map-marker-tiny.png?1427481879" alt="Map-marker-tiny"></img>


                    Salem, OR

                </p>
                <p class="description">

                    Outstanding downtown Salem opportunity, right next…

                </p>
                <div class="smallAttributes">
                    <div></div>
                    <div></div>
                    <div></div>
                </div>
            </td>
            <td class="propertyInfo">
                <div>

                    $479,900

                </div>
                <div>

                    13,612 SF

                </div>
                <div>

                    Street Retail

                </div>
            </td>
        </tr>
    </tbody>
</table>
<div class="contactAdvisor">
    ::before
    <a href="mailto:brokeremail@svn.com"></a>


    or call
    503.588.0400
    for more information

</div>
<div class="links"></div>

        <div id="propertyList61436-sale" class="deal_under_contract propertyListItem" data-property-id="61436-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=61436-sale" data-listing-id="124490"></div>

        <div id="propertyList89374-sale" class="deal_on_market propertyListItem" data-property-id="89374-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=89374-sale" data-listing-id="173124"></div>

        <div id="propertyList84437-sale" class="deal_on_market propertyListItem" data-property-id="84437-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=84437-sale" data-listing-id="164488"></div>

        <div id="propertyList84478-sale" class="deal_on_market propertyListItem" data-property-id="84478-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=84478-sale" data-listing-id="164538"></div>

         ...

This is my first attempt:

from selenium import webdriver
import sys
import smtplib
import pymongo

newProperties = []

driver = webdriver.Firefox()
driver.get('http://svncommercialadvisors.com/properties/')

for property in driver.find_elements_by_class_name('propertyList'):
    #get title,location 
    info = property.find_elements_by_class_name('addressInfo')
    email = property.find_elements_by_partial_link_text('.com')

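When I run the program above, it does not raise any error saying the driver could not locate the elements. However, when I print the elements, nothing shows up. How can I target the elements better? I'd like something like this, appended to a list: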

-title: Engelberg Antik's
-location: Salem, OR
-url: http://svncommercialadvisors.com/properties/?propertyId=74495-sale
-email: brokeremail@svn.com

【Question comments】:

    Tags: python selenium selenium-webdriver web-scraping


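    【Solution 1】:

    The key problem here is that the search results are loaded inside an iframe.

    You need to switch to the iframe before searching for the properties: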

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Firefox()
    driver.get('http://svncommercialadvisors.com/properties/')
    
    # wait for frame to appear and switch
    frame = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div#buildout iframe")))
    driver.switch_to.frame(frame)
    
    for property in driver.find_elements_by_class_name('propertyList'):
        info = property.find_element_by_class_name('addressInfo')
        email = property.find_element_by_partial_link_text('Email')
    
        print(info.text)
        print(email.get_attribute('href'))
    

    I also applied two fixes:

    • find_elements_by_class_namme replaced with find_elements_by_class_name
    • property.find_elements_by_partial_link_text('.com') replaced with property.find_element_by_partial_link_text('Email')
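On the iframe switch itself: once `driver.switch_to.frame(frame)` has run, every later lookup resolves inside the iframe until the driver is switched back. If the script later needs elements from the outer page, it can help to scope the switch. Below is a minimal sketch of that idea, not part of the original answer. `inside_frame` is a hypothetical helper name, and it assumes only the `switch_to.frame` and `switch_to.default_content` calls that WebDriver exposes.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Run a block with the driver focused on `frame`, then restore the top-level page.

    `inside_frame` is a hypothetical helper, not a Selenium API.
    """
    driver.switch_to.frame(frame)  # lookups now resolve inside the iframe
    try:
        yield driver
    finally:
        # Return to the top-level document even if the block raised.
        driver.switch_to.default_content()
```

Used as `with inside_frame(driver, frame): ...`, the property loop would sit inside the `with` block, and anything after it would search the outer page again.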

    With these changes, it prints:

    Engelberg Antik's
    Salem, OR
    Outstanding downtown Salem opportunity, right next door to the newly renovated Roth and McGilchri...
    mailto:jennifer.martin@svn.com
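
The record the question asks for (title, location, url, email) can be derived from the strings printed above. Here is a rough sketch, assuming `info.text` always comes back as title, location, and description on separate lines. The sample output has that shape, but nothing guarantees it for every listing. `build_record` is a hypothetical helper, and the listing URL is assumed to be obtained separately, for example from the `data-listing-url` attribute shown in the question's HTML.

```python
def build_record(info_text, email_href, listing_url):
    """Turn the scraped strings into the record shape asked for in the question.

    Hypothetical helper; assumes info_text holds title, location, description
    on separate lines, as in the sample output.
    """
    lines = [line.strip() for line in info_text.splitlines() if line.strip()]
    title = lines[0] if lines else ""
    location = lines[1] if len(lines) > 1 else ""

    email = email_href or ""
    if email.startswith("mailto:"):
        # Drop the scheme and any "?subject=..." style query string.
        email = email[len("mailto:"):].split("?", 1)[0]

    return {"title": title, "location": location, "url": listing_url, "email": email}
```

Each returned dict can be appended to a list such as the question's `newProperties`.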
    

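    【Discussion】:

    • Thanks! How do I get it to print for all the properties on the page? I assumed the for loop would print it for each property.
    • @RobertOttolia Sure, one more change: use driver.find_elements_by_class_name('propertyListItem') instead of driver.find_elements_by_class_name('propertyList').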
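
Putting that last comment into practice, the loop below is one possible sketch of collecting data for every listing. It is written against the `find_elements(by, value)` form, since Selenium 4.x releases dropped the `find_elements_by_*` helpers. The locator strings are the values behind `By.CLASS_NAME` and `By.PARTIAL_LINK_TEXT`, spelled out so the block needs no imports. `collect_listings` is a hypothetical name. The code assumes each `propertyListItem` contains its own `addressInfo` cell and "Email" link, and that the `data-listing-url` attribute from the question's HTML is present.

```python
CLASS_NAME = "class name"                # same string as selenium's By.CLASS_NAME
PARTIAL_LINK_TEXT = "partial link text"  # same string as By.PARTIAL_LINK_TEXT


def collect_listings(root):
    """Collect raw data for every listing under `root`.

    `root` is expected to be a WebDriver already switched into the results
    iframe. Hypothetical helper, not taken from the original answer.
    """
    listings = []
    for item in root.find_elements(CLASS_NAME, "propertyListItem"):
        # find_elements returns an empty list when nothing matches, so a
        # listing without an email link is recorded instead of raising.
        info = item.find_elements(CLASS_NAME, "addressInfo")
        links = item.find_elements(PARTIAL_LINK_TEXT, "Email")
        listings.append({
            "url": item.get_attribute("data-listing-url"),
            "info": info[0].text if info else "",
            "email_href": links[0].get_attribute("href") if links else None,
        })
    return listings
```

The empty-list behavior of `find_elements` is also why the question's first attempt produced no error: without the iframe switch nothing matched, and the loop body never ran.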