【Question Title】: Unable to scrape desired DIV - BeautifulSoup
【Posted】: 2015-03-02 06:58:24
【Question】:

I am scraping this URL with BeautifulSoup.

I want to scrape every DIV that follows the Our Features heading:

if hotel_meetings_soup.select("div#contentArea div.highlightBox"):
    print(hotel_meetings_soup.select("div#contentArea")) # debug 1
    exit(0)
    for meeting in hotel_meetings_soup.select("div#contentArea div.highlightBox"):
        print("\n Feature start here\n")
        print(meeting)
        # Rest of code

All of the DIVs share the same class, highlightBox, but I don't understand why debug 1 prints only the markup of the DIV that contains

Number Of Guest Rooms:  500
Number Of Meeting Spaces:   29
Largest Meeting Space:  17,377 sq ft (1,614.28 sq.m)

and not the markup of the others.
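One way to narrow this down is to check, for each highlightBox DIV in the parsed tree, whether it actually has contentArea as an ancestor. The sketch below is only an illustration: `SAMPLE_HTML` and `split_by_container` are hypothetical names, the markup is invented rather than taken from the real page, and nothing in the post confirms that this is the cause for this hotel.

```python
from bs4 import BeautifulSoup

# Invented markup (an assumption, not the real page): one highlightBox is
# nested inside #contentArea, the other two sit outside it.
SAMPLE_HTML = """
<div id="contentArea">
  <div class="highlightBox">Number Of Guest Rooms: 500</div>
</div>
<h3>Our Features</h3>
<div class="highlightBox">Weddings</div>
<div class="highlightBox">Get Rewarded</div>
"""


def split_by_container(html, container_id="contentArea"):
    """Split div.highlightBox texts by whether they sit inside the container."""
    soup = BeautifulSoup(html, "html.parser")
    inside, outside = [], []
    for div in soup.select("div.highlightBox"):
        # find_parent() returns None when no ancestor has the given id.
        if div.find_parent(id=container_id) is not None:
            inside.append(div.get_text(strip=True))
        else:
            outside.append(div.get_text(strip=True))
    return inside, outside


inside, outside = split_by_container(SAMPLE_HTML)
print("inside contentArea:", inside)
print("outside contentArea:", outside)
```

If the DIVs that go missing show up in the `outside` list when this is run against the real response, the selector `div#contentArea div.highlightBox` cannot match them, whatever the page looks like in a browser.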

【Comments】:

    Tags: python html python-3.x web-scraping beautifulsoup


    【Solution 1】:

    The idea is to first locate the Our Features h3 element by its text, and then use find_next_siblings() to collect the matching siblings that follow it:

    import requests
    from bs4 import BeautifulSoup
    
    url = 'http://www.starwoodhotels.com/sheraton/property/meetings/index.html?language=en_US&propertyID=1391'
    response = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
    })
    
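    # Name the parser explicitly; 'html.parser' ships with Python, while lxml or
    # html5lib (if installed) may build a different tree from malformed markup.
    soup = BeautifulSoup(response.content, 'html.parser')
    # Find the text node "Our Features"; its parent is the heading element.
    features = soup.find(string='Our Features')
    
    # Keep only the heading's following siblings that are div.highlightBox.
    for div in features.parent.find_next_siblings('div', class_='highlightBox'):
        print(div.text.strip())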
    

    Prints:

    Weddings
    Host a beautiful wedding in the Valley of the Sun with our spectacular views, lush ceremony lawns, and upscale ballrooms with pre-function space. Stellar catering and superb service ensure a amazing day. More >
    ...
    Get Rewarded
    Earn Starpoints® and eligible nights toward SPG elite status on your next meeting or event. More >
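The same technique can be tried offline on a small inline document. This is a minimal sketch, not the answer's exact code: `SAMPLE_HTML` and `features_after_heading` are hypothetical names, and the markup is invented for illustration, so it only shows how `find_next_siblings()` filters the siblings of the heading.

```python
from bs4 import BeautifulSoup

# Invented markup (an assumption): a heading followed by sibling DIVs,
# only some of which carry the highlightBox class.
SAMPLE_HTML = """
<div id="contentArea">
  <h3>Our Features</h3>
  <div class="highlightBox"><h4>Weddings</h4><p>Host a beautiful wedding.</p></div>
  <div class="otherBox">Not a feature</div>
  <div class="highlightBox"><h4>Get Rewarded</h4><p>Earn Starpoints.</p></div>
</div>
"""


def features_after_heading(html, heading_text="Our Features"):
    """Return the text of each div.highlightBox that follows the heading."""
    soup = BeautifulSoup(html, "html.parser")
    # Match the h3 whose only string is exactly the heading text.
    heading = soup.find("h3", string=heading_text)
    if heading is None:
        return []  # heading missing: nothing to collect
    siblings = heading.find_next_siblings("div", class_="highlightBox")
    return [div.get_text(" ", strip=True) for div in siblings]


for feature in features_after_heading(SAMPLE_HTML):
    print(feature)
```

Returning an empty list when the heading is missing avoids the `AttributeError` that calling `.parent` on `None` would raise for a page without that heading.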
    

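    【Comments】:

    • I'll give it a try... but isn't it strange that I can scrape this DIV for many other hotels? starwoodhotels.com/sheraton/property/meetings/… With the code I posted in the question I can get all the required DIVs, but not for this hotel, even though those DIVs exist inside contentArea.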