【Question title】: Python web scraper will not work on deeply nested tags
【Posted】: 2021-12-09 12:27:34
【Question】:

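This is my second week coding in Python. I want to write a scraper that returns every location along with its phone number. The scraper is incomplete. I have tried several versions, but they all return either an empty list [] or an error.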

import requests
from bs4 import BeautifulSoup

# Download the locations page
webpage_response = requests.get('https://www.orangetheory.com/en-us/locations/')

webpage = webpage_response.content

soup = BeautifulSoup(webpage, "html.parser")

# Attempt to find the location entries; this returns an empty list ([])
print(soup.find_all(attrs={'class': 'aria-label'}))
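The call above asks for elements whose class is the string "aria-label", but aria-label is normally an attribute name, not a class value, so that filter would not match a typical labelled element even in static HTML. A minimal sketch of the difference, run on a small hypothetical HTML snippet (not the real Orangetheory markup):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a list of location links; not taken from the real site.
html = """
<ul>
  <li><a href="/abilene" aria-label="Abilene, TX">Abilene</a></li>
  <li><a href="/acworth" aria-label="Acworth">Acworth</a></li>
  <li><a href="/about">About</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Looks for class="aria-label": no element has that class, so the result is empty.
by_class = soup.find_all(attrs={"class": "aria-label"})

# Matches every element that has an aria-label attribute, whatever its value.
by_attribute = soup.find_all(attrs={"aria-label": True})

print(by_class)
print([tag["aria-label"] for tag in by_attribute])
```

Even with the attribute-based filter, the live page may still yield nothing if its location list is filled in by JavaScript after loading, because requests only downloads the initial HTML and does not run scripts. That is why requesting the JSON data directly tends to be the more dependable route.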

【Discussion】:

    Tags: python web-scraping beautifulsoup screen-scraping re


    【Answer 1】:

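    Rather than parsing the HTML, you can request the studio data as JSON from the Orangetheory API. To get information about all studios in the United States, you can use the following example: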

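    import requests
    import pandas as pd
    
    # JSON endpoint for studio data, filtered to the United States and sorted by studio name
    url = "https://api.orangetheory.co/partners/v2/studios?country=United%20States&sort=studioName"
    
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    data = response.json()
    
    # Each entry in data["data"] is a list whose first element holds the studio record
    df = pd.DataFrame([d[0] for d in data["data"]])
    
    # Expand the nested studioLocation dict into its own columns (address, phone number, coordinates, ...)
    df = pd.concat([df, df.pop("studioLocation").apply(pd.Series)], axis=1)
    
    print(df)
    df.to_csv("data.csv", index=False)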
    

    Prints:

          studioId                            studioUUId  mboStudioId                                  studioName studioNumber                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              description        studioStatus             openDate           reOpenDate   taxRate                                                                                                                                logoUrl                                   contactEmail              timeZone environment                                   studioProfiles                                       physicalAddress               physicalCity         physicalState physicalPostalCode                     physicalRegion  physicalCountryId physicalCountry     phoneNumber      latitude      longitude
    0         2266  f627d35c-9e2b-452a-8017-bfbcccff5a4d     610952.0                             14th Street, DC         0943  The science of excess post-exercise oxygen consumption(EPOC) takes your results to new heights in this exciting group fitness concept. You will feel new energy and see amazing results with only 2-4 workouts per week. Each 60-minute class is broken into intervals of high-energy cardiovascular training and strength training. Use a variety of equipment including treadmills, rowing machines, suspension training, and free weights to tone your body and gain energy throughout the day. Exciting and inspiring group classes motivate you to beat plateaus and stick to your goals. Pay-as-you-go or get deep discounts with customized packages.\r\n\r\nThe best part of the Orange Experience is the results. You can burn calories for up to 38 hours after your workout!              Active  2017-09-02 00:00:00  2020-06-27 00:00:00  0.000000                 https://clients.mindbodyonline.com/studios/OrangetheoryFitnessWashingtonDC0943/logo_mobile.png?imageversion=1513008044      studiomanager0943@orangetheoryfitness.com      America/New_York        PROD     {'isWeb': 1, 'introCapacity': 1, 'isCrm': 1}                           1925 14th Street NW Suite C                 Washington  District of Columbia              20009                              DC-01                  1   United States      2028691700   38.91647339   -77.03197479
    1        47914  01ddd24d-58bf-4959-bcb2-34587d6e48fc     660917.0                         2021 Virtual Summit        10001                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     None         Coming Soon                 None                 None  0.000000                                                                                                                                   None     studiomanager10001@orangetheoryfitness.com      America/New_York        PROD     {'isWeb': 0, 'introCapacity': 0, 'isCrm': 0}                                                     *                          *                     *                  *                              NV-01                  1   United States                  -81.66339500   -15.58054000
    2         2964  9fd74853-4bad-4f1d-a9c7-fcbf27eb1651     576312.0                                 Abilene, TX         0862                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     None              Active  2018-09-22 00:00:00  2020-05-18 00:00:00  0.000000                                                                                                                                   None      studiomanager0862@orangetheoryfitness.com       America/Chicago        PROD     {'isWeb': 1, 'introCapacity': 1, 'isCrm': 1}                                    3950 Catclaw Drive                    Abilene                 Texas              79606                              TX-06                  1   United States      3254006191   32.40399933   -99.77462006
    3         3139  2a5a5bc7-ea4a-4a2a-b166-56ce5e6ee7e2     415638.0                                     Acworth         1188                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     None              Active  2019-04-17 00:00:00  2020-05-17 00:00:00  0.000000                                                                                                                                   None      studiomanager1188@orangetheoryfitness.com      America/New_York        PROD     {'isWeb': 1, 'introCapacity': 1, 'isCrm': 1}                   4391 Acworth Dallas Rd NW Suite 212                    Acworth               Georgia              30101                              GA-01                  1   United States      7706748722   34.05842590   -84.72319794
    
    ...
    

    It also saves the table to data.csv (shown in the original answer as a screenshot from LibreOffice).
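The question asked specifically for each location's name and phone number. A minimal sketch that pulls just those two fields out of the parsed JSON with the standard library, assuming the response shape implied by the answer's code: data["data"] is a list of lists with the studio record as the first element, and phoneNumber sits inside the nested studioLocation dict (inferred from where that column appears in the printed output). The extract_locations helper and the sample payload are hypothetical; the sample reuses two rows from the output above.

```python
def extract_locations(payload):
    """Return (studioName, phoneNumber) pairs from an API payload.

    Assumes payload["data"] is a list of lists whose first element is the
    studio record, and that phoneNumber lives in record["studioLocation"].
    """
    locations = []
    for item in payload["data"]:
        studio = item[0]
        location = studio.get("studioLocation") or {}
        # Some studios (e.g. "Coming Soon" entries) have an empty phone number; report None.
        phone = location.get("phoneNumber") or None
        locations.append((studio.get("studioName"), phone))
    return locations


# Hypothetical payload shaped like the API response, using two rows from the output above.
sample = {
    "data": [
        [{"studioName": "Abilene, TX",
          "studioLocation": {"physicalCity": "Abilene", "phoneNumber": "3254006191"}}],
        [{"studioName": "Acworth",
          "studioLocation": {"physicalCity": "Acworth", "phoneNumber": "7706748722"}}],
    ]
}

for name, phone in extract_locations(sample):
    print(name, phone)
```

With live data, the same helper would be called on the parsed response, e.g. extract_locations(requests.get(url, timeout=30).json()), which avoids building the full DataFrame when only names and phone numbers are needed.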

    【Discussion】:

    • Ah. Amazing. Thanks a lot!