【Question title】: Can't extract HTML source of a webpage (BeautifulSoup)
【Posted】: 2021-03-20 11:56:35
【Question】:

Page source: view-source:https://www.myhome.ie/residential/dublin/property-for-sale

import requests
from bs4 import BeautifulSoup

url = "https://www.myhome.ie/residential/dublin/property-for-sale"
html = requests.get(url)

soup = BeautifulSoup(html.content, 'html.parser')

print(soup)

# paging = soup.find_all("div",{"class":"PropertyInfoStrip ng-star-inserted"})

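I need to get the page's HTML source so that I can scrape the div classes, but with bs4 the output contains only JS scripts and none of the HTML I'm after. What am I doing wrong? When I inspect the element in the browser, I can see the HTML.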

【Comments】:

    Tags: python-3.x web-scraping beautifulsoup


    【Solution 1】:

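    The data is loaded dynamically via JavaScript, so BeautifulSoup never sees it in the response returned by requests. You can get the data in JSON format from the Ajax request the page makes: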

    import json
    import requests
    
    
    params = {
        "ApiKey": "5f4bc74f-8d9a-41cb-ab85-a1b7cfc86622",
        "CorrelationId": "e4e14c46-53e6-463f-9bdc-f67785bd4915",
        "SessionId": None,
        "RequestTypeId": 2,
        "RequestVerb": "POST",
        "Endpoint": "https://api.myhome.ie/search",
        "Page": 1,
        "PageSize": 20,
        "SortColumn": 2,
        "SortDirection": 2,
        "SearchRequest": {
            "IsBackendSearch": False,
            "SkipSearchIndex": False,
            "IsGroupPrivateSearch": False,
            "IsSaleAgreed": False,
            "IsSold": False,
            "IsAuction": False,
            "IsBoundsSearch": False,
            "UseFreeTextSearchForKeywords": False,
            "SearchContent": False,
            "PropertyIds": [],
            "GroupIds": [],
            "ChannelIds": [1],
            "PropertyTypeIds": [],
            "PropertyClassIds": [1],
            "PropertyStatusIds": [2, 12],
            "SaleTypeIds": [],
            "FeatureTypeIds": [],
            "RegionId": 1265,
            "LocalityIds": [],
            "LocalityNames": [],
            "NegotiatorIds": [],
            "SolicitorIds": [],
            "BuyerSolicitorIds": [],
            "VendorSolicitorIds": [],
            "TransferedByUserIds": [],
            "RowStatusIds": [2],
            "EnergyRatings": [],
            "Polygons": [],
            "Destinations": [],
            "Tags": [],
            "PrivateTags": [],
            "PreSixtyThree": False,
            "IsActive": True,
            "HasPhotos": False,
            "PriceFrequency": "Monthly",
        },
    }
    
    url = "https://api.myhome.ie/search"
    
    params["Page"] = 1  # <--- change to desired page
    data = requests.post(url, json=params).json()
    
    # uncomment to print all data:
    # print(json.dumps(data, indent=4))
    
    # print some results to screen
    for result in data["SearchResults"]:
        print("{:<30} {}".format(result["PropertyType"], result["DisplayAddress"]))
    

    Prints:

    Semi-Detached House            248 Swords Road, Whitehall, Dublin 9, D09 K8W7
    Apartment                      24 Mountfield Park, Malahide,   County Dublin
    Semi-Detached House            26 Griffeen Glen Boulevard, Lucan, Co. Dublin
    Semi-Detached House            4 Bedroom Home at Skylark, St. Marnock's Bay, Portmarnock, Dublin
    Terraced House                 250 Laraghcon, Lucan, Co. Dublin
    Bungalow                       1 Castleland Park View, Balbriggan, County Dublin
    Semi-Detached House            657 Whitechurch Road, Taylors Lane, Rathfarnham, Dublin 14
    Terraced House                 22 Reuben Avenue, Rialto, Dublin 8
    Semi-Detached House            Merrion Lodge, 135 Mount Merrion Avenue, Blackrock, Co. Dublin
    Terraced House                 74 Seapark Drive, Clontarf, Dublin 3
    Terraced House                 5 O'Daly Road, Drumcondra, Dublin 9
    Detached House                 Churchtown House, Weston Park, Dublin 14, Dublin
    Detached House                 St. Kevins, 17 Rathfarnham Park, D14, Dublin 14, Dublin
    Terraced House                 The Terrace, Foxrock, Dublin 18
    Terraced House                 7 Whately Place, Kilmacud Road Upper, Stillorgan, Co. Dublin
    Detached House                 The Cottage, Dublin Road, Oldtown, County Dublin
    Semi-Detached House            31 Gleann Na Smol, Oldbawn, Dublin 24
    Terraced House                 218 Castlecurragh Heath, Mulhuddart, Dublin 15
    Semi-Detached House            19 Woodside, Dodder Park Road, Rathfarnham, Dublin 14
    Apartment                      Apartment, 46 Slade Castle Court, Saggart, Co. Dublin
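
The answer changes `params["Page"]` by hand to pick a page. As a rough sketch of how several pages might be collected in one go, the helper below loops over page numbers and stops when a page comes back empty. The names `fetch_page`, `iter_results` and `max_pages` are hypothetical and not part of the original answer, and treating an empty `SearchResults` list as "no more pages" is an assumption about the API rather than documented behaviour. The endpoint and payload date from 2021 and may no longer work unchanged.

```python
import copy
from typing import Callable, Iterator

SEARCH_URL = "https://api.myhome.ie/search"  # endpoint taken from the answer above


def fetch_page(payload: dict) -> dict:
    """POST one search request and return the decoded JSON response."""
    # Imported here so the paging logic below can be exercised without
    # the network, for example with a stub in place of this function.
    import requests

    response = requests.post(SEARCH_URL, json=payload, timeout=30)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()


def iter_results(
    base_payload: dict,
    fetch: Callable[[dict], dict] = fetch_page,
    max_pages: int = 5,
) -> Iterator[dict]:
    """Yield search results page by page, visiting at most max_pages pages."""
    for page in range(1, max_pages + 1):
        payload = copy.deepcopy(base_payload)  # leave the caller's dict untouched
        payload["Page"] = page
        results = fetch(payload).get("SearchResults") or []
        if not results:
            break  # assumption: an empty or missing list means no more pages
        yield from results
```

With the `params` dict from the answer, `for result in iter_results(params, max_pages=3): print(result["DisplayAddress"])` would walk the first three pages. Passing the fetch function in as an argument keeps the loop easy to try out with canned data before pointing it at the live site.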
    

    【Comments】:
