【Question title】: Web Scraping: Next page is rendered in JavaScript, how can I get it using Scrapy
【Posted】: 2021-03-11 22:35:35
【Question】:

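I have been trying to scrape this site with Scrapy: https://www.remax.com/homes-for-sale/ny/new-york/city/3651000. I can get the content on the page, but I can't move on to the next page because it appears to be rendered with JavaScript. How can I go about this?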

【Comments】:

    Tags: javascript python web-scraping scrapy screen-scraping


    【Solution 1】:

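    I'm not sure how to do this with Scrapy, but it looks like the JavaScript pulls all of the results from a backend API. You can find the backend URL and the details of the AJAX request in your browser's developer tools. The request looks like the code below; give it a try. You may be able to pull the information you need directly from their API.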

    import requests
    
    payload = {
        "count":24,
        "offset":0,
        "sorts":{
            "0":{
                "listingContractDate":"desc"
            }
        },
        "terms":{
            "place":{
                "lat":40.70668199998021,
                "lon":-73.97795499996471,
                "city":"New York",
                "state":"NY",
                "placename":"New York, NY",
                "placeType":"city",
                "placeId":"3651000",
                "areaSquareMiles":308.12
            },
            "locationRect":{
                "minLat":40.3400891972592,
                "maxLat":41.07193158068027,
                "minLon":-74.24986662109377,
                "maxLon":-73.70604337890627
            },
            "bPropertyType":[
                "Single Family",
                "Condo/Townhome",
                "Mobile Home",
                "Multi-Family",
                "Rental",
                "Farm",
                "Land"
            ],
            "bStatus":[
                "For Sale",
                "Under Contract"
            ],
            "city":[
                "New York"
            ],
            "State":[
                "NY"
            ]
        },
        "listingLoadLevel":"Search"
    }
    
    
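    # POST the search payload as JSON to the backend endpoint found in the browser's network tab
    r = requests.post("https://public-api-gateway-prod.kube.remax.booj.io/listings/search/run/", json=payload, timeout=30)
    r.raise_for_status()  # stop on an HTTP error instead of parsing an error body
    print(r.json())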
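
To reach the "next page", the `count` and `offset` fields in the payload above look like the natural place to start. The sketch below assumes (this is not confirmed by the API itself) that `offset` is a zero-based index of the first listing to return and `count` is the page size. The helper name `paged_payloads` and the trimmed `base` dictionary are hypothetical, used only for illustration; in practice you would pass the full payload.

```python
import copy


def paged_payloads(base_payload, page_size=24, max_pages=3):
    """Yield one copy of base_payload per page, with count/offset adjusted.

    Assumption: the API treats "offset" as the zero-based index of the
    first listing and "count" as the number of listings per page.
    """
    for page in range(max_pages):
        payload = copy.deepcopy(base_payload)  # leave the original untouched
        payload["count"] = page_size
        payload["offset"] = page * page_size
        yield payload


# Trimmed stand-in for the full search payload shown above.
base = {"count": 24, "offset": 0, "listingLoadLevel": "Search"}

offsets = [p["offset"] for p in paged_payloads(base)]
print(offsets)  # [0, 24, 48]
```

Each generated payload can be sent with `requests.post(url, json=payload)` exactly as above, stopping once a response comes back with no listings. If you prefer to stay in Scrapy, the same payloads can be posted from a spider with `scrapy.http.JsonRequest(url, data=payload)`. The structure of the JSON response is not shown here, so inspect `r.json()` to find where the listings and any total count are stored.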
    

    【Comments】:
