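【Question title】: How to scrape an infinite scrolling page?
【Posted】: 2026-01-01 16:35:02
【Question description】:

I'm trying to scrape the men's coats and jackets category on next.co.uk, and I realized that the page uses infinite scrolling.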

# -*- coding: utf-8 -*-
import scrapy
from ..items import NextItem


class NewoneSpider(scrapy.Spider):
    name = 'newOne'
    allowed_domains = ['www.next.co.uk']
    start_urls = [
        'https://www.next.co.uk/shop/gender-newbornboys-gender-newbornunisex-gender-olderboys-gender-youngerboys-productaffiliation-coatsandjackets-0'
    ]

    def parse(self, response):
        # Each product on the listing page sits inside a .Details element
        for product in response.css('.Details'):
            # Build a fresh item per product rather than reusing one instance
            item = NextItem()

            # Fixed metadata, values as given in the question
            item['productCategory'] = 'Furniture'
            item['productSubCategory'] = 'living Room'
            item['productCountry'] = 'uk'
            item['productSeller'] = 'John Lewis'

            # Fields extracted from the product markup
            item['productLink'] = product.css('.TitleText::attr(href)').extract_first()
            item['productTitle'] = product.css('.Desc::text').extract_first()
            item['productImage'] = product.css('.Image img::attr(src)').extract_first()
            item['productSalePrice'] = product.css('.Price a::text').extract_first()

            yield item

I am able to scrape 28 items, but I can see that the site has more, which it loads through infinite scrolling.

【Comments】:

    Tags: python web-scraping


    【Solution 1】:

    When you scroll down the page, an XHR call is sent to the server to request more data. Example:

    https://www.next.co.uk/shop/gender-newbornboys-gender-newbornunisex-gender-olderboys-gender-youngerboys-productaffiliation-coatsandjackets/isort-score-minprice-0-maxprice-30000-srt-24
    

    Each request is almost identical, except that the last element of the URL grows by 24:

    • srt-24
    • srt-48
    • srt-72
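
The offset pattern listed above can be captured in a small helper. This is only a minimal sketch and not part of the original answer: the name `next_page_url` and the `step` parameter are hypothetical, and it assumes the URL always ends in `srt-<number>`, as in the examples shown.

```python
import re

# Assumption: the offset is the trailing "srt-<number>" segment of the URL,
# as in the example requests listed in the answer.
_OFFSET_RE = re.compile(r'srt-(\d+)$')


def next_page_url(url, step=24):
    """Return the URL of the next batch by adding `step` to the srt offset."""
    match = _OFFSET_RE.search(url)
    if match is None:
        raise ValueError('URL does not end in an srt-<number> offset')
    next_offset = int(match.group(1)) + step
    # Keep everything before the offset and append the incremented value
    return url[:match.start()] + 'srt-{}'.format(next_offset)
```

A helper like this may be handy inside a Scrapy callback, where the current URL is available as `response.url` and the next request can be derived from it.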

    Now that you know how the "infinite" scrolling works, you can try to simulate it in code.

    Example:

    import requests
    
    # The trailing {} is the srt offset, which grows by 24 with each request
    URL_TEMPLATE = 'https://www.next.co.uk/shop/gender-newbornboys-gender-newbornunisex-gender-olderboys-gender-youngerboys-productaffiliation-coatsandjackets/isort-score-minprice-0-maxprice-30000-srt-{}'
    
    # Request offsets 24, 48, ..., 216
    for step in range(24, 240, 24):
        r = requests.get(URL_TEMPLATE.format(step))
        if r.status_code == 200:
            # TODO: we have the data, so parse r.text here
            pass
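
The loop above hard-codes its upper bound. As a rough sketch under stated assumptions, the same idea can be written so that it stops by itself once a batch contains no products. The names `iter_pages`, `fetch`, and `has_products` are hypothetical, and how next.co.uk actually responds past the last batch (empty page, error status, or something else) is not confirmed here, which is why both checks are left to the caller.

```python
def iter_pages(url_template, fetch, has_products, step=24, max_pages=100):
    """Yield (offset, body) for each batch until one contains no products.

    fetch(url) -> page body as str, or None on failure (caller-supplied).
    has_products(body) -> True if the body holds at least one product.
    max_pages is a safety cap so a misbehaving server cannot loop forever.
    """
    for page in range(1, max_pages + 1):
        # Offsets follow the pattern from the answer: 24, 48, 72, ...
        offset = page * step
        body = fetch(url_template.format(offset))
        if body is None or not has_products(body):
            break
        yield offset, body
```

With `requests`, `fetch` could be a small function that calls `requests.get(url)` and returns `r.text` when `r.status_code == 200` and `None` otherwise, while `has_products` could check for the `.Details` selector used in the question.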
    

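    【Comments】:

    • It still scrapes only the usual number of items
    • @fafoworatobi I'm not sure I understand your comment.
    • It starts from the second page and scrapes only the second page
    • @fafoworatobi Did you look at my sample code? Can you share your updated code?
    • It's working. I've looked at your example now, and I've been able to scrape 216 items from the category
    【Solution 2】: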