Web scraping to get more pages: answers

【Question title】: web scraping for more pages
【Posted】: 2020-07-06 08:41:14
【Question description】:

I'm currently scraping a website, and I need to get the data that the page loads automatically after the initial load. I'm using BeautifulSoup and Requests.

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.monki.com/en/newin/view-all-new.html")
soup = BeautifulSoup(page.content, 'html.parser')

article_codes = []
for k in soup.find_all('div', attrs={"class": "producttile-details"}):
    article_code = k.find('span', attrs={'class': "articleCode"})
    if article_code is None:  # skip tiles that have no article code
        continue
    print(article_code.text)
    article_codes.append(article_code.text)

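With this code I only get the products present in the initially loaded page, but I want all the data, including what the page loads afterwards.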

【Question discussion】:

Tags: python python-3.x web-scraping beautifulsoup python-requests

【Solution 1】:

The page uses JavaScript to load the additional pages of products. You can simulate those requests with the requests module.

For example:

import requests
from bs4 import BeautifulSoup

# Endpoint the page's JavaScript calls to fetch each additional batch of products
url = 'https://www.monki.com/en_eur/newin/view-all-new/_jcr_content/productlisting.products.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0',
}

with requests.Session() as s:
    # Visit the landing page first so the session picks up any cookies it sets
    s.get('https://www.monki.com/en_eur/newin/view-all-new.html', headers=headers)

    for page in range(0, 10):  # <-- adjust to required number of pages
        # The offset advances by 28 products per batch
        r = s.get(url, params={'offset': page * 28}, headers=headers)
        soup = BeautifulSoup(r.content, 'html.parser')

        for product in soup.select('.o-product'):
            name = product.select_one('.product-name').get_text(strip=True)
            price = product.select_one('.price-tag').get_text(strip=True)
            link = product.select_one('.a-link')['href']

            print('{:<50} {:<10} {}'.format(name, price, link))

This prints the products from every requested batch:

NEW! Maxi smock dress                              €30        https://www.monki.com/en_eur/clothing/dresses/midi-dresses/product.midi-button-up-shirt-dress-black.0871799004.html
NEW! Retro skater dress                            €20        https://www.monki.com/en_eur/clothing/dresses/mini-dresses/product.retro-skater-dress-white.0688447029.html
NEW! Mozik block jeans                             €40        https://www.monki.com/en_eur/clothing/jeans/product.mozik-block-jeans-blue.0874088001.html
NEW! Pack of two scrunchies                        €6         https://www.monki.com/en_eur/accessories/hair-accessories/product.pack-of-two-scrunchies-beige.0530296078.html
NEW! Mini hand bag                                 €18        https://www.monki.com/en_eur/accessories/bags,-wallets-belts/bags/product.mini-hand-bag-black.0826291006.html
NEW! Fitted crop top                               €10        https://www.monki.com/en_eur/clothing/tops/t-shirts/product.fitted-crop-top-purple.0906440002.html
NEW! Tiered smock dress                            €30        https://www.monki.com/en_eur/clothing/dresses/midi-dresses/product.tiered-smock-dress-blue.0895277004.html
NEW! Mini hand bag                                 €18        https://www.monki.com/en_eur/accessories/bags,-wallets-belts/bags/product.mini-hand-bag-beige.0826291008.html
NEW! Fitted t-shirt                                €10        https://www.monki.com/en_eur/clothing/tops/t-shirts/product.fitted-t-shirt-purple.0905746002.html
NEW! Shoulder pads t-shirt dress                   €25        https://www.monki.com/en_eur/clothing/dresses/mini-dresses/product.shoulder-pads-t-shirt-dress-beige.0929301002.html
NEW! Yoko mid blue jeans                           €40        https://www.monki.com/en_eur/clothing/jeans/product.yoko-mid-blue-jeans-blue.0656425001.html
NEW! Yoko classic blue jeans                       €40        https://www.monki.com/en_eur/clothing/jeans/product.yoko-classic-blue-jeans-blue.0807218001.html
NEW! Pleated midi skirt                            €25        https://www.monki.com/en_eur/clothing/skirts/midi-skirts/product.pleated-midi-skirt-black.0562278003.html

... and so on.
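If the output looks wrong, it can help to compare the requests the loop sends with the ones the browser makes while the page loads more products (visible in the browser developer tools' Network tab). The sketch below only builds the batch URLs without contacting the site. It assumes the same endpoint and a step of 28 per batch as in the answer; `batch_url` and `PAGE_SIZE` are hypothetical names for illustration.

```python
import requests

# Listing endpoint taken from the answer
LISTING_URL = 'https://www.monki.com/en_eur/newin/view-all-new/_jcr_content/productlisting.products.html'
PAGE_SIZE = 28  # offset step used in the answer; assumed to equal the batch size


def batch_url(page: int) -> str:
    """Build (but do not send) the GET URL for one batch of products."""
    request = requests.Request('GET', LISTING_URL, params={'offset': page * PAGE_SIZE})
    return request.prepare().url


for page in range(3):
    print(batch_url(page))
```

`Request.prepare()` runs the same URL encoding that `Session.get` would apply, so the printed URLs should match what the loop in the answer requests.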

【Discussion】:

But this only works for 10 pages. I want it to go up to the last page.
@WweCena Then change the number 10 to a larger number.
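Instead of guessing a large enough number, the loop can stop by itself once a batch brings nothing new. This is a minimal sketch under an assumption I have not verified against the live site: requesting an offset past the last product returns either no `.o-product` tiles or only products already seen. `make_fetcher` and `iter_all_products` are hypothetical helper names, and `max_pages` is a safety cap in case that assumption does not hold.

```python
from typing import Callable, Iterator, List, Tuple

import requests
from bs4 import BeautifulSoup

# Endpoint and offset step taken from the answer above
LISTING_URL = 'https://www.monki.com/en_eur/newin/view-all-new/_jcr_content/productlisting.products.html'
PAGE_SIZE = 28

Product = Tuple[str, str, str]  # (name, price, link)


def make_fetcher(session: requests.Session) -> Callable[[int], List[Product]]:
    """Return a function that downloads and parses the batch at a given offset."""

    def fetch_batch(offset: int) -> List[Product]:
        response = session.get(LISTING_URL, params={'offset': offset}, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        batch = []
        for tile in soup.select('.o-product'):
            name = tile.select_one('.product-name')
            price = tile.select_one('.price-tag')
            link = tile.select_one('.a-link')
            if name is None or price is None or link is None:
                continue  # skip tiles that lack one of the expected fields
            batch.append((name.get_text(strip=True), price.get_text(strip=True), link.get('href', '')))
        return batch

    return fetch_batch


def iter_all_products(fetch_batch: Callable[[int], List[Product]],
                      max_pages: int = 1000) -> Iterator[Product]:
    """Yield products batch by batch until a batch adds nothing new."""
    seen = set()
    for page in range(max_pages):
        added = 0
        for product in fetch_batch(page * PAGE_SIZE):
            if product not in seen:
                seen.add(product)
                added += 1
                yield product
        if added == 0:
            return  # empty or fully repeated batch: assume the last page was reached
```

To use it with the answer's setup: inside `with requests.Session() as s:`, call `s.headers.update(headers)`, request the landing page once, then loop over `iter_all_products(make_fetcher(s))` and print each `(name, price, link)` as before. Passing the fetch function in, rather than calling `requests` directly inside the loop, keeps the stop condition testable without network access.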