Web Scraper Pagination: Answers

【Question title】: Web Scraper Pagination
【Posted】: 2020-05-20 14:22:11
【Question description】:

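I built a web scraper, and I am trying to get dynamic data that is loaded into a div after the page has loaded.

Here is my code, and the URL of the source site: https://www.medizinerkarriere.de/kliniken-sortiert-nach-name.html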

async function pageFunction(context) {
    // jQuery is handy for finding DOM elements and extracting data from them.
    // To use it, make sure to enable the "Inject jQuery" option.
    const $ = context.jQuery;
    var result = [];
    $('#klinikListBox ul').each(function(){        
        var item = {           
            Name: $(this).find('li.klName').text().trim(),
            Ort: $(this).find('li.klOrt').text().trim(),
            Land: $(this).find('li.klLand').text().trim(),            
            Url:""
        };
        result.push(item);    
    });

    // To make this work, make sure the "Use request queue" option is enabled.
    await context.enqueueRequest({ url: 'https://www.medizinerkarriere.de/kliniken-sortiert-nach-name.html' });

    // Return an object with the data extracted from the page.
    // It will be stored to the resulting dataset.
    return result;
}

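However, the pagination is click-based, and I don't know how to handle it.

I tried every approach from this link, without success:

https://docs.apify.com/scraping/web-scraper#bonus-making-your-code-neater

Please help; any assistance would be appreciated.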

【Question discussion】:

Tags: pagination apify

【Solution 1】:

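In this case the pagination loads dynamically within a single page, so enqueuing new pages makes no sense. Just click the page button to move to the next page. It is also good practice to wait a moment after the click so the new content has time to load.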

$('#PGPAGES span').eq(1).click();
await context.waitFor(1000);
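
A fixed one-second delay can be too short on a slow connection and wasteful on a fast one. As a rough sketch, one alternative is to poll until the list's text differs from what it was before the click. `waitUntil` and `clickAndWaitForChange` are hypothetical helpers, not part of Apify, and the sketch assumes that `#klinikListBox` (the container from the question) is re-rendered with different text when a new page loads.

```javascript
// Hypothetical helper: poll `predicate` until it returns true or the timeout expires.
// Resolves to true if the condition was met, false otherwise.
async function waitUntil(predicate, { timeoutMs = 10000, intervalMs = 100 } = {}) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        if (predicate()) return true;
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    // One last check after the deadline has passed.
    return predicate();
}

// Hypothetical helper: click a pagination button and wait for the list to change.
// Assumes the text of #klinikListBox differs from page to page.
async function clickAndWaitForChange($, pageIndex) {
    const before = $('#klinikListBox').text();
    $('#PGPAGES span').eq(pageIndex).click();
    return waitUntil(() => $('#klinikListBox').text() !== before);
}
```

Inside the page function this could be called as `await clickAndWaitForChange(context.jQuery, 1);`. If it resolves to `false`, the list did not change within the timeout, so falling back to a plain delay may still be needed.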

You can scrape all pages with a simple loop:

const numberOfPages = 8; // Hard-coded here; you can scrape this number too
for (let i = 1; i <= numberOfPages; i++) {
    // Your scraping code goes here: push the current page's data to an array and return it at the end
    $('#PGPAGES span').eq(i).click(); // Move on to the next page
    await context.waitFor(1000); // Give the new content time to load
}
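
Putting the question's extraction code and the loop above together might look like the following sketch. `extractClinics` and `scrapeAllPages` are hypothetical helper names. The selectors come from the question and the answer, the page count of 8 is the hard-coded value from the loop, and the sketch assumes that `#PGPAGES span` at index 0 is the first page (as the `.eq(1)` click for the next page suggests).

```javascript
// Hypothetical helper: build one record per <ul> in the clinic list (selectors from the question).
function extractClinics($) {
    const items = [];
    $('#klinikListBox ul').each(function () {
        items.push({
            Name: $(this).find('li.klName').text().trim(),
            Ort: $(this).find('li.klOrt').text().trim(),
            Land: $(this).find('li.klLand').text().trim(),
            Url: '',
        });
    });
    return items;
}

// Hypothetical helper: scrape the current page, then click through the remaining ones.
// The page-specific actions are passed in, so the loop itself has no DOM dependency.
async function scrapeAllPages({ extract, goToPage, wait, numberOfPages, delayMs = 1000 }) {
    const results = [];
    for (let page = 0; page < numberOfPages; page++) {
        results.push(...extract());
        // No click after the last page: there is nothing further to load.
        if (page < numberOfPages - 1) {
            goToPage(page + 1);
            await wait(delayMs);
        }
    }
    return results;
}

async function pageFunction(context) {
    const $ = context.jQuery; // Requires the "Inject jQuery" option.
    return scrapeAllPages({
        extract: () => extractClinics($),
        goToPage: (index) => $('#PGPAGES span').eq(index).click(),
        wait: (ms) => context.waitFor(ms),
        numberOfPages: 8, // Hard-coded as in the loop above.
    });
}
```

Web Scraper takes a single page function as input, so when pasting this into the actor the two helpers would most likely need to be moved inside `pageFunction`. The `enqueueRequest` call from the question is dropped, since all pages are reached by clicking within the same URL.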

【Discussion】: