How to deal with pagination when scraping: answers

【Question title】: How to deal with Pagination when scraping
【Posted】: 2019-04-23 14:30:42
【Question】:

The website I am scraping for educational purposes has pagination.

My code scrapes the first page just fine...

But how do I scrape

?page=2
?page=3
?page=4
?page=5

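and so on??...

I should point out that I have looked for a solution, but I can't seem to find anything that clearly answers what I need to know.

Current code: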

// @nuget: HtmlAgilityPack
using System;
using System.Data;
using System.Data.SqlClient;
using System.Net;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        ServicePointManager.Expect100Continue = true;
        // Ssl3 is obsolete and insecure, so only the TLS versions are enabled.
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls
               | SecurityProtocolType.Tls11
               | SecurityProtocolType.Tls12;
        HtmlWeb web = new HtmlWeb();
        HtmlDocument html = web.Load("https://www.g2crowd.com/products/google-analytics/reviews");
        //  var divNodes = html.DocumentNode.SelectNodes("//div[@class='mb-2 border-bottom']");

        var divNodes = html.DocumentNode.SelectNodes(@"//div[@itemprop='reviewBody']");

        if (divNodes != null)
        {
            foreach (var tag in divNodes)
            {
                string review = tag.InnerText;
                review = review.Replace("What do you like best?", "What do you like best?\n");
                review = review.Replace("What do you dislike?", "\nWhat do you dislike?\n");
                review = review.Replace("Recommendations to others considering the product", "\n\nRecommendations to others considering the product\n");
                review = review.Replace("What business problems are you solving with the product?  What benefits have you realized?", "\n\nWhat business problems are you solving with the product?  What benefits have you realized?\n");
                Console.WriteLine(review);
                Console.WriteLine("\n------------------------------- Review found. Adding to Database -------------------------------\n");
                review = review.Replace("'", "");
                review = review.Replace("\n", "<br />");
            }
        }
    }
}

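【Comments】:

How would you instinctively handle it? You probably already have your answer... There is no silver bullet here: either try the next page, or search the page for clues to see whether you can
My guess is to follow the link to the next page, or somehow code it so that after finishing page=1 it moves on to page=2? I'm very new to C#, so it's hard to turn my ideas into code. In the past, a nudge from SO has helped me learn a lot! Finding this a bit tough!
It depends on whether you are building a crawler. If you are, the links should be followable. If you just want to fetch the collection again, simply hit the link; there isn't much more to add. Maybe someone else can chime in.

Tags: c# .net web-scraping pagination

【Solution 1】: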

The link to the next page looks like this:

//link[@rel='next']

Keep following it until it no longer exists.
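
As a rough illustration of "keep following it until it no longer exists", here is a minimal C# sketch that reuses HtmlAgilityPack from the question. It assumes each page really does expose a `<link rel="next">` element, as this answer states. The names `PaginationSketch`, `CollectReviews`, `loadPage`, and `maxPages` are hypothetical and were introduced only for this example. The page loader is passed in as a delegate so the loop can be tried against in-memory HTML as well as the live site.

```csharp
// @nuget: HtmlAgilityPack
using System;
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

public static class PaginationSketch
{
    // Follows <link rel="next"> from page to page and collects the review text.
    // loadPage (hypothetical parameter) turns a URL into a parsed HtmlDocument.
    public static List<string> CollectReviews(
        string startUrl, Func<string, HtmlDocument> loadPage, int maxPages = 50)
    {
        var reviews = new List<string>();
        var visited = new HashSet<string>();
        string url = startUrl;

        // maxPages and the visited set guard against an endless chain of links.
        while (url != null && visited.Count < maxPages && visited.Add(url))
        {
            HtmlDocument html = loadPage(url);

            // Same XPath as in the question's code.
            var divNodes = html.DocumentNode.SelectNodes("//div[@itemprop='reviewBody']");
            if (divNodes != null)
            {
                foreach (var tag in divNodes)
                {
                    reviews.Add(tag.InnerText);
                }
            }

            // SelectSingleNode returns null when there is no next link (last page).
            HtmlNode next = html.DocumentNode.SelectSingleNode("//link[@rel='next']");
            string href = next == null
                ? ""
                : WebUtility.HtmlDecode(next.GetAttributeValue("href", ""));

            // The href may be relative, so resolve it against the current URL.
            url = href == "" ? null : new Uri(new Uri(url), href).AbsoluteUri;
        }

        return reviews;
    }
}
```

Against the site from the question it could be called as `PaginationSketch.CollectReviews("https://www.g2crowd.com/products/google-analytics/reviews", u => new HtmlWeb().Load(u))`. Whether that site still serves a `rel="next"` link is not something this sketch can confirm.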

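【Comments】:

next_page = response.xpath('//link[@rel="next"]/@href').extract_first(); if (next_page yield response.follow(next_page)); This is what I have so far. It doesn't seem to be working yet.
Well, I'm not sure what response is; it should be the HTML parser object, not the raw response (if that makes sense).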