[Posted]: 2019-09-21 20:19:58
[Question]:
I am trying to scrape the following website: https://sabobic.blogabet.com
My spider already extracts the content I need. But after clicking the "Show older" button, I don't know how to run my method "crawltips" again.
Here is my current code:
class AlltipsSpider(Spider):
    name = 'alltips'
    allowed_domains = ['blogabet.com']

    def start_requests(self):
        self.driver = webdriver.Chrome('C:\webdrivers\chromedriver.exe')
        # Place all user urls here
        url = "https://sabobic.blogabet.com"
        self.driver.get(url)
        yield scrapy.http.Request(url, callback=self.crawltips)

    def crawltips(self, response):
        sel = Selector(text=self.driver.page_source)
        allposts = sel.xpath('//*[@class="block media _feedPick feed-pick"]')
        for post in allposts:
            username = post.xpath('.//div[@class="col-sm-7 col-lg-6 no-padding"]/a/@title').extract()
            publish_date = post.xpath('.//*[@class="bet-age text-muted"]/text()').extract()
            yield {'Username': username,
                   'Publish date': publish_date
                   }

        try:
            self.driver.find_element_by_id('last_item').click()
            sleep(5)
        except NoSuchElementException:
            self.logger.info('No more tipps')

        yield Request(url, callback=self.crawltips)
I think something is wrong with the yield at the end, because I never have a new URL to request...
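Two things are worth noting here. First, re-yielding a Request for the same URL is silently dropped by Scrapy's duplicate filter unless you pass `dont_filter=True`. Second, since the Selenium driver already holds the live page, a common alternative is to not yield a new Request at all, but to loop inside `crawltips` itself: click "Show older", wait, re-parse, and repeat until the button disappears. The following is a minimal sketch of that loop pattern, using a hypothetical `FakeDriver` stub in place of the real browser so the control flow can be shown on its own; with Selenium you would instead call `self.driver.find_element_by_id('last_item').click()` followed by a sleep.

```python
# Sketch of the click-and-reparse loop. FakeDriver is a stand-in for the
# Selenium webdriver: each "click" on the button loads one more batch of posts.

class NoSuchElementException(Exception):
    """Stand-in for selenium.common.exceptions.NoSuchElementException."""

class FakeDriver:
    """Pretends each click on 'last_item' appends one more batch of posts."""
    def __init__(self, batches):
        self.batches = batches          # list of batches; each batch is a list of posts
        self.loaded = 1                 # the first batch is already on the page

    @property
    def page_source(self):
        # Everything loaded so far, flattened into one list of posts.
        return [p for batch in self.batches[:self.loaded] for p in batch]

    def click_last_item(self):
        if self.loaded >= len(self.batches):
            raise NoSuchElementException("no 'Show older' button left")
        self.loaded += 1

def crawl_all(driver):
    """Parse, click 'Show older', re-parse - all in one method, no new Request."""
    seen = []
    while True:
        for post in driver.page_source:
            if post not in seen:        # only collect posts added by the last click
                seen.append(post)
        try:
            driver.click_last_item()    # Selenium: find_element_by_id('last_item').click()
        except NoSuchElementException:
            break                       # button gone -> every tip is loaded
    return seen

posts = crawl_all(FakeDriver([["tip1", "tip2"], ["tip3"], ["tip4"]]))
print(posts)                            # ['tip1', 'tip2', 'tip3', 'tip4']
```

If you prefer to keep the recursive-callback style from the question instead, the key change would be `yield Request(url, callback=self.crawltips, dont_filter=True)` so the scheduler does not discard the repeated URL.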
[Discussion]:
标签: selenium web-scraping scrapy