[Posted]: 2019-09-21 20:19:58
[Question]:
I am trying to scrape the following website: https://sabobic.blogabet.com
My spider already extracts the content I need. But after clicking the "Show older" button, I don't know how to run my method "crawltips" again.
Here is my current code:
class AlltipsSpider(Spider):
    name = 'alltips'
    allowed_domains = ['blogabet.com']

    def start_requests(self):
        self.driver = webdriver.Chrome('C:\webdrivers\chromedriver.exe')
        # Place all user urls here
        url = "https://sabobic.blogabet.com"
        self.driver.get(url)
        yield scrapy.http.Request(url, callback=self.crawltips)

    def crawltips(self, response):
        sel = Selector(text=self.driver.page_source)
        allposts = sel.xpath('//*[@class="block media _feedPick feed-pick"]')
        for post in allposts:
            username = post.xpath('.//div[@class="col-sm-7 col-lg-6 no-padding"]/a/@title').extract()
            publish_date = post.xpath('.//*[@class="bet-age text-muted"]/text()').extract()
            yield {'Username': username,
                   'Publish date': publish_date
                   }

        try:
            self.driver.find_element_by_id('last_item').click()
            sleep(5)
        except NoSuchElementException:
            self.logger.info('No more tipps')

        yield Request(url, callback=self.crawltips)
I think something is wrong with the yield at the end, because I never have a new URL to request...
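Two things are worth noting here. First, re-yielding a Request for the same URL is silently dropped by Scrapy's duplicate filter unless you pass `dont_filter=True`. Second, since the Selenium driver already holds the live page, a common alternative is to not yield a new Request at all, but to loop inside `crawltips` itself: click "Show older", wait, re-parse, and repeat until the button disappears. The following is a minimal sketch of that loop pattern, using a hypothetical `FakeDriver` stub in place of the real browser so the control flow can be shown on its own; with Selenium you would instead call `self.driver.find_element_by_id('last_item').click()` followed by a sleep.

```python
# Sketch of the click-and-reparse loop. FakeDriver is a stand-in for the
# Selenium webdriver: each "click" on the button loads one more batch of posts.

class NoSuchElementException(Exception):
    """Stand-in for selenium.common.exceptions.NoSuchElementException."""

class FakeDriver:
    """Pretends each click on 'last_item' appends one more batch of posts."""
    def __init__(self, batches):
        self.batches = batches          # list of batches; each batch is a list of posts
        self.loaded = 1                 # the first batch is already on the page

    @property
    def page_source(self):
        # Everything loaded so far, flattened into one list of posts.
        return [p for batch in self.batches[:self.loaded] for p in batch]

    def click_last_item(self):
        if self.loaded >= len(self.batches):
            raise NoSuchElementException("no 'Show older' button left")
        self.loaded += 1

def crawl_all(driver):
    """Parse, click 'Show older', re-parse - all in one method, no new Request."""
    seen = []
    while True:
        for post in driver.page_source:
            if post not in seen:        # only collect posts added by the last click
                seen.append(post)
        try:
            driver.click_last_item()    # Selenium: find_element_by_id('last_item').click()
        except NoSuchElementException:
            break                       # button gone -> every tip is loaded
    return seen

posts = crawl_all(FakeDriver([["tip1", "tip2"], ["tip3"], ["tip4"]]))
print(posts)                            # ['tip1', 'tip2', 'tip3', 'tip4']
```

If you prefer to keep the recursive-callback style from the question instead, the key change would be `yield Request(url, callback=self.crawltips, dont_filter=True)` so the scheduler does not discard the repeated URL.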
[Discussion]:
标签: selenium web-scraping scrapy