[Posted]: 2025-12-18 17:50:01
[Problem description]:
I'm trying to scrape some data from a website, but my spider does not crawl to the next page even though it appears to follow the correct pagination link.
import scrapy


class NspiderSpider(scrapy.Spider):
    name = "nspider"
    allowed_domains = ["elimelechlab.yale.edu/"]
    start_urls = ["https://elimelechlab.yale.edu/pub"]

    def parse(self, response):
        title = response.xpath(
            '//*[@class="views-field views-field-title"]/span/text()'
        ).extract()
        doi_link = response.xpath(
            '//*[@class="views-field views-field-field-doi-link"]//a[1]/@href'
        ).extract()
        yield {"paper_title": title, "doi_link": doi_link}

        next_page = response.xpath(
            '//*[@title="Go to next page"]/@href'
        ).extract_first()  # extract the next-page link
        if next_page:
            yield scrapy.Request(url=response.urljoin(next_page), callback=self.parse)
PS: I don't want to use LinkExtractor. Any help would be appreciated.
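
For reference, a minimal sketch of the common Scrapy pagination pattern I've been comparing my spider against (assumptions: response.follow to resolve relative URLs, allowed_domains written without the trailing slash since OffsiteMiddleware matches it against the request host, and a guessed "views-row" class for per-row iteration; these names are not confirmed against the actual page markup):

import scrapy


class NspiderSpider(scrapy.Spider):
    name = "nspider"
    # domain only, no trailing slash: OffsiteMiddleware compares this string
    # against the request host and filters requests that don't match
    allowed_domains = ["elimelechlab.yale.edu"]
    start_urls = ["https://elimelechlab.yale.edu/pub"]

    def parse(self, response):
        # yield one item per publication row instead of two parallel lists
        # ("views-row" is an assumption about the page markup)
        for row in response.xpath('//*[contains(@class, "views-row")]'):
            yield {
                "paper_title": row.xpath(
                    './/*[@class="views-field views-field-title"]/span/text()'
                ).get(),
                "doi_link": row.xpath(
                    './/*[@class="views-field views-field-field-doi-link"]//a[1]/@href'
                ).get(),
            }

        next_page = response.xpath('//*[@title="Go to next page"]/@href').get()
        if next_page:
            # response.follow resolves the relative href against the current URL
            yield response.follow(next_page, callback=self.parse)

Running a crawl like this with "scrapy crawl nspider -o papers.json" should also show in the log whether next-page requests are being dropped (Scrapy logs "Filtered offsite request" when OffsiteMiddleware discards them).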
[Discussion]:
Tags: python web-scraping scrapy