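[Posted]: 2021-04-17 10:15:19
[Question]:
Scrapy does not follow the scraped links. When I call the function that should extract details from the previously scraped links, it doesn't work. Instead, I get content scraped from the start_urls page.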
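```python
import scrapy


class OlxSpider(scrapy.Spider):
    name = 'olx'
    start_urls = ['https://www.olx.pl/nieruchomosci/mieszkania/wynajem/wroclaw/']
    # Requests to hosts outside this list are filtered out by Scrapy's offsite middleware.
    allowed_domains = ['www.olx.pl']

    def parse(self, response):
        # Collect the offer links from the listing page.
        urls = response.css('a.detailsLink::attr(href)').extract()
        for url in urls:
            # Resolve relative hrefs against the listing page URL.
            url = response.urljoin(url)
            # Schedule each offer page; its response is handled by parse_indetail.
            yield response.follow(url, callback=self.parse_indetail)

    def parse_indetail(self, response):
        # Extract details from a single offer page.
        item = {}
        item['price'] = response.xpath('//h3/text()').extract()
        yield item
```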
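To see what `parse` should hand to the scheduler without running a crawl, here is a minimal stdlib-only sketch. It assumes, as the selector in the question does, that offer links carry the class `detailsLink`. The HTML snippet, the offer paths, the domain `example-partner.pl`, and the helpers `DetailsLinkParser`, `is_allowed` and `links_to_follow` are all hypothetical. `is_allowed` is a simplified approximation of Scrapy's offsite filtering, where a request passes only if its host equals an entry in `allowed_domains` or is a subdomain of one; it is not Scrapy's actual implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class DetailsLinkParser(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags with class 'detailsLink'.

    Roughly what response.css('a.detailsLink::attr(href)') selects.
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "detailsLink" in classes and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def is_allowed(url, allowed_domains):
    """Simplified offsite check (an approximation, not Scrapy's code):
    the host must equal an allowed domain or be a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def links_to_follow(page_url, html, allowed_domains):
    """Hypothetical helper: split resolved offer links into followed and dropped."""
    parser = DetailsLinkParser()
    parser.feed(html)
    absolute = [urljoin(page_url, href) for href in parser.hrefs]
    followed = [u for u in absolute if is_allowed(u, allowed_domains)]
    dropped = [u for u in absolute if not is_allowed(u, allowed_domains)]
    return followed, dropped


# Made-up listing-page HTML; the paths and the second domain are hypothetical.
PAGE_URL = "https://www.olx.pl/nieruchomosci/mieszkania/wynajem/wroclaw/"
SAMPLE_HTML = """
<a class="marginright5 detailsLink" href="/d/oferta/example-1.html">Offer 1</a>
<a class="detailsLink" href="https://www.example-partner.pl/offer/2">Offer 2</a>
<a class="other" href="/not-an-offer">Ignored</a>
"""

followed, dropped = links_to_follow(PAGE_URL, SAMPLE_HTML, ["www.olx.pl"])
print("followed:", followed)
print("dropped: ", dropped)
```

With the real listing HTML in place of the sample, any URL that lands in `dropped` would never reach `parse_indetail`. That is one possible explanation for the symptom, offered as an assumption rather than something verified against the live site.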
[Discussion]:
- Hi, I have a question about your question: is using "response.urljoin(url)" correct? The response is what you received from start_urls; it is not a URL.
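The comment's concern can be checked directly. In Scrapy, `response.urljoin(url)` does not treat `response` itself as a URL; it resolves `url` against the URL of the page being parsed (`response.url`, or a `<base href>` in the page for HTML responses) using the standard library's `urllib.parse.urljoin`. A minimal stdlib sketch of that resolution, with made-up offer paths:

```python
from urllib.parse import urljoin

# Stand-in for response.url: the listing page the spider is parsing.
page_url = "https://www.olx.pl/nieruchomosci/mieszkania/wynajem/wroclaw/"

# Hypothetical hrefs of the kinds a listing page might contain.
root_relative = urljoin(page_url, "/d/oferta/example-1.html")
path_relative = urljoin(page_url, "example-2.html")
absolute = urljoin(page_url, "https://www.olx.pl/d/oferta/example-3.html")

# Joining an already-absolute URL again leaves it unchanged, so resolving
# with response.urljoin() before response.follow() is harmless.
twice = urljoin(page_url, root_relative)

print(root_relative)  # https://www.olx.pl/d/oferta/example-1.html
print(path_relative)  # https://www.olx.pl/nieruchomosci/mieszkania/wynajem/wroclaw/example-2.html
print(absolute)       # https://www.olx.pl/d/oferta/example-3.html
print(twice == root_relative)  # True
```

Since `response.follow` also accepts relative URLs and resolves them against the page in the same way, the explicit `response.urljoin` line in the question is redundant rather than incorrect.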
Tags: python web-scraping scrapy data-extraction