(Scrapy) Unable to extract HREF from HTML

【Question title】: (Scrapy) Unable to extract HREF from html
【Posted】: 2017-09-30 17:39:42
【Question】:

next_page = response.css('body > div.bg-theme-b.page-results.page-listing-results.page-results-responsive.ui-page.ui-page-theme-a.ui-page-header-fixed.ui-page-active > div.ui-panel-wrapper > div > div:nth-child(10) > div > ul > li.ui-block-b > a').extract_first()

if next_page is not None:
    next_page = response.urljoin(next_page)
    yield scrapy.Request(next_page, callback=self.parse)

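I am trying to scrape house price data from m.zoopla.co.uk (the mobile site, because it seems nicer to scrape data from ...). I cannot get the href attribute of the next-page link. The data from the first page is scraped fine; there is no problem there.

I used the longest, most direct CSS selector, as shown here, and it does not work. I also tried a shorter selector: response.css('li.ui-block-b > a::attr(href)').extract_first(), but it still returns None.

I have also tried XPath. No luck there either. Please help. Someone. Anyone..

P.S. Only the next-page code is included, because there are no errors and the actual data scraping works fine.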

【Question comments】:

Please provide the exact URL you are scraping.

Tags: python xpath css-selectors scrapy

【Solution 1】:

You can simply select the <a> element that has the data-icon="carat-r" attribute and then take its href, because only the next-page link has data-icon="carat-r".

next_page = response.css('a[data-icon*=carat-r]::attr(href)').extract_first()

Output:

2017-05-03 13:27:53 [quotes] DEBUG: next_page is /house-prices/browse/coalville/?pn=2

【Comments】: