Forming XPATH Selector in SCRAPY: Answers

【Question title】: Forming XPATH Selector in SCRAPY
【Posted】: 2020-01-06 06:15:06
【Question】:

I am trying to extract the product names from this page:

https://www.v12outdoor.com/view-by-category/rock-climbing-gear/rock-climbing-shoes/mens.html

I can't find an XPATH that returns a useful, specific result.

Apologies that my first post is a beginner's question :(

import scrapy


class V12Spider(scrapy.Spider):
    name = 'v12'
    start_urls = ['https://www.v12outdoor.com/view-by-category/rock-climbing-gear/rock-climbing-shoes/mens.html']

    def parse(self, response):
        yield {
            'price': response.xpath('//span[@id="product-price-26901"]/text()'),
            'name': response.xpath('//h3[@class="product-name"]/a/text()'),
        }

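For name, I expected the items in the h3 tags with class product-name to yield the product names, but instead I get multiple rows with data='\r\n

(While we are at it, for price, is there a way to extract only the numeric value?)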

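【Question comments】:

Tags: python css scrapy

【Solution 1】:

The problem you are facing can be solved by calling the get() method on the xpath() result and then the string strip() method. I tried something like this: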

name= response.xpath('//h3[@class="product-name"]/a/text()').get()

which gives

'\r\n                                RED CHILLI VOLTAGE                            '

Then calling

name.strip()

gives

'RED CHILLI VOLTAGE'

So you can replace your name statement with

name= response.xpath('//h3[@class="product-name"]/a/text()').get().strip()

The same approach works for the price: just append .get().strip() to the end of the statement.
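Two details the lines above leave open are that .get() returns None when nothing matches (so .strip() would raise an error), and that the question also asked for the price as a number only. A minimal sketch of two helpers that handle both; the names clean_text and parse_price are hypothetical, and the price pattern assumes a format such as '£89.99', where a comma (if present) is a thousands separator:

```python
import re


def clean_text(raw):
    """Strip surrounding whitespace; pass None through unchanged."""
    return raw.strip() if raw is not None else None


def parse_price(raw):
    """Return the first number found in a price string as a float.

    Assumes '.' is the decimal separator and ',' a thousands separator.
    Returns None when the input is None or contains no digits.
    """
    if raw is None:
        return None
    match = re.search(r'\d+(?:[.,]\d+)*', raw)
    if match is None:
        return None
    return float(match.group().replace(',', ''))


# Intended use inside parse(), with the selectors from the question:
#   'name': clean_text(response.xpath('//h3[@class="product-name"]/a/text()').get()),
#   'price': parse_price(response.xpath('//span[@id="product-price-26901"]/text()').get()),
```

Keeping the clean-up in small functions means a missing element produces None in the output rather than stopping the spider with an AttributeError.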

Hope this helps. Also read about the .getall() method at https://docs.scrapy.org/en/latest/topics/selectors.html

【Comments】:

Thank you very much. I knew it would be something simple, but I had been stuck on it for ages!
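XPath/CSS selectors are used to locate elements; we then call .get(), .getall(), or extract_first() to get the data inside them (read the docs for the differences). It is just a common step that many people forget ;)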
On to the next silly question... def parse(self, response): for shoe in response.css('.item'): yield { 'name' : shoe.xpath('//h3[@class="product-name"]/a/text()').get().strip(), 'price' : shoe.xpath('//p[@class="special-price"]/span[@id="product-price-26901"]/text()').get().strip(), } I expected it to return data for the 12 items on the page, but instead I get the first entry 12 times... What am I missing about the iteration?! Sorry again for being so rubbish and missing the obvious!
Or can I use strip() together with getall()? Everything seems to go wrong!
I think it is because you are combining CSS and XPath selectors, as answered here: stackoverflow.com/questions/9005170/css-selector-inside-xpath, though I may be wrong, as I am not well versed in XPath.