Can't get text from a span with Scrapy (Python)

【Question title】: Can't get text from span scrapy python
【Posted】: 2020-02-05 22:37:27
【Question description】:

I'm building a bot to get the price and name of Zara products. I managed to get the product name, but the price comes back as [].

Here is my code:

#!/usr/bin/python3
#-*- coding: utf-8 -*-

import scrapy

class Zara(scrapy.Spider):
    name = 'Zara'

    def start_requests(self, url='https://www.zara.com/pt/pt/casaco-l%C3%A3-quadrados-p02092540.html?v1=42984974&v2=1445646'):
        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        try:
            name = response.xpath('//*[@id="product"]/div[1]/div/div[2]/header/h1/text()').get()
            price = response.xpath('//*[@id="product"]/div[1]/div/div[2]/div[1]/span/text()').get()
        except:
            print('Fail')

        print(name)
        print(price)

What it returns:

CASACO LÃ QUADRADOS
[]

What it should return:

CASACO LÃ QUADRADOS
149,00 EUR

Everything I have tried:

price = response.xpath('//*[@id="product"]/div[1]/div/div[2]/div[1]/span').get()
price = response.xpath('//*[@id="product"]/div[1]/div/div[2]/div[1]/span/text()').get()
price = response.xpath('//*[@id="product"]/div[1]/div/div[2]/div[1]/span[@class="main-price"]').get()
price = response.xpath('//*[@id="product"]/div[1]/div/div[2]/div[1]/span[@class="main-price"]/text()').get()

I think that's everything I tried! I'm using Scrapy 1.8 with Python 3.7.

【Question comments】:

Tags: python-3.x web-scraping scrapy

【Solution 1】:

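The reason you can't get the price with the usual "xpath/css" methods is that the 'price' field is not directly available to your crawler. The page your crawler sees is different from the one in your browser, so the XPaths are different too.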

Try this approach:

from re import search

# Take the first <script> whose text mentions 'price', then pull the digits out of it.
_script = response.xpath("//script[contains(text(),'price')][1]")[0].extract()
price = search(r",.price.:(\d+)", _script).group(1)
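
To illustrate the idea outside a running spider, here is a minimal standalone sketch. The markup in `SAMPLE_HTML` and the helper name `extract_price` are invented for illustration (the real Zara page may embed its data differently); only the pattern comes from the answer. It also returns `None` when nothing matches rather than calling `.group()` on a failed search.

```python
import re

# Hypothetical markup: the price lives in a script blob, not in a <span>.
SAMPLE_HTML = """
<html><body>
<div id="product"><h1>CASACO LÃ QUADRADOS</h1></div>
<script>window.data = {"name":"CASACO","price":14900,"currency":"EUR"};</script>
</body></html>
"""

# Same pattern as in the answer: a comma, any character (the quote),
# the word price, any character, a colon, then one or more digits.
PRICE_RE = re.compile(r",.price.:(\d+)")


def extract_price(html):
    """Return the raw price digits found in the page source, or None."""
    match = PRICE_RE.search(html)
    if match is None:
        # No match: the page served to the crawler does not contain the field.
        return None
    return match.group(1)


print(extract_price(SAMPLE_HTML))             # digits captured from the script blob
print(extract_price("<html>captcha</html>"))  # nothing to capture here
```

Inside a Scrapy callback, `response.text` could be passed as `html`.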

Also, it is better to use a separate try...except for each field, so that you know exactly which part raised the error and can fix it.
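
One possible way to do that is a small wrapper that runs each field's lookup in its own try...except. This is only a sketch: `safe_extract`, `get_name` and `get_price` are hypothetical names, and the two getters stand in for the real XPath or regex lookups.

```python
def safe_extract(field, extractor):
    """Run one extractor; report which field failed instead of aborting."""
    try:
        return extractor()
    except Exception as exc:  # deliberately broad for this sketch
        print(f"Failed to extract {field!r}: {exc!r}")
        return None


# Hypothetical stand-ins for the per-field lookups.
def get_name():
    return "CASACO LÃ QUADRADOS"


def get_price():
    # Mimics calling .group() on a re.search() result that is None.
    return None.group(1)


name = safe_extract("name", get_name)     # succeeds
price = safe_extract("price", get_price)  # fails, is reported, yields None
print(name, price)
```

With this shape a failure in the price lookup no longer hides whether the name lookup worked.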

【Comments】:

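I'm just learning Scrapy now. When I used Selenium for scraping, XPath worked for everything, so I don't understand this, but thanks for the help!
For the price it is outputting this: AttributeError: 'NoneType' object has no attribute 'group'
@DeadSec, try setting headers - I believe your crawler is not looking at the exact page; it may be looking at a captcha page.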
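
A sketch of what "setting headers" could look like, assuming browser-like values are what the site expects (not confirmed in this thread). The header values are hypothetical; `USER_AGENT` and `DEFAULT_REQUEST_HEADERS` are Scrapy setting names that can be placed in a spider's `custom_settings`, and `scrapy.Request` also accepts a `headers=` argument for per-request headers.

```python
# Hypothetical header values; adjust them to match the browser you tested with.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/79.0 Safari/537.36"
    ),
    "Accept-Language": "pt-PT,pt;q=0.9,en;q=0.8",
}

# Intended to be assigned as the `custom_settings` class attribute of the spider.
custom_settings = {
    "USER_AGENT": BROWSER_HEADERS["User-Agent"],
    "DEFAULT_REQUEST_HEADERS": {
        "Accept-Language": BROWSER_HEADERS["Accept-Language"],
    },
}

print(custom_settings["USER_AGENT"])
```

Printing `response.text` (or saving it to a file) from the callback is a simple way to check whether the crawler received the product page or something else.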