Scrapy - list returns None - index out of range

【Question title】: Scrapy - list returns None - index out of range
【Posted】: 2019-02-25 15:31:20
【Question】:

My list has two items that may or may not be present. How do I write a check for the list?

The items look like this:

    item['BusinessType'] = response.xpath('//div//following-sibling::p//text()').extract()[3]
    item['BusinessArea'] = response.xpath('//div//following-sibling::p//text()').extract()[4]

Sometimes list element [3] or [4] does not exist, so Scrapy fails with

IndexError: list index out of range

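I tried several different approaches and they all failed, and I don't understand why. Assigning the response.xpath result to a local variable and checking it: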

        if biz_type:
            item['BusinessType'] = biz_type
        else:
            biz_type_none = "None"
            item['BusinessType'] = biz_type_none
        if biz_area:
            item['BusinessArea'] = biz_area
        else:
            biz_area_none = "None"
            item['BusinessArea'] = biz_area_none

This fails too: Scrapy still complains that the list index is out of range.

How do I do a proper check while extracting from the list?

Edit: the full function is below. It is the last callback in a "chain": the earlier steps visit 3 pages and pass the item along via meta.

    def trust_data(self, response):
        item = response.meta['item']
        item ['Access'] = response.xpath('//div//following-sibling::p//text()').extract()[1]
        item ['Feedback'] = response.xpath('//div//following-sibling::p//text()').extract()[2]        
        texts = response.xpath('//div//following-sibling::p//text()').get()

        if len(texts) >= 4:
           item['BusinessType'] = texts[3]
        if len(texts) >= 5:
           item['BusinessArea'] = texts[4]

        yield item

One more thing:

print(texts, 'lenght is', len(texts))
(u'5600', 'lenght is', 4)

Length == 4: the list is complete.

>>> print(texts, 'lenght is', len(texts))
(u'0', 'lenght is', 1)

Length == 1: the list is incomplete (it lacks the labels I want to include in my item).

But the condition if len(texts) == 1 is always satisfied, so whatever I do next is applied to all of my items. Example:

        if len(texts) == 4:
           if len(texts) >= 4:
              item['BusinessType'] = texts[3]
           if len(texts) >= 5:
              item['BusinessArea'] = texts[4]
        else:
           item['BusinessType'] = "None"
           item['BusinessArea'] = "None"

This fills both fields with "None" in every case.

【Question comments】:

Tags: xpath scrapy

【Solution 1】:

Before accessing an index, make sure the list is long enough:

    texts = response.xpath('//div//following-sibling::p//text()').getall()
    item['BusinessType'] = texts[3] if len(texts) >= 4 else 'None'
    item['BusinessArea'] = texts[4] if len(texts) >= 5 else 'None'
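
The same length check can be wrapped in a small helper so that the XPath is evaluated only once per response. What follows is a minimal sketch rather than part of the original answer: pick and fill_item are hypothetical names, the item is modeled as a plain dict, and texts is assumed to be the list returned by .getall() (or .extract()).

```python
def pick(values, index, default="None"):
    """Return values[index], or default when the list is too short.

    Only meant for non-negative indexes.
    """
    return values[index] if len(values) > index else default


def fill_item(item, texts):
    """Copy the scraped text nodes into item, tolerating missing entries.

    texts is assumed to be a list of strings, e.g. the result of
    response.xpath(...).getall().
    """
    item['Access'] = pick(texts, 1)
    item['Feedback'] = pick(texts, 2)
    item['BusinessType'] = pick(texts, 3)  # sometimes missing
    item['BusinessArea'] = pick(texts, 4)  # sometimes missing
    return item
```

Inside the callback this would presumably be used as texts = response.xpath('//div//following-sibling::p//text()').getall() followed by fill_item(item, texts), before yielding the item.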

【Comments】:

IndexError: string index out of range. I also tried adding an else that assigns None to the if len check.
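A hint: the list has 5 items in total. [0], [1] and [2] are always present; [3] and [4] are sometimes missing. In the function I look up each of these items. If item [3] or [4] does not exist, adding an is None check on the response fails as well.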
Added a description and fixed the problem (len() must be ≥ 4 to access list[3]).
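I tried that. The out-of-range error is gone, but Scrapy now omits [3] and [4] from every page, not only from the pages where they are missing. I have edited the original post and added the whole function in question.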
That's because of your if len(texts) == 4:. I have updated the answer; please try this approach.
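
Both symptoms from this thread can be reproduced without Scrapy. The sketch below uses plain Python values as stand-ins: .get() returns a single string (the first match), so len() counts characters and [3] indexes into the string, which is consistent with the printed (u'5600', 'lenght is', 4) in the question, whereas .getall() returns a list of all matches. Apart from '5600' and '0', the sample values and the function names are made up for illustration.

```python
# Stand-in for what .getall() returns: every matching text node (sample values).
all_nodes = ['0', 'yes', 'no', 'Retail', 'London']


def describe(texts):
    """Return (type name, length, whether texts[3] can be read)."""
    return type(texts).__name__, len(texts), len(texts) >= 4


def fill_with_equality(texts):
    """The question's variant: == 4 sends a full 5-element list to else."""
    if len(texts) == 4:
        return texts[3], "None"
    return "None", "None"


def fill_with_threshold(texts):
    """The answer's variant: each index has its own >= length guard."""
    business_type = texts[3] if len(texts) >= 4 else "None"
    business_area = texts[4] if len(texts) >= 5 else "None"
    return business_type, business_area
```

With a string such as '0' (what .get() would hand back), describe reports a str of length 1, so any texts[3] raises IndexError: string index out of range. With the 5-element list, fill_with_equality falls through to the else branch and returns "None" for both fields, while fill_with_threshold returns the real values.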