Scrapy: skip item and continue with execution (answers)

【Question title】: Scrapy: skip item and continue with execution
【Posted】: 2011-02-18 10:23:09
【Question】:

I'm writing an RSS spider. When the current node doesn't match, I want the spider to ignore that node's item and continue with execution. So far I have this:

        if info.startswith('Foo'):
            item['foo'] = info.split(':')[1]
        else:
            return None

(info is a string that was extracted with XPath and cleaned up earlier.)

But I get this exception:

    exceptions.TypeError: You cannot return an "NoneType" object from a spider

So how can I ignore this node and continue with execution?

【Discussion】:

Tags: python web-crawler scrapy

【Answer 1】:

    def parse(self, response):
        # make some manipulations (these build `info` and `item`)
        if info.startswith('Foo'):
            item['foo'] = info.split(':')[1]
            return [item]
        else:
            return []  # an empty list means "no item for this node"

But it's better not to use return at all: yield the item, or do nothing:

    def parse(self, response):
        # make some manipulations (these build `info` and `item`)
        if info.startswith('Foo'):
            item['foo'] = info.split(':')[1]
            yield item
        else:
            return  # yield nothing: this node produces no item
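The filtering done by the generator version can be tried outside Scrapy. This is a minimal sketch, assuming the cleaned XPath strings arrive as a plain list and plain dicts stand in for items; `parse_nodes` is a hypothetical name. A non-matching node simply produces no yield, so the caller receives a shorter iterable instead of `None`.

```python
from typing import Dict, Iterable, Iterator


def parse_nodes(infos: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one item per matching node; non-matching nodes are skipped.

    `infos` stands in for the cleaned XPath strings from the question and
    dicts stand in for Scrapy items (assumptions for this sketch).
    """
    for info in infos:
        if not info.startswith('Foo'):
            continue  # skip this node and keep processing the others
        yield {'foo': info.split(':')[1]}


items = list(parse_nodes(['Foo:1', 'Bar:2', 'Foo:3']))
print(items)  # [{'foo': '1'}, {'foo': '3'}]
```

Because the skip is just a missing `yield`, one bad node never stops the loop; the remaining nodes are still processed.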

【Discussion】:

【Answer 2】:

I found an undocumented way to do this when I had to skip the item during parsing, but outside the callback function.

Just raise StopIteration anywhere during parsing.

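    from scrapy import Spider


    class MySpider(Spider):
        name = 'my_spider'

        def parse(self, response):
            value1 = self.parse_something1()
            value2 = self.parse_something2()
            yield {'value1': value1, 'value2': value2}

        def parse_something1(self):
            try:
                return get_some_value()  # placeholder for real parsing
            except Exception:
                self.skip_item()

        def parse_something2(self):
            if something_wrong:  # placeholder condition
                self.skip_item()
            return get_other_value()  # placeholder for real parsing

        def skip_item(self):
            # Propagates into parse() and ends that generator before it yields.
            # Since Python 3.7 (PEP 479) Python converts this StopIteration
            # into a RuntimeError instead, so the trick no longer works there.
            raise StopIteration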
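Since Python 3.7 (PEP 479), a StopIteration that escapes a generator body is re-raised as RuntimeError, so raising it from a helper no longer ends the callback quietly. The plain-Python sketch below contrasts that with a swapped-in technique: a hypothetical `SkipItem` exception (not part of Scrapy) that the generator catches itself. `parse_old`, `parse_new` and the helpers are illustrative names only.

```python
class SkipItem(Exception):
    """Hypothetical exception meaning "abandon the current item"."""


def skip_item_old():
    # The undocumented trick: raise StopIteration from a helper.
    raise StopIteration


def skip_item_new():
    raise SkipItem


def parse_old(raw):
    # Generator standing in for a Scrapy callback that yields one item.
    if not raw.isdigit():
        skip_item_old()
    yield {'value': int(raw)}


def parse_new(raw):
    try:
        if not raw.isdigit():
            skip_item_new()
        value = int(raw)
    except SkipItem:
        return  # end this generator without yielding: the item is skipped
    yield {'value': value}


print(list(parse_new('42')))   # [{'value': 42}]
print(list(parse_new('abc')))  # []

try:
    list(parse_old('abc'))
except RuntimeError as exc:
    # Python 3.7+ (PEP 479): StopIteration escaping a generator body
    print('RuntimeError:', exc)
```

Catching a dedicated exception inside the generator keeps the skip explicit and does not depend on how the interpreter treats StopIteration.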

【Discussion】:

Undocumented methods can change behavior and stop working without notice.