【Question Title】: Create new request for Scrapy schedule
【Posted】: 2020-06-05 08:07:48
【Question Description】:

Via pika I fetch a URL from RabbitMQ and try to create a new request for my Scrapy spider.
When I start the spider with scrapy crawl spider, it does not close (because of raise DontCloseSpider()), but no request is ever created for the spider.
My custom extension:

import pika
from scrapy import signals
from scrapy.http import Request
from scrapy.exceptions import DontCloseSpider


class AddRequestExample:

    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        s = cls(crawler)
        crawler.signals.connect(s.spider_idle, signal=signals.spider_idle)
        return s


    def spider_idle(self, spider):
        connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
        channel = connection.channel()
        try:
            url = channel.basic_get(queue='hello')[2]
            url = url.decode()
            crawler.engine.crawl(Request(url), self)
        except Exception:
            pass
        raise DontCloseSpider()
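One thing to note in the snippet above: the name crawler is never defined inside spider_idle, so crawler.engine.crawl(...) raises a NameError that the bare except Exception: pass swallows silently, which alone would explain why no request is created. A minimal, self-contained sketch of that failure mode (plain Python, no Scrapy or pika needed):

```python
def schedule_from_queue():
    """Mirrors the control flow of spider_idle above: the undefined
    name 'crawler' raises a NameError, and the bare except hides it."""
    scheduled = False
    try:
        url = b"http://example.com".decode()
        crawler.engine.crawl(url)  # NameError: 'crawler' is not defined
        scheduled = True
    except Exception:
        pass  # the NameError disappears here, so nothing is scheduled
    return scheduled

print(schedule_from_queue())  # → False
```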

My spider:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "spider"

    def parse(self, response):
        yield {
            'url': response.url
        }

【Question Comments】:

    Tags: python scrapy rabbitmq pika


    【Solution 1】:

    It looks like you are trying to copy the approach from this answer.
    In that case you need to define a callback for the request.
    Since you handle the spider_idle signal from an extension (not from the spider itself), the callback should be the spider.parse method.

    def spider_idle(self, spider):
        ....
        try:
            url = channel.basic_get(queue='hello')[2]
            url = url.decode()
            spider.crawler.engine.crawl(Request(url=url, callback=spider.parse), spider)
        except Exception:
        ....
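As background (a sketch based on pika's documented behavior, not part of the original answer): channel.basic_get returns a (method_frame, header_frame, body) tuple, and (None, None, None) when the queue is empty, which is why the snippets above index [2] and call decode(). A small helper that handles the empty-queue case explicitly:

```python
def url_from_basic_get(result):
    """Extract and decode the message body from a pika basic_get result.

    basic_get returns (method_frame, header_frame, body); all three
    are None when the queue is empty, so we return None in that case.
    """
    method_frame, _header_frame, body = result
    if method_frame is None:
        return None  # queue was empty, nothing to schedule
    return body.decode()

# Simulated basic_get results (frame objects replaced by placeholders):
print(url_from_basic_get(("method", "header", b"http://example.com")))  # → http://example.com
print(url_from_basic_get((None, None, None)))  # → None
```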
    

    【Comments】:
