【Title】Scrapy error: 'Pipeline' object has no attribute 'exporter'
【Posted】2018-06-01 00:58:34
【Description】

I built a scraper and am following this tutorial to export items through a pipeline. When I run scrapy crawl [myspider] I can see the items flash by in the terminal, but every run ends with the error 'PostPipeline' object has no attribute 'exporter'.

My spider:

from scrapy.spiders import CrawlSpider
from scrapy.loader import ItemLoader
from selenium import webdriver

from scrapy_spider.items import ForumTopic  # assuming items live in the project's items.py


class FokSpider1(CrawlSpider):
    name = 'fok'
    allowed_domains = ['fok.nl']
    start_urls = ['http://forum.fok.nl/?token=77c1f767bc31859fee1ffe041343fa48&allowcookies=ACCEPTEER+ALLE+COOKIES']

    rules = (
        # My rules, left out to save space
    )

    def __init__(self, *args, **kwargs):
        self.driver = webdriver.Chrome()
        super(FokSpider1, self).__init__(*args, **kwargs)

    def parse_topic(self, response):
        posts = response.xpath("//div[contains(@class, 'post userid')]")

        for i, post in enumerate(posts):
            l = ItemLoader(selector=post, item=ForumTopic(), response=response)
            l.add_xpath('subforum_title', "//*[@id='pageWrapper']/div[4]/div/h2/a/text()")
            l.add_xpath('topic_title', "//*[@id='pageWrapper']/div[4]/h1/span/text()")
            l.add_xpath('unique_post_id', ".//@data-postid")
            l.add_xpath('post_rank', ".//@data-teller")
            l.add_xpath('author', ".//@data-member")
            l.add_xpath('timestamp', ".//span[contains(@class, 'post_time')]/a[1]/text()")
            l.add_xpath('content', ".//div[contains(@id, '_bot')]/div[contains(@class, 'postmain_right')]/text()")
            yield l.load_item()

settings.py:

ITEM_PIPELINES = {
   'scrapy_spider.pipelines.PostPipeline': 300,
}

I assume the rest of settings.py is not relevant?

pipelines.py:

from scrapy.exceptions import DropItem
from scrapy import signals
from scrapy.exporters import XmlItemExporter


class PostPipeline(object):
    def __init__(self):
        self.ids_seen = set()
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        file = open('fokSpider1.xml' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = XmlItemExporter(file)
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        if item['unique_post_id'] in self.ids_seen:
            raise DropItem("Duplicate item found: %s" % item)
        else:
            self.ids_seen.add(item['unique_post_id'])
            self.exporter.export_item(item)
            return item

Additional information

The process_item method raises the error:

    2017-12-18 17:24:00 [scrapy.core.scraper] ERROR: Error processing {'author': u'HaverMoutKoekje',
 'content': u'Here is the content',
 'post_rank': u'7',
 'subforum_title': u'televisie',
 'timestamp': u'vrijdag 8 december 2017 @ 21:59',
 'unique_post_id': u'175586521'}
Traceback (most recent call last):
  File "/anaconda/lib/python2.7/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/my.name/scrapy/scrapy_spider/scrapy_spider/pipelines.py", line 40, in process_item
    self.exporter.export_item(item)
AttributeError: 'PostPipeline' object has no attribute 'exporter'

When I run scrapy crawl [myspider] -o somefile.xml, the file is created, but it has no content.

The same error is reported here, but there is no answer there...

Any help is greatly appreciated!

UPDATE: While this doesn't really solve the problem, I can at least export the data now with a simplified pipeline:

from scrapy.exceptions import DropItem

class PostPipeline(object):
    def __init__(self):
        self.ids_seen = set()
        self.files = {}

    def process_item(self, item, spider):
        if item['unique_post_id'] in self.ids_seen:
            raise DropItem("Duplicate item found: %s" % item)
        else:
            self.ids_seen.add(item['unique_post_id'])
            return item

using the command scrapy crawl [myspider] -o somefile.xml. Why the earlier approach, taken straight from the tutorial, doesn't work I still don't know.
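Stripped of the Scrapy types, the simplified pipeline's behaviour boils down to set-based deduplication. A minimal stand-alone sketch (DedupOnly is a hypothetical stand-in for the pipeline above, and returning None stands in for raising DropItem):

```python
class DedupOnly(object):
    """Hypothetical stand-in for the simplified PostPipeline:
    keep the first item per unique_post_id, drop the rest."""

    def __init__(self):
        self.ids_seen = set()

    def process_item(self, item):
        if item['unique_post_id'] in self.ids_seen:
            return None  # stands in for raise DropItem(...)
        self.ids_seen.add(item['unique_post_id'])
        return item


pipeline = DedupOnly()
items = [{'unique_post_id': '175586521'},
         {'unique_post_id': '175586521'},  # duplicate, dropped
         {'unique_post_id': '175586522'}]
kept = [i for i in items if pipeline.process_item(i) is not None]
print(len(kept))  # 2
```

The actual file writing is then left entirely to Scrapy's feed exports (the -o flag), which is why this version works without an exporter attribute.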

【Comments】:

  • Maybe I'm just missing it, but where do you call the spider_opened function?
  • See the pipelines.py section of my question; it sits between from_crawler and spider_closed.
  • Not where it is defined, where it is called.

Tags: python scrapy-spider scrapy-pipeline


【Solution 1】:

file = open('fokSpider1.xml' % spider.name, 'w+b')

This line looks wrong: 'fokSpider1.xml' contains no % placeholder, so 'fokSpider1.xml' % spider.name raises TypeError: not all arguments converted during string formatting (presumably '%s.xml' % spider.name was intended). Why doesn't that crash the spider? spider_opened runs as a signal handler, and Scrapy dispatches signals with send_catch_log, so the exception is only logged and the crawl continues. As a result self.exporter is never assigned, and the next access to it, in process_item, fails with the AttributeError you see.
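A minimal stand-alone reproduction of the suspected formatting bug (plain Python, no Scrapy needed; the literal 'fok' stands in for spider.name):

```python
# The failing shape: %-formatting a literal that has no placeholder raises TypeError.
try:
    filename = 'fokSpider1.xml' % 'fok'  # same shape as 'fokSpider1.xml' % spider.name
except TypeError as exc:
    print(exc)  # not all arguments converted during string formatting

# What the tutorial presumably intended: put the spider name into the filename.
filename = '%s.xml' % 'fok'
print(filename)  # fok.xml
```

Inside a normal method this TypeError would surface immediately; inside a signal handler it only appears in the log, which is easy to miss among the crawl output.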

【Discussion】:
