【Question title】: Change Scrapy JSON Output
【Posted】: 2016-01-21 06:49:48
【Question】:

I'm using Scrapy to export JSON from a spider through a pipeline. I want to wrap the JSON in a product object.

I'm using JsonLinesItemExporter.

Currently, my JSON looks like this:

{"name": "Protective iPhone Stand Case",
    "link": "https://things.com/899029978367138670/Strap-On-SoftRack-Roof-Rack-by-Otium",
    "category_old": "Sports & Outdoors",
    "image_url": "https://thingd-media-ec1.com/default/899029978367138670_42120cf10765.jpg",
    "price": "160",
    "interest": "13",
    "company": "ACME",
    "country": "USA"}

I want it to look like this:

"product": {
    "name": "Protective iPhone Stand Case",
    "link": "https://things.com/899029978367138670/Strap-On-SoftRack-Roof-Rack-by-Otium",
    "category_old": "Sports & Outdoors",
    "image_url": "https://thingd-media-ec1.com/default/899029978367138670_42120cf10765.jpg",
    "price": "160",
    "interest": "13",
    "company": "ACME",
    "country": "USA"
}

So how can I wrap it in a Product object?

Here is my pipeline code:

import requests
import time
from scrapy.utils.project import get_project_settings
import sys
import json
from scrapy import signals
from scrapy.exporters import JsonLinesItemExporter

SETTINGS = get_project_settings()

class FancyPipeline(object):

    def __init__(self):
        # Instantiate API connection
        self.files = {}
        url = 'http://unshakable-missile-106309.nitrousapp.com:3000/api/v1/imports'

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        # Open a static/dynamic file to read and write to
        file = open('%s_items.json' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = JsonLinesItemExporter(file)
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
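
A minimal sketch of the per-item wrapping being asked about, using only the standard library `json` module rather than a Scrapy exporter. The helper name `wrap_as_product` is hypothetical, and the sketch assumes each item can be converted with `dict(item)`, which holds for plain dicts and `scrapy.Item` objects.

```python
import json


def wrap_as_product(item):
    """Return one JSON line of the form {"product": {...}} for an item."""
    # dict(item) copies the fields, so the original item is left untouched.
    return json.dumps({"product": dict(item)})


if __name__ == "__main__":
    sample = {"name": "Protective iPhone Stand Case", "price": "160"}
    print(wrap_as_product(sample))
```

In `process_item`, the returned line could be encoded to bytes and written to the file directly (the file is opened in binary mode) in place of the `export_item` call.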

【Comments】:

    Tags: python json scrapy


    【Solution 1】:

    I was able to do this with the following code:

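    def spider_opened(self, spider):
        # Open the output file for this spider in binary mode
        file = open('%s_items.json' % spider.name, 'w+b')
        self.files[spider] = file
        # Write the opening wrapper as bytes; a str would raise TypeError on Python 3
        file.write(b'{\n    "product": [')
        # Note: JsonLinesItemExporter emits one object per line with no commas,
        # so the file is strict JSON only when a single item is exported
        self.exporter = JsonLinesItemExporter(file)
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        # Close the array and the wrapping object
        file.write(b"]}")
        file.close()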
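
Because `JsonLinesItemExporter` puts no commas between objects, wrapping its output in `{"product": [` and `]}` parses as JSON only when there is at most one item. The following is a hedged, standard-library sketch of writing a single `{"product": [...]}` document with commas between items. The class name `ProductArrayWriter` and its methods are hypothetical and are not part of Scrapy.

```python
import io
import json


class ProductArrayWriter:
    """Hypothetical helper: streams items into {"product": [ ... ]}."""

    def __init__(self, stream):
        self.stream = stream
        self.first = True

    def start(self):
        # Open the wrapping object and the array.
        self.stream.write('{"product": [')

    def write_item(self, item):
        # Emit a comma before every item except the first.
        if not self.first:
            self.stream.write(",")
        self.first = False
        self.stream.write("\n" + json.dumps(dict(item)))

    def finish(self):
        # Close the array and the wrapping object.
        self.stream.write("\n]}")


if __name__ == "__main__":
    buf = io.StringIO()
    writer = ProductArrayWriter(buf)
    writer.start()
    writer.write_item({"name": "A", "price": "160"})
    writer.write_item({"name": "B", "price": "20"})
    writer.finish()
    # The whole buffer now parses as one JSON document.
    print(json.loads(buf.getvalue()))
```

In a pipeline, `start`, `write_item` and `finish` would map onto `spider_opened`, `process_item` and `spider_closed`. Scrapy's `JsonItemExporter` also writes a comma-separated array, so it may be a simpler base than hand-written separators.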
    

    【Comments】:
