【Posted】: 2014-09-25 13:01:36
【Question】:
What is the correct way to nest Item data?
For example, I want the output for a product to look like this:
{
    'price': price,
    'title': title,
    'meta': {
        'url': url,
        'added_on': added_on
    }
}
I have the following scrapy.Item:
import scrapy
# Import path depends on the Scrapy version; older releases expose
# TakeFirst under scrapy.loader.processors instead.
from itemloaders.processors import TakeFirst

class ProductItem(scrapy.Item):
    url = scrapy.Field(output_processor=TakeFirst())
    price = scrapy.Field(output_processor=TakeFirst())
    title = scrapy.Field(output_processor=TakeFirst())
    added_on = scrapy.Field(output_processor=TakeFirst())
Right now, my approach is to reformat the whole item in a pipeline according to a new item template:
class FormatedItem(scrapy.Item):
    title = scrapy.Field()
    price = scrapy.Field()
    meta = scrapy.Field()
And in the pipeline:
def process_item(self, item, spider):
    # Copy the flat fields into the new item template.
    formated_item = FormatedItem()
    formated_item['title'] = item['title']
    formated_item['price'] = item['price']
    # Group the remaining fields under a nested 'meta' dict.
    formated_item['meta'] = {
        'url': item['url'],
        'added_on': item['added_on']
    }
    return formated_item
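Is this the right way to approach the problem, or is there a more direct way to do it without going against the philosophy of the framework?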
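To make the reshaping step concrete, here is a minimal sketch of the same idea written as a plain function, independent of Scrapy. The name `nest_fields`, its parameters, and the sample values are hypothetical and only for illustration. It assumes the item behaves like a mapping, so the same logic should apply to a `dict` or to anything else that supports `.items()`.

```python
from collections.abc import Mapping


def nest_fields(item: Mapping, nested_key: str, fields: tuple) -> dict:
    """Return a new dict with `fields` moved under `nested_key`."""
    # Keep every field that is not being nested at the top level.
    result = {k: v for k, v in item.items() if k not in fields}
    # Group the selected fields (only those actually present) in a sub-dict.
    result[nested_key] = {k: item[k] for k in fields if k in item}
    return result


# Made-up sample data standing in for a scraped product.
flat = {
    "title": "Example product",
    "price": "9.99",
    "url": "http://example.com/p/1",
    "added_on": "2014-09-25",
}
nested = nest_fields(flat, "meta", ("url", "added_on"))
```

Inside `process_item`, a call like this could replace the field-by-field copying, at the cost of returning a plain dict rather than a `FormatedItem`.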
【Discussion】:
Tags: scrapy