【Posted】: 2019-11-10 12:37:37
【Problem description】:
I want to output one CSV file per start_url. I wrote a pipeline, but it only outputs a single file containing the data from all the URLs, and I don't know how to produce multiple files.
pipeline.py
import csv

from scrapy import signals
from scrapy.exporters import CsvItemExporter


class CSVPipeline(object):
    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        # One exporter for the whole spider, hence a single output file.
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.fields_to_export = ['date', 'move', 'bank', 'call', 'price']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()
        # Re-read the exported file and strip blank rows.
        print('Starting csv blank line cleaning')
        with open('%s_items.csv' % spider.name, 'r') as f:
            reader = csv.reader(f)
            original_list = list(reader)
            cleaned_list = list(filter(None, original_list))
        with open('%s_items_cleaned.csv' % spider.name, 'w', newline='') as output_file:
            wr = csv.writer(output_file, dialect='excel')
            for data in cleaned_list:
                wr.writerow(data)

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item


class SentimentPipeline(object):
    def process_item(self, item, spider):
        return item
I have been running:
scrapy crawl spider -o spider.csv
Do I need a different command? I'm very new to Scrapy. Thanks!
【Discussion】:
-
That page is used to pull the URLs. I'm trying to export a CSV file each time a start_url is crawled; a single spider run should produce one file for each URL in the start_urls list.
-
See my answer below; it should work for you.