[Posted]: 2021-02-08 14:03:57
[Question]:
I'm having some trouble downloading images with Scrapy.
Settings.py:
ITEM_PIPELINES = {
    'scrapy.contrib.pipeline.images.FilesPipeline': 1,
}
FILES_STORE = 'D:/0. Documentos/10. GitHub/0. 93Pipe/RealEstatePredictor/Images'
Items.py
import scrapy
class MagazineCover(scrapy.Item):
    title = scrapy.Field()
    pubDate = scrapy.Field()
    file_urls = scrapy.Field()
    files = scrapy.Field()
Scrapy spider:
imgs = response.css('div.gallery-content.item-gallery__wrapper img::attr(src)').getall()
for img in imgs:
    yield MagazineCover(title="title", pubDate="02/03/2021", file_urls=[img])
The spider reads the URLs, but the folder I set in Settings.py stays empty.
Result
2021-02-04 15:24:17 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.portalinmobiliario.com/venta/departamento/conchali-metropolitana/6055529-julio-montt-salamanca-5935-uda>
{'file_urls': ['https://http2.mlstatic.com/julio-montt-salamanca-5935-D_NQ_NP_2X_760548-MLC44703096172_012021-R.jpg'],
'pubDate': '02/03/2021',
'title': 'title'}
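For comparison, here is a minimal settings sketch, assuming Scrapy 1.0 or later, where the built-in files pipeline is importable as `scrapy.pipelines.files.FilesPipeline` rather than from the older `scrapy.contrib` package. The scraped item above has no `files` key, which hints that no files pipeline handled it, but whether the pipeline path is the actual cause here is an assumption. The store directory below is a hypothetical placeholder, not the path from the question.

```python
# settings.py (sketch, not a confirmed fix for this project)

ITEM_PIPELINES = {
    # Assumption: Scrapy 1.0+ module path for the built-in files pipeline.
    "scrapy.pipelines.files.FilesPipeline": 1,
}

# Hypothetical placeholder; point this at any writable directory.
FILES_STORE = "downloaded_files"
```

With a files pipeline active, items that carry a `file_urls` list should come back with a `files` field describing each successful download.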
Thanks for your help!!
[Comments]:
- This is a duplicate of stackoverflow.com/q/66053830/939364; please delete one of the two questions.
Tags: python web-scraping scrapy