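【Posted】: 2020-07-13 15:54:20
【Question】:
I am trying to scrape data and store it in a PostgreSQL database. The code extracts the data correctly and writes it to a file, but no rows show up in the database table. The URL I am scraping is https://www.gizbot.com/mobile-brands-in-india/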
spider.py
import scrapy


class MobilesSpider(scrapy.Spider):
    name = "mobiles"

    def start_requests(self):
        urls = [
            'https://www.gizbot.com/mobile-brands-in-india/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Use the last path segment of the URL as part of the output file name.
        page = response.url.split("/")[-2]
        filename = 'mobiles-%s.html' % page
        # Brand names (text nodes) and the blocks that contain each brand image.
        mob = response.xpath('.//div[has-class("all-brands-block-desc-brand")]/text()').getall()
        im = list(response.css('div.all-brands-block'))
        res = dict(zip(mob, im))
        for mobile, i in res.items():
            # Append one "name image" line per brand to the file.
            with open(filename, 'a') as f:
                f.write("%s %s\n" % (mobile, "gizbot.com"+str(i.css('img::attr("data-pagespeed-lazy-src")').extract())))
        self.log('Saved file %s' % filename)
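One likely reason the table stays empty is that `parse` only writes to a file and never yields an item, so Scrapy has nothing to pass to `process_item`. The following is a minimal sketch of the pairing step as a plain generator, not a definitive fix. `iter_mobile_items` and `BASE_URL` are hypothetical names, the base URL for relative image paths is an assumption, and the `name`/`image` keys are taken from the pipeline's insert statement.

```python
from urllib.parse import urljoin

# Assumed base for relative image paths; not confirmed by the original post.
BASE_URL = "https://www.gizbot.com/"


def iter_mobile_items(names, image_srcs, base_url=BASE_URL):
    """Yield one dict per brand, shaped for the pipeline's (name, image) insert."""
    for name, src in zip(names, image_srcs):
        name = name.strip()
        if not name or not src:
            # Skip whitespace-only text nodes and blocks without an image source.
            continue
        yield {"name": name, "image": urljoin(base_url, src)}


if __name__ == "__main__":
    sample_names = ["Samsung", "  Nokia\n"]
    sample_srcs = ["/img/samsung.png", "https://cdn.example.com/nokia.png"]
    for item in iter_mobile_items(sample_names, sample_srcs):
        print(item)
```

Inside `parse`, the assumption is that one would collect each image source with `.get()` instead of `.extract()` (so the value is a string rather than a list) and then write `yield from iter_mobile_items(names, srcs)`.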
pipelines.py
import psycopg2


class MobilesPipeline:
    def open_spider(self, spider):
        # Open one connection and cursor for the lifetime of the spider.
        hostname = 'localhost'
        username = 'postgres'
        password = 'postgres'  # your password
        database = 'postgres'
        self.connection = psycopg2.connect(host=hostname, user=username, password=password, dbname=database)
        self.cur = self.connection.cursor()

    def close_spider(self, spider):
        self.cur.close()
        self.connection.close()

    def process_item(self, item, spider):
        # Insert one row per item and commit immediately.
        self.cur.execute("insert into mobiles(name,image) values(%s,%s)", (item['name'], item['image']))
        self.connection.commit()
        return item
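The pipeline hooks can be exercised without Scrapy or a PostgreSQL server, which helps separate "the insert is wrong" from "`process_item` is never called". The sketch below swaps psycopg2 for the standard-library `sqlite3` module purely so that it runs anywhere. The class name `SqliteMobilesPipeline` and the table definition are assumptions, not part of the original post.

```python
import sqlite3


class SqliteMobilesPipeline:
    """Stand-in for MobilesPipeline: same hooks, sqlite3 instead of psycopg2."""

    def open_spider(self, spider):
        # In-memory database so the sketch runs without a server.
        self.connection = sqlite3.connect(":memory:")
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS mobiles (name TEXT, image TEXT)"
        )

    def close_spider(self, spider):
        self.connection.close()

    def process_item(self, item, spider):
        try:
            # sqlite3 uses '?' placeholders where psycopg2 uses '%s'.
            self.connection.execute(
                "INSERT INTO mobiles (name, image) VALUES (?, ?)",
                (item["name"], item["image"]),
            )
            self.connection.commit()
        except sqlite3.Error:
            # Undo the failed statement so later items can still be inserted.
            self.connection.rollback()
            raise
        return item


if __name__ == "__main__":
    pipeline = SqliteMobilesPipeline()
    pipeline.open_spider(spider=None)
    pipeline.process_item(
        {"name": "Samsung", "image": "https://www.gizbot.com/img/samsung.png"},
        spider=None,
    )
    print(pipeline.connection.execute("SELECT name, image FROM mobiles").fetchall())
    pipeline.close_spider(spider=None)
```

If a hand-built dict passes through this stand-in and lands in the table, the insert logic itself is probably fine, and the spider (or the project settings) is the more likely place to look.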
I need to store the brand name and the image source in the database.
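A pipeline class is also ignored unless it is listed in the project's `ITEM_PIPELINES` setting. A sketch of that entry for `settings.py` follows; the package name `myproject` is a placeholder, since the post does not show the project layout.

```python
# settings.py (fragment)
# "myproject" is a placeholder; use the package that contains pipelines.py.
ITEM_PIPELINES = {
    # Lower numbers run earlier; values are conventionally in the 0-1000 range.
    "myproject.pipelines.MobilesPipeline": 300,
}
```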
【Comments】:
Tags: python-3.x web-scraping scrapy