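Scrapy item data not being stored in a PostgreSQL database

【Question title】: Scrapy item data not storing in PostgreSQL database
【Posted】: 2020-07-13 15:54:20
【Question description】:

I am trying to scrape data and store it in a PostgreSQL database. The code extracts the data correctly and writes it to a file, but nothing shows up in the database table. The URL I am scraping is https://www.gizbot.com/mobile-brands-in-india/

spider.py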

import scrapy


class MobilesSpider(scrapy.Spider):
    name = "mobiles"

    def start_requests(self):
        urls = [
            'https://www.gizbot.com/mobile-brands-in-india/',
            
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'mobiles-%s.html' % page
        mob = response.xpath('.//div[has-class("all-brands-block-desc-brand")]/text()').getall()
        im = list(response.css('div.all-brands-block'))
        res = dict(zip(mob, im))
        for mobile, i in res.items():

            with open(filename, 'a') as f:
                f.write("%s %s\n" % (mobile, "gizbot.com"+str(i.css('img::attr("data-pagespeed-lazy-src")').extract())))
            self.log('Saved file %s' % filename)

pipelines.py

import psycopg2


class MobilesPipeline:

    def open_spider(self, spider):
        hostname = 'localhost'
        username = 'postgres'
        password = 'postgres'  # your password
        database = 'postgres'
        self.connection = psycopg2.connect(host=hostname, user=username, password=password, dbname=database)
        self.cur = self.connection.cursor()

    def close_spider(self, spider):
        self.cur.close()
        self.connection.close()

    def process_item(self, item, spider):
        self.cur.execute("insert into mobiles(name,image) values(%s,%s)", (item['name'], item['image']))
        self.connection.commit()
        return item

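I need to store the brand name and the image source in the database.

【Question discussion】:

Tags: python-3.x web-scraping scrapy

【Solution 1】:

Based on the information you provided, you may not have enabled the pipeline in your Scrapy settings. For the process_item method to be called, the pipeline must be active, which means adding the following configuration to your settings.py: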

ITEM_PIPELINES = {
    'myproject.pipelines.MobilesPipeline': 300,
}

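Note that you need to replace myproject with the name of your project.
The integer determines the order in which pipelines run: items pass from lower-valued to higher-valued classes.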
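
To illustrate the ordering, here is a minimal sketch of a settings.py entry with two pipelines. CleanNamesPipeline is a hypothetical class that is not part of the question, and myproject is a placeholder; only MobilesPipeline comes from the code above. The lower number runs first, so items would be cleaned before they reach the database pipeline.

```python
# settings.py (sketch): "myproject" and CleanNamesPipeline are placeholders.
ITEM_PIPELINES = {
    "myproject.pipelines.CleanNamesPipeline": 100,  # hypothetical, runs first
    "myproject.pipelines.MobilesPipeline": 300,     # from the question, runs second
}

# Not Scrapy's own code: this only sorts by priority to show the resulting order.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
print(run_order)
```

The sort at the end is only there to make the order visible; Scrapy does not need it in a real settings.py.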

If you need more information, see the Scrapy docs.

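【Discussion】:

I have already enabled it in my settings.py, and it still does not work.
Please update your question to include the relevant part of your settings, since they matter here. With the information so far, I agree with @renatodvc's guess.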
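
One further thing worth checking, judging only from the spider as posted: parse writes to a file but never yields anything, and Scrapy calls process_item only for items that the spider yields, so the pipeline would stay idle even when it is enabled. Below is a minimal sketch of building items with the name and image keys that the pipeline reads. The helper name build_items is hypothetical, and it is an assumption that the selectors return one clean name and one image path per brand on the live page.

```python
from urllib.parse import urljoin

# Page URL taken from the question; used to resolve relative image paths.
BASE_URL = "https://www.gizbot.com/mobile-brands-in-india/"


def build_items(names, image_srcs, base_url=BASE_URL):
    """Pair brand names with image sources and yield pipeline-ready dicts.

    Hypothetical helper: the keys match what process_item reads
    (item['name'] and item['image']).
    """
    for name, src in zip(names, image_srcs):
        yield {
            "name": name.strip(),
            # urljoin turns a relative path into an absolute URL;
            # a missing source is kept as None instead of a broken string.
            "image": urljoin(base_url, src) if src else None,
        }


# Stand-in values, not real scraped data.
for item in build_items([" BrandA ", "BrandB"], ["/img/a.png", None]):
    print(item)
```

In the spider, parse could then end with `yield from build_items(names, srcs, response.url)`, where each entry of srcs is collected with `.get()` rather than `.extract()` so that the image is a single string and not a list. Zipping the two lists directly also avoids the silent loss of duplicate names that `dict(zip(mob, im))` causes.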