Scrapy: how to store data in a MySQL database

【Question title】: Scrapy: how to store data in a MySQL database
【Posted】: 2015-09-28 16:56:34
【Question description】:

I am running into a problem with Scrapy when I try to store data in a MySQL database. I get the following error: (screenshot here)

The code in my pipelines.py is:

import logging

import MySQLdb.cursors
from twisted.enterprise import adbapi

logger = logging.getLogger(__name__)


class SQLStorePipeline(object):

    def __init__(self):
        # The first argument is the name of the DB-API module, not the host.
        self.dbpool = adbapi.ConnectionPool('MySQLdb', host='localhost',
                db='python', user='root', passwd='',
                cursorclass=MySQLdb.cursors.DictCursor,
                charset='utf8', use_unicode=True)

    def process_item(self, item, spider):
        # Run the database query in Twisted's thread pool.
        query = self.dbpool.runInteraction(self._conditional_insert, item)
        query.addErrback(self.handle_error)

        return item

    def _conditional_insert(self, tx, item):
        # Create the record if it doesn't exist yet.
        # This whole block runs in its own thread.
        # parse() already stores plain strings, so no [0] indexing here.
        tx.execute("select * from test where name = %s", (item['name'],))
        result = tx.fetchone()
        if result:
            logger.debug("Item already stored in db: %s", item)
        else:
            tx.execute(
                "insert into test (name, price) "
                "values (%s, %s)",
                (item['name'], item['price'])
            )
            logger.debug("Item stored in db: %s", item)

    def handle_error(self, failure):
        # Errback: log the Failure raised by the interaction.
        logger.error(failure)

(I got it from here.)

And my parse method is:

def parse(self, response):
    item = DmozItem()
    item['name'] = response.xpath('//meta[@itemprop="name"]/@content').extract()[0]
    item['price'] = response.xpath('//meta[@itemprop="price"]/@content').extract()[0]
    yield item

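I know this question has been asked before, but I tried all the different answers before asking here and none of them worked...

Can anyone help me? Thanks in advance!

【Question discussion】:

Judging by your screenshot, you have an indentation problem. Check your whitespace.

Tags: python mysql web-scraping scrapy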

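【Solution 1】:

I found the solution. @alecxe was actually right, and his comments led me to it.

MySQLdb was not installed at all: its installation had failed because my name contains an accented character, and Python could not handle the path.

Thanks again, @alecxe!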
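
A quick way to confirm this kind of problem is to ask the import system whether the driver can be found at all before touching the pipeline. The following is a minimal sketch, assuming Python 3.4 or later (the thread itself predates that setup); `is_module_available` is a hypothetical helper name, not part of Scrapy or MySQLdb.

```python
import importlib.util


def is_module_available(module_name):
    """Return True if the top-level module `module_name` can be located."""
    # find_spec returns None when no importable module of that name exists.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "MySQLdb" and "twisted" are the modules the pipeline in the question needs.
    for name in ("MySQLdb", "twisted"):
        status = "available" if is_module_available(name) else "MISSING"
        print("%s: %s" % (name, status))
```

If `MySQLdb` is reported as missing, the installation step is the thing to fix, whatever the pipeline code looks like.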

【Discussion】:

【Solution 2】:

Read the error carefully: it reports an IndentationError on the following line:

yield item

This means you need to check that the indentation is consistent (4 spaces per level):

def parse(self, response):
    item = DmozItem()
    item['name'] = response.xpath('//meta[@itemprop="name"]/@content').extract()[0]
    item['price'] = response.xpath('//meta[@itemprop="price"]/@content').extract()[0]
    yield item

Also make sure you do not mix tabs and spaces.
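
Mixed tabs and spaces are hard to see in an editor, so it can help to check for them programmatically. The following is a rough sketch with hypothetical helper names (`classify_indentation`, `has_inconsistent_indentation`). It only looks at leading whitespace line by line, so indented lines inside multi-line strings are counted too.

```python
def classify_indentation(source):
    """Group 1-based line numbers by the whitespace used to indent them."""
    styles = {"spaces": [], "tabs": [], "mixed": []}
    for lineno, line in enumerate(source.splitlines(), start=1):
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if not indent or not line.strip():
            continue  # skip unindented lines and blank lines
        if " " in indent and "\t" in indent:
            styles["mixed"].append(lineno)
        elif "\t" in indent:
            styles["tabs"].append(lineno)
        else:
            styles["spaces"].append(lineno)
    return styles


def has_inconsistent_indentation(source):
    """Return True if the source indents with both tabs and spaces."""
    styles = classify_indentation(source)
    return bool(styles["mixed"]) or bool(styles["spaces"] and styles["tabs"])


if __name__ == "__main__":
    # Line 2 is indented with spaces, line 3 with a tab.
    sample = "def parse(self, response):\n    item = {}\n\tyield item\n"
    print(classify_indentation(sample))
    print(has_inconsistent_indentation(sample))
```

For a tokenizer-aware check, the standard library's `tabnanny` module can be run directly on a file, for example `python -m tabnanny pipelines.py`.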

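【Discussion】:

Thank you very much for your answer! I fixed it, but unfortunately that was not the main problem, because the spider works when I remove pipelines.py. It is the pipeline that is not working, and I really don't understand why...
@galopin OK, what error are you getting now? Thanks.
Thanks @alecxe! I am getting these errors: dropbox.com/s/ct4q5zzqzbsbfn6/…
@galopin OK, this could have several causes. Could you share your complete current pipelines.py and settings.py (gist.github.com)?
Here it is: gist.github.com/anonymous/537d10edd821e7bef92a. Thanks a lot, again!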
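
As background to the request for settings.py: Scrapy only runs a pipeline that is enabled in the `ITEM_PIPELINES` setting. The fragment below is a minimal sketch, assuming the project package is called `tutorial`. That name is hypothetical, since the thread does not state the real project name, and nothing here is taken from the linked gist.

```python
# settings.py (fragment)
# "tutorial" is a hypothetical package name; use your own project's package.
ITEM_PIPELINES = {
    # The integer sets the run order (0-1000); lower values run first.
    "tutorial.pipelines.SQLStorePipeline": 300,
}
```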