[Posted]: 2020-06-19 16:30:09
[Problem description]:
I wrote a Scrapy bot to crawl data, and now I want to store the results in MySQL. I wrote a pipeline script for this, but it doesn't work and I can't tell what the problem is. Can anyone point out what exactly is wrong? The spider itself runs fine, so I think it just isn't able to get the data into MySQL.
The Scrapy bot code is:
import scrapy

from ..items import NdtvItem


class NdtvReviewSpider(scrapy.Spider):
    name = 'ndtv_review'
    page_number = 2
    start_urls = ['https://www.ndtv.com/movie-reviews/page-1']

    def parse(self, response):
        items = {}
        i = 1
        count = response.xpath('//*[@id="ins_storylist"]/ul/li[{}]/div[2]/div[1]/a/i/text()'.format("*")).getall()
        count = len(count)
        while i <= count:
            outputs = NdtvItem()
            outputs['page_title'] = response.xpath('//*[@id="ins_storylist"]/ul/li[{}]/div[2]/div[1]/a/i/text()'.format(i)).get()
            outputs['review_content'] = response.xpath("//*[@id='ins_storylist']/ul/li[{}]/div[2]/div[3]/text()".format(i)).get()
            outputs['review_link'] = response.xpath("//*[@id='ins_storylist']/ul/li[{}]/div[2]/div[1]/a/@href".format(i)).get()
            i += 1
            fl = 0
            if outputs['page_title'] == []:
                outputs['page_title'] = ''
                fl += 1
            if outputs['review_content'] == []:
                outputs['review_content'] = ''
                fl += 1
            if outputs['review_link'] == []:
                outputs['review_link'] = ''
                fl += 1
            else:
                yield outputs
            # if outputs['review_content'] != []:
            #     if 'Review:' in outputs['review_content'].split(" ") or 'review:' in outputs['review_content'].split(" "):
            #         yield outputs
        next_page = 'https://www.ndtv.com/movie-reviews/page-' + str(NdtvReviewSpider.page_number)
        if NdtvReviewSpider.page_number <= 15:
            NdtvReviewSpider.page_number += 1
            yield response.follow(next_page, callback=self.parse)
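The NdtvItem imported above comes from items.py, which is not shown here; the following is only a minimal sketch of what it would need to contain, inferred from the three field names the spider fills in:

import scrapy


class NdtvItem(scrapy.Item):
    # fields filled in by the spider and stored by the pipeline
    page_title = scrapy.Field()
    review_content = scrapy.Field()
    review_link = scrapy.Field()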
The pipeline script is:

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
import mysql.connector


class ReviewBotPipeline(object):

    def __init__(self):
        self.create_connection()
        self.create_table()

    def create_connection(self):
        self.conn = mysql.connector.connect(
            host='localhost',
            user='root',
            passwd='xxxxxx',
            database='review'
        )
        self.curr = self.conn.cursor()

    def create_table(self):
        self.curr.execute("""DROP TABLE IF EXISTS review_tb""")
        self.curr.execute("""CREATE TABLE review_tb(
            page_title text,
            review_content text,
            review_link text
        )""")

    def process_item(self, outputs, spider):
        self.store_db(outputs)
        return outputs

    def store_db(self, outputs):
        self.curr.execute("""INSERT INTO review_tb VALUES (%s, %s, %s)""", (
            outputs['page_title'][0],
            outputs['review_content'][0],
            outputs['review_link'][0]
        ))
        self.conn.commit()

This is the error I get:
File "/Users/divyanshu/env/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Users/divyanshu/env/lib/python3.7/site-packages/scrapy/utils/defer.py", line 154, in f
return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
File "/Users/divyanshu/review_bot/review_bot/pipelines.py", line 33, in process_item
self.store_db(outputs)
File "/Users/divyanshu/review_bot/review_bot/pipelines.py", line 40, in store_db
outputs['review_link']
File "/Users/divyanshu/env/lib/python3.7/site-packages/mysql/connector/cursor_cext.py", line 248, in execute
prepared = self._cnx.prepare_for_mysql(params)
File "/Users/divyanshu/env/lib/python3.7/site-packages/mysql/connector/connection_cext.py", line 626, in prepare_for_mysql
result = self._cmysql.convert_to_mysql(*params)
_mysql_connector.MySQLInterfaceError: Python type list cannot be converted
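For completeness, the pipeline is enabled in settings.py along these lines, with the module path matching the project layout visible in the traceback (the priority value 300 is just the usual example):

ITEM_PIPELINES = {
    'review_bot.pipelines.ReviewBotPipeline': 300,
}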
[Comments]:
-
Hi! On StackOverflow you are better off asking precise questions. You should do some more debugging to find out the exact problem, and then you can ask how to solve that exact problem!
-
You need to include exact information about the error you are facing; don't expect us to run it and find the problem. Run it and update the question with the error stack trace.
-
@AzyCrw4282 I tried debugging, and this is the error I got.
-
@RobinDeSchepper I have added the error I am getting.
-
The error points to outputs['review_link'] in store_db(), and it says Python type list cannot be converted. That probably means this variable holds a list and the connector doesn't know how to handle it. You can use print(outputs['review_link']) and print(type(outputs['review_link'])) to check what is actually in this variable. Or perhaps you forgot [0] in your real code.
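A minimal sketch of that idea, assuming the fields can arrive as lists (for example from getall()); the as_text helper name is only for illustration:

    def store_db(self, outputs):
        def as_text(value):
            # getall() returns a list of strings; join it into one string
            if isinstance(value, (list, tuple)):
                return " ".join(value)
            # get() returns a string or None; fall back to an empty string
            return value or ""
        self.curr.execute(
            """INSERT INTO review_tb VALUES (%s, %s, %s)""",
            (as_text(outputs.get('page_title')),
             as_text(outputs.get('review_content')),
             as_text(outputs.get('review_link')))
        )
        self.conn.commit()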
Tags: python mysql python-3.x scrapy web-crawler