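【Posted】: 2019-01-22 12:28:43
【Question】:
I have a script that inserts a DataFrame into a table using executemany().
The problem is that the table has an ID column as its primary key, and sometimes the DataFrame contains rows whose ID already exists in the table.
I would like to know whether there is a simple way to handle this exception and let executemany() continue.
The alternative I am considering is to check all of the DataFrame's IDs against the table and drop the ones that already exist before inserting... but I don't know whether that would perform well...
My code: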
import time

# One tuple of column values per DataFrame row
params = (tuple(row) for _, row in df.iterrows())
sql = '''INSERT INTO stilingue.stalker_comments values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)'''
start = time.time()
try:
    self.cursor.executemany(sql, params)
    self.conn.commit()
except Exception as e:
    # A single duplicate key aborts the whole batch
    print(e)
    self.conn.rollback()
    print('Something went wrong...')
end = time.time()
print('Execution time: {0:.2f} seconds.'.format(end-start))
DataFrame:
channel followers gender hashtags interactions likes location mentions name page_comment ... text themes uid user_image_url user_url username verified videoplays business rt_count
0 Inbox do Facebook 0 Não Definido 0 0 Midiam Mendes False ... Sacanagem isso né?? Poorq vocês dizeram que o ... 1995608377159933 https://storage.googleapis.com/usersstilingue/... False 0 Itaú 0
1 Inbox do Facebook 0 Não Definido 0 0 Midiam Mendes False ... Eu tenho provas , e posso processar vocês!! 1995608377159933 https://storage.googleapis.com/usersstilingue/... False 0 Itaú 0
2 Inbox do Facebook 0 Não Definido 0 0 Midiam Mendes False ... Isso é um absurdo 1995608377159933 https://storage.googleapis.com/usersstilingue/... False 0 Itaú 0
Traceback:
('23000', "[23000] [Microsoft][ODBC SQL Server Driver][SQL Server]Violation of PRIMARY KEY constraint 'PK__stalker___DD37D91A4691B0F7'. Cannot insert duplicate key in object 'stilingue.stalker_comments'. The duplicate key value is (m__g64-pbys7OlEvp8xmfyktlNIHrUPQPiNrcKrPVOF_Lj84OJfN4WtAJ92lj7YnzAOQ1B7EDCJf85k_UcwB0-4Q). (2627) (SQLExecDirectW); [23000] [Microsoft][ODBC SQL Server Driver][SQL Server]The statement has been terminated. (3621)")
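The alternative described in the question (look up the batch's IDs in the table and drop the ones that already exist before inserting) could be sketched roughly as follows. This is only an illustration under assumptions: `filter_new_rows`, the `comments` table, the `id` column and the `id_index` position are hypothetical names, and `sqlite3` stands in for the real connection so the example runs on its own. A pyodbc cursor accepts the same `?` placeholders.

```python
import sqlite3  # stand-in for the demo only; the helper needs just a DB-API cursor


def filter_new_rows(cursor, table, id_column, rows, id_index=0, chunk_size=500):
    """Return the rows whose ID is neither in the table nor repeated in the batch."""
    rows = list(rows)
    ids = [row[id_index] for row in rows]
    existing = set()
    # Query in chunks to stay under the driver's limit on "?" parameters.
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        cursor.execute(
            f"SELECT {id_column} FROM {table} WHERE {id_column} IN ({placeholders})",
            chunk,
        )
        existing.update(r[0] for r in cursor.fetchall())
    new_rows = []
    for row in rows:
        if row[id_index] not in existing:
            existing.add(row[id_index])  # also drops duplicates inside the batch
            new_rows.append(row)
    return new_rows


# Demo with a hypothetical two-column table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE comments (id TEXT PRIMARY KEY, text TEXT)")
cur.execute("INSERT INTO comments VALUES ('a', 'already stored')")

batch = [("a", "duplicate of stored row"), ("b", "new"),
         ("b", "repeat in batch"), ("c", "new")]
fresh = filter_new_rows(cur, "comments", "id", batch)
cur.executemany("INSERT INTO comments VALUES (?, ?)", fresh)
conn.commit()
```

Only the IDs travel to the server for the check, so the extra cost is one SELECT per chunk. Another process inserting the same ID between the SELECT and the INSERT could still trigger the violation, so the existing try/except remains useful.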
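【Comments】:
- An alternative might be to use df.to_sql to upload the rows to a temporary table and then use a SQL MERGE statement to insert the rows into the main table.
- Show us the traceback and the DataFrame.
- @StevenG I edited the post. @Gord Thompson, do you think df.to_sql would be faster than executemany(), even with fast_executemany = True?
- to_sql can take advantage of fast_executemany; details here.
- @GordThompson Thanks!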
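The staging-table idea from the first comment could look roughly like the sketch below. It rests on several assumptions: the table and column names are hypothetical, `sqlite3` replaces the SQL Server connection so the example is runnable, and the staging load uses `executemany()` where the comment suggests `df.to_sql`. Because SQLite has no MERGE, the executed statement is an `INSERT ... WHERE NOT EXISTS`, which has the same effect for the insert-only case; the T-SQL MERGE form is included as a string for reference and is not executed.

```python
import sqlite3  # stand-in for the real SQL Server connection

# T-SQL form of the final step (reference only, not executed in this demo).
MERGE_SQL = """
MERGE INTO comments AS t
USING comments_staging AS s ON t.id = s.id
WHEN NOT MATCHED THEN
    INSERT (id, text) VALUES (s.id, s.text);
"""

# Portable equivalent that the demo actually runs.
INSERT_MISSING_SQL = """
INSERT INTO comments (id, text)
SELECT s.id, s.text
FROM comments_staging AS s
WHERE NOT EXISTS (SELECT 1 FROM comments AS t WHERE t.id = s.id)
"""

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE comments (id TEXT PRIMARY KEY, text TEXT)")
cur.execute("CREATE TABLE comments_staging (id TEXT, text TEXT)")  # no key constraint
cur.execute("INSERT INTO comments VALUES ('a', 'already stored')")

# Step 1: bulk-load the whole batch into staging; nothing can collide here.
batch = [("a", "duplicate of stored row"), ("b", "new"), ("c", "new")]
cur.executemany("INSERT INTO comments_staging VALUES (?, ?)", batch)

# Step 2: copy only rows whose id is missing from the main table, then clear staging.
cur.execute(INSERT_MISSING_SQL)
cur.execute("DELETE FROM comments_staging")
conn.commit()

stored = cur.execute("SELECT id, text FROM comments ORDER BY id").fetchall()
```

The duplicate check happens inside the database in a single statement, so no IDs need to be pulled back into Python. Rows that repeat an ID within the same batch would still collide with each other, so they would need to be removed first, for example with `df.drop_duplicates`.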
Tags: python pandas dataframe pyodbc