【Question Title】: utf_16_le_decode SystemError when inserting a pandas DataFrame with fast_executemany
【Posted】: 2018-08-02 07:33:03
【Question】:

Here is my code:

def insertDataFrameInDB(cursor, dataFrame, toTable, fieldNames = None):
    if fieldNames:
        dataFrame = dataFrame[fieldNames]
    else:
        fieldNames = dataFrame.columns

    for r in dataFrame.columns.values:
        dataFrame[r] = dataFrame[r].map(str)
        dataFrame[r] = dataFrame[r].map(str.strip)   
    params = [tuple(x) for x in dataFrame.values]

    fieldNameStr = ",".join(fieldNames)
    valueStr = ",".join(["?"] * len(fieldNames))
    sql = "INSERT INTO {} ({}) VALUES({})".format(toTable, fieldNameStr, valueStr)
    cursor.fast_executemany = True
    cursor.executemany(sql, params)
    cursor.commit()


insertDataFrameInDB(cursor, df, "table")

It raises the following error, which I really cannot figure out:

DataError                                 Traceback (most recent call last)
DataError: ('String data, right truncation: length 24 buffer 20', '22001')

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\encodings\utf_16_le.py in decode(input, errors)
     15 def decode(input, errors='strict'):
---> 16     return codecs.utf_16_le_decode(input, errors, True)
     17 

SystemError: <built-in function utf_16_le_decode> returned a result with an error set

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
SystemError: decoding with 'utf-16le' codec failed (SystemError: <built-in function utf_16_le_decode> returned a result with an error set)

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\encodings\utf_16_le.py in decode(input, errors)
     15 def decode(input, errors='strict'):
---> 16     return codecs.utf_16_le_decode(input, errors, True)
     17 

SystemError: <built-in function utf_16_le_decode> returned a result with an error set

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
SystemError: decoding with 'utf-16le' codec failed (SystemError: <built-in function utf_16_le_decode> returned a result with an error set)

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
<ipython-input-6-f73d9346f943> in <module>()
     12 
     13 cursor = getCursor(conData)
---> 14 insertDataFrameInDB(cursor, df, "snowplow.sankey")

<ipython-input-1-69ecbca20fc8> in insertDataFrameInDB(cursor, dataFrame, toTable, fieldNames)
     29     sql = "INSERT INTO {} ({}) VALUES({})".format(toTable, fieldNameStr, valueStr)
     30     cursor.fast_executemany = True
---> 31     cursor.executemany(sql, params)
     32     cursor.commit()
SystemError: <class 'pyodbc.Error'> returned a result with an error set

A lot of searching on this error suggests it has something to do with a missing BOM. I tried decoding the strings in the `params` tuples and also tried `str.astype('U')`. Does anyone know what causes the problem and a possible fix?
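For what it's worth, the root DataError in the chain ('String data, right truncation: length 24 buffer 20', '22001') points at a value longer than its target column, which fast_executemany then surfaces as the opaque utf_16_le_decode SystemError. A quick pandas check can locate the offending values before the insert; the data and column widths below are hypothetical (in practice the widths would come from INFORMATION_SCHEMA.COLUMNS for the target table):

```python
import pandas as pd

# Hypothetical data and widths -- real widths would be read from
# INFORMATION_SCHEMA.COLUMNS for the target table.
df = pd.DataFrame({'txt': ['short enough',
                           'a string that is too long for the column']})
col_widths = {'txt': 20}  # e.g. an NVARCHAR(20) target column

for col, width in col_widths.items():
    # Measure each value as it will be sent (after str conversion)
    lengths = df[col].astype(str).str.len()
    too_long = df[lengths > width]
    if not too_long.empty:
        print(f"{col}: {len(too_long)} value(s) exceed {width} chars, "
              f"longest is {lengths.max()}")
```

Trimming or widening those columns makes the truncation error (and the misleading SystemError wrapper) go away without giving up fast_executemany.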

【Discussion】:

  • If you are using pyodbc 4.0.22, try downgrading to 4.0.21 (pip install pyodbc==4.0.21) and see if that helps.
  • Thanks for the input, but unfortunately the error persists.
  • Does the error still occur if you set cursor.fast_executemany = False ...? Also, which ODBC driver are you using?
  • Solved by setting cursor.fast_executemany = False, thanks! However, I have a lot of data, and with True it executes roughly 20x faster. Is there no way to keep using fast_executemany?
  • fast_executemany may not be compatible with all ODBC drivers. Which driver are you using?

Tags: pandas sqlalchemy pyodbc


【Solution 1】:

You are using Microsoft's "ODBC Driver ... for SQL Server", so fast_executemany should work with pyodbc 4.0.21. However, you can take advantage of that feature while still using DataFrame#to_sql by means of a SQLAlchemy execution event, as illustrated in this question.

Example: the following code does not take advantage of fast_executemany

import pandas as pd
from sqlalchemy import create_engine
import time

engine = create_engine('mssql+pyodbc://@SQL_panorama')

# test environment
num_rows = 1000
df = pd.DataFrame(
    [[x, f'row{x:03}'] for x in range(num_rows)],
    columns=['id', 'txt']
)
#
cnxn = engine.connect()
try:
    cnxn.execute("DROP TABLE df_to_sql_test")
except:
    pass
cnxn.execute("CREATE TABLE df_to_sql_test (id INT PRIMARY KEY, txt NVARCHAR(50))")

# timing test
t0 = time.time()
df.to_sql("df_to_sql_test", engine, if_exists='append', index=False)
print(f"{num_rows} rows written in {(time.time() - t0):.1f} seconds")

Result:

1000 rows written in 25.2 seconds

Adding a SQLAlchemy execution event handler reduces the execution time significantly

import pandas as pd
from sqlalchemy import create_engine, event
import time

engine = create_engine('mssql+pyodbc://@SQL_panorama')

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True


# test environment
num_rows = 1000
df = pd.DataFrame(
    [[x, f'row{x:03}'] for x in range(num_rows)],
    columns=['id', 'txt']
)
#
cnxn = engine.connect()
try:
    cnxn.execute("DROP TABLE df_to_sql_test")
except:
    pass
cnxn.execute("CREATE TABLE df_to_sql_test (id INT PRIMARY KEY, txt NVARCHAR(50))")

# timing test
t0 = time.time()
df.to_sql("df_to_sql_test", engine, if_exists='append', index=False)
print(f"{num_rows} rows written in {(time.time() - t0):.1f} seconds")

Result:

1000 rows written in 1.6 seconds

For a more complete discussion of this approach, see this answer
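As a side note, newer SQLAlchemy releases (1.3 and later) accept a fast_executemany flag directly on create_engine for the mssql+pyodbc dialect, which replaces the event-listener boilerplate above; a minimal sketch, reusing the same DSN from the example:

```python
from sqlalchemy import create_engine

# Assumes SQLAlchemy 1.3+ with the mssql+pyodbc dialect installed; the
# flag is forwarded to the pyodbc cursor for every executemany() call.
engine = create_engine('mssql+pyodbc://@SQL_panorama', fast_executemany=True)
```

With this flag set, DataFrame#to_sql against the engine uses the fast path automatically, with no before_cursor_execute hook required.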

【Discussion】:
