Pandas UnicodeDecodeError while using read_sql

【Question title】: Pandas UnicodeDecodeError while using read_sql
【Posted】: 2018-03-15 22:22:00
【Question description】:

I am trying to run SQL queries with pandas.read_sql. It usually works, but for some queries I get this error:

  File "C:\Anaconda3\lib\site-packages\pandas\io\sql.py", line 1454, in _fetchall_as_list
    result = cur.fetchall()

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xb4 in position 3: ordinal not in range(128)

I tried the solution suggested here for a very similar question (UnicodeDecodeError with pandas.read_sql), but it did not fix the problem.

I am using the cx_Oracle library for the database connection.

I tried

db = cx_Oracle.connect(user,pwd, dsn_dict[dbname],encoding='utf-8')

but when I check the encoding with

print(db.encoding)
print(db.nencoding)

I always get

ASCII
ASCII

I also tried changing NLS_LANG with

os.environ['NLS_LANG'] = 'AMERICAN_AMERICA.US7ASCII'

but it results in the same error.

These are the database NLS parameters:

NLS_CHARACTERSET    US7ASCII

NLS_NCHAR_CHARACTERSET  AL16UTF16

I ran the same query in Access and noticed this character in the query result, which may be what causes the problem:

¿

Basically, I don't know how to set the correct encoding to handle this. Any help is appreciated. Thanks.
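The error message itself can be reproduced without a database, which may help narrow down where the decoding goes wrong. The sketch below uses made-up sample bytes (not data from the actual table) with byte 0xb4 at index 3, matching the position reported in the traceback. ASCII only covers bytes 0x00 to 0x7f, so any decoder configured as ASCII fails on that byte.

```python
# Hypothetical sample: three ASCII characters, then byte 0xb4 at index 3,
# mirroring "byte 0xb4 in position 3" from the traceback above.
raw = b"abc\xb4def"

try:
    raw.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    # Same message text as the one raised inside cur.fetchall()
    error_message = str(exc)

print(error_message)

# An 8-bit encoding such as Latin-1 accepts every byte value. Whether Latin-1
# is the *right* encoding depends on how the data was originally stored.
as_latin1 = raw.decode("latin-1")
print(ascii(as_latin1))

# errors="replace" keeps ASCII as the codec but substitutes U+FFFD
# for each byte that cannot be decoded.
as_replaced = raw.decode("ascii", errors="replace")
print(ascii(as_replaced))
```

This only shows why an ASCII decoder rejects the byte; it does not say which character set the stored data really uses.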

Solution:

For reference, I solved the problem by setting

os.environ['NLS_LANG'] = 'AMERICAN_AMERICA.UTF8'

I don't like doing it this way. A better solution would be appreciated.

Tags: python oracle pandas cx-oracle

【Answer 1】:

With cx_Oracle 6, this should work for you:

cx_Oracle.connect("user/pw@dsn", encoding = "UTF-8", nencoding = "UTF-8")

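Since your database encoding is ASCII, you can even set the nencoding parameter. If you are going to use the NLS_LANG environment variable, make sure you are using the real UTF-8 encoding. In Oracle that is called AL32UTF8, for historical reasons!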
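Putting the two suggestions together, a minimal sketch might look like the following. The helper name read_query and its parameters are hypothetical and only for illustration. The sketch assumes cx_Oracle 6 or later and assumes that NLS_LANG is set before the first connection is opened, since the Oracle client typically reads it when it initialises. The territory part AMERICAN_AMERICA is carried over from the question; only the character set is changed to AL32UTF8.

```python
import os

# Assumption: set NLS_LANG before any Oracle connection is created.
# AL32UTF8 is Oracle's name for real UTF-8 (Oracle's "UTF8" is an older variant).
os.environ["NLS_LANG"] = "AMERICAN_AMERICA.AL32UTF8"


def read_query(sql, user, password, dsn):
    """Hypothetical helper: run a query and return a DataFrame.

    Imports are done lazily so this module loads even where
    cx_Oracle or pandas are not installed.
    """
    import cx_Oracle
    import pandas as pd

    # Passing encoding/nencoding explicitly, as suggested in the answer above.
    connection = cx_Oracle.connect(
        user, password, dsn, encoding="UTF-8", nencoding="UTF-8"
    )
    try:
        return pd.read_sql(sql, connection)
    finally:
        connection.close()
```

With the encoding passed to connect, the environment variable may not be needed at all; the sketch shows both only because the thread mentions both.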
