UnicodeEncodeError: 'ascii' codec can't encode (answers)

[Question title]: UnicodeEncodeError: 'ascii' codec can't encode
[Posted]: 2016-11-21 20:53:46
[Question]:

I have the following data container, which is constantly being updated:

data = []
for val, track_id in zip(values, list(track_ids)):
    # below
    if val < threshold:
        # structure data as dictionary
        pre_data = {"artist": sp.track(track_id)['artists'][0]['name'], "track": sp.track(track_id)['name'], "feature": filter_name, "value": val}
        data.append(pre_data)
# write to file
with open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w') as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)

But I keep getting errors like this:

    json.dump(data,f, ensure_ascii=False, indent=4, sort_keys=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 190, in dump
    fp.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

Is there a way to solve this encoding problem once and for all?

I was told that this would work:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

But many people advise against it.

I'm using Python 2.7.10.

Any clues?

[Question comments]:

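Show the full error traceback so we can see where the error comes from. Is this Python 2 or 3?
sys.setdefaultencoding may work in Python 2, but it doesn't exist in Python 3. It may help with print(), but not with other things such as writing to a file, so you need to show the full error message and the line that causes the problem.
@MarkRansom Updated, thanks.
@furas The full error is above.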

Tags: python encoding utf-8

[Solution 1]:

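When you write a unicode string to a file opened in text mode, Python encodes it for you. In Python 2 the default encoding is ascii, which produces the error you're seeing: many characters, such as u'\u2019' (RIGHT SINGLE QUOTATION MARK) from your traceback, cannot be encoded as ASCII.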
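To see why the ascii codec is the problem, here is a small Python 3 sketch that encodes the character from the traceback with both codecs. The helper name try_encode is hypothetical, introduced only for this illustration.

```python
def try_encode(text, encoding):
    """Hypothetical helper: return text encoded as bytes, or None if a character doesn't fit."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # Raised when a code point has no representation in the target codec
        return None

quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the character in the question's traceback
print(try_encode(quote, "ascii"))   # None: ASCII only covers code points 0-127
print(try_encode(quote, "utf-8"))   # b'\xe2\x80\x99'
```

UTF-8 needs three bytes for this character, which is exactly what an ASCII-only writer cannot produce.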

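The solution is to open the file with a different encoding. In Python 2 you have to use the codecs module; in Python 3 you can pass the encoding= argument directly to open. utf-8 is a popular choice because it can represent every Unicode character, and for JSON in particular it is the standard; see https://en.wikipedia.org/wiki/JSON#Data_portability_issues.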

import codecs
import json
with codecs.open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
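For Python 3, where open accepts encoding= directly, a minimal sketch of the same write might look like this. The function name write_json_utf8, the temporary file path, and the record values are illustrative placeholders, not taken from the question.

```python
import json
import os
import tempfile

def write_json_utf8(path, data):
    # Python 3: open() takes encoding= directly, so the codecs module isn't needed
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)

# Placeholder record shaped like the question's pre_data dict (values are made up)
record = [{"artist": "Björk", "track": "It\u2019s a placeholder", "feature": "energy", "value": 0.1}]
path = os.path.join(tempfile.mkdtemp(), "example.json")
write_json_utf8(path, record)

# Read it back with the same encoding to confirm the round trip
with open(path, encoding="utf-8") as f:
    assert json.load(f) == record
```

Reading the file back with an explicit encoding matters too, since Python 3's default for open depends on the locale.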

[Comments]:

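You beat me to it! The RFC only allows the utf-8, utf-16 and utf-32 encodings, but it places restrictions on the latter two (no BOM, for example) and implies that utf-8 is the only interoperable choice. mbcs would violate the RFC. I thought JSON was utf-8 only and was surprised that other encodings were even allowed.
@tdelaney I've never dealt with JSON directly, so I didn't know about the character set restrictions. Thanks! I'll edit the answer.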

[Solution 2]:

Your object contains unicode strings, and unicode support in Python 2.x can be a bit patchy. First, let's make a short example that demonstrates the problem:

>>> obj = {"artist":u"Björk"}
>>> import json
>>> with open('deleteme', 'w') as f:
...     json.dump(obj, f, ensure_ascii=False)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 190, in dump
    fp.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 3: ordinal not in range(128)

From the json.dump help text:

If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
instance consisting of ASCII characters only.  If ``ensure_ascii`` is
``False``, some chunks written to ``fp`` may be ``unicode`` instances.
This usually happens because the input contains unicode strings or the
``encoding`` parameter is used. Unless ``fp.write()`` explicitly
understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
cause an error.

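Aha! That points to the fix. Either keep the default ensure_ascii=True and get ASCII-escaped unicode characters, or use the codecs module to open the file in the encoding you want. This works: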

>>> import codecs
>>> with codecs.open('deleteme', 'w', encoding='utf-8') as f:
...     json.dump(obj, f, ensure_ascii=False)
... 
>>>
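The first option, keeping the default ensure_ascii=True, isn't shown above. A short Python 3 sketch comparing what the two settings produce for the same object:

```python
import json

obj = {"artist": "Björk"}

# Default ensure_ascii=True: non-ASCII characters become \uXXXX escapes,
# so the result is pure ASCII and safe to write with any ASCII-compatible codec
escaped = json.dumps(obj)
print(escaped)    # {"artist": "Bj\u00f6rk"}

# ensure_ascii=False keeps the characters themselves, so the file's codec matters
unescaped = json.dumps(obj, ensure_ascii=False)
print(unescaped)  # {"artist": "Björk"}

# Both forms decode back to the same object
assert json.loads(escaped) == json.loads(unescaped) == obj
```

The escaped form trades readability of the file for independence from the file's encoding; any JSON parser turns the escapes back into the original characters.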

[Comments]:

[Solution 3]:

Why not encode the specific strings yourself? Try calling the .encode('utf-8') method on the string that raises the exception.
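In Python 3, json.dumps raises TypeError for bytes values, so encoding individual strings before serializing doesn't carry over directly. A comparable approach, swapped in here rather than what this answer literally proposes, is to encode the whole serialized document yourself and write the bytes in binary mode. The function name dump_json_bytes and the temporary path are hypothetical.

```python
import json
import os
import tempfile

def dump_json_bytes(path, data):
    # Serialize to a str first, keeping non-ASCII characters unescaped
    text = json.dumps(data, ensure_ascii=False, indent=4, sort_keys=True)
    # Encode explicitly and write in binary mode, so no implicit codec is involved
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))

path = os.path.join(tempfile.mkdtemp(), "example.json")
dump_json_bytes(path, {"artist": "Björk"})
```

Because the file is opened in "wb" mode, Python never applies a default text encoding; the only encoding step is the explicit .encode call.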

[Comments]: