【问题标题】:UnicodeDecodeError while using json.dumps() [duplicate]使用 json.dumps() 时出现 UnicodeDecodeError [重复]
【发布时间】:2013-11-21 06:32:30
【问题描述】:

我的 python 列表中有如下字符串(取自命令提示符):

>>> o['records'][5790]
(5790, 'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo            ', 60,
 True, '40141613')
>>>

我已经尝试过这里提到的建议:Changing default encoding of Python?

进一步将默认编码更改为 utf-16。但还是json.dumps()抛出异常如下:

>>> write(o)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "okapi_create_master.py", line 49, in write
    o = json.dumps(output)
  File "C:\Python27\lib\json\__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "C:\Python27\lib\json\encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Python27\lib\json\encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 25: invalid
continuation byte

无法确定此类字符串需要什么样的转换才能使json.dumps() 起作用。

【问题讨论】:

标签: python json python-2.7 unicode character-encoding


【解决方案1】:

\xe1 无法使用 utf-8、utf-16 编码解码。

>>> '\xe1'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 0: unexpected end of data
>>> '\xe1'.decode('utf-16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0xe1 in position 0: truncated data

尝试 latin-1 编码:

>>> record = (5790, 'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo            ',
...           60, True, '40141613')
>>> json.dumps(record, encoding='latin1')
'[5790, "Vlv-Gate-Assy-Mdl-\\u00e1M1-2-\\u00e19/16-10K-BB Credit Memo            ", 60, true, "40141613"]'

或者,指定ensure_ascii=Falsejson.dumps 使json.dumps 不尝试解码字符串。

>>> json.dumps(record, ensure_ascii=False)
'[5790, "Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo            ", 60, true, "40141613"]'

【讨论】:

  • 使用ensure_ascii=False消除错误的最快方法。
【解决方案2】:

我遇到了类似的问题,并想出了以下方法来保证来自任一输入的 unicode 或字节字符串。简而言之,include and use 以下 lambdas:

# guarantee unicode string
_u = lambda t: t.decode('UTF-8', 'replace') if isinstance(t, str) else t
_uu = lambda *tt: tuple(_u(t) for t in tt) 
# guarantee byte string in UTF8 encoding
_u8 = lambda t: t.encode('UTF-8', 'replace') if isinstance(t, unicode) else t
_uu8 = lambda *tt: tuple(_u8(t) for t in tt)

适用于您的问题:

import json
o = (5790, u"Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo            ", 60,
 True, '40141613')
as_json = json.dumps(_uu8(*o))
as_obj = json.loads(as_json)
print "object\n ", o
print "json (type %s)\n %s " % (type(as_json), as_json)
print "object again\n ", as_obj

=>

object
  (5790, u'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo            ', 60, True, '40141613')
json (type <type 'str'>)
  [5790, "Vlv-Gate-Assy-Mdl-\u00e1M1-2-\u00e19/16-10K-BB Credit Memo            ", 60, true, "40141613"]
object again
  [5790, u'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo            ', 60, True, u'40141613']

这里还有一些reasoning about this

【讨论】:

    猜你喜欢
    • 2019-01-10
    • 1970-01-01
    • 2019-06-17
    • 1970-01-01
    • 2018-02-09
    • 2011-04-14
    • 2018-11-12
    • 1970-01-01
    相关资源
    最近更新 更多