How can I resolve UnicodeEncodeError calling response.json from a requests.post to validator.w3.org?

【Question title】: How can I resolve UnicodeEncodeError calling response.json from a requests.post to validator.w3.org?
【Posted】: 2016-03-11 03:23:26
【Question description】:

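I want to be able to validate, from within my Python code, some HTML generated by a template rendering function.

I went to the GitHub page for validator.w3.org to consult the API.

Based on my interpretation of what I read, I tried the following code: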

import requests
import urllib.parse  # import the submodule explicitly

# Minimal HTML document to validate; its <title> is empty.
index_html = '<!DOCTYPE html>\n<html lang="en">\n<head>\n  '\
    '<meta charset="UTF-8">\n  '\
    '<title></title>\n</head>\n<body>\n  \n</body>\n</html>\n'

# urlunsplit expects (scheme, netloc, path, query, fragment).
FRAGMENT = ''
query = urllib.parse.urlencode({'out': 'json'})
tokens = ['https', 'validator.w3.org', 'nu/', query, FRAGMENT]
url = urllib.parse.urlunsplit(tokens)

headers = {'Content-type': 'text/html; charset=utf-8'}
kwargs = dict(
    headers=headers,
    data=index_html,
)
# POST the document as the request body and ask for JSON output.
response = requests.post(url, **kwargs)

response.json() raises:

*** UnicodeEncodeError: 'ascii' codec can't encode character '\u201c' in position 48: ordinal not in range(128)

response.content is this:

b'{"messages":[{"type":"info","message":"The Content-Type was \xe2\x80\x9ctext/html\xe2\x80\x9d. Using the HTML parser."},{"type":"info","message":"Using the schema for HTML with SVG 1.1, MathML 3.0, RDFa 1.1, and ITS 2.0 support."},{"type":"error","lastLine":5,"lastColumn":17,"firstColumn":10,"message":"Element \xe2\x80\x9ctitle\xe2\x80\x9d must not be empty.","extract":"\n \n

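type(response.content) is <class 'bytes'>. I know that json.loads requires a string, so I assumed that response.json was raising the exception because the content is bytes that could not be decoded to a string: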

import json
json.loads(response.content.decode('utf-8'))

Sure enough, the same exception:

*** UnicodeEncodeError: 'ascii' codec can't encode character '\u201c' in position 48: ordinal not in range(128)

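I have reached the limit of my knowledge, and I am left wondering which part of this code to change in order to get the JSON from the requests.post response.

Thanks in advance for your help.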

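【Question comments】:

This looks like a problem that occurs when using Python 2.x rather than Python 3.x; can anyone else confirm?
D'oh! Your comment reminded me that I had been fiddling with the Python version in my virtualenv. I just tried my code in python3.4 and got the expected result. Thanks. :-)

Tags: python python-requests python-3.4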

【Solution 1】:

[Struck out by the author:] The answer is to check that one is indeed using Python 3.x rather than Python 2.x when one expects to be using Python 3.x!

See the update below.

Thanks.

{'messages': [{'message': 'The Content-Type was “text/html”. Using the HTML parser.', 'type': 'info'}, {'message': 'Using the schema for HTML with SVG 1.1, MathML 3.0, RDFa 1.1, and ITS 2.0 support.', 'type': 'info'}, {'extract': '\n <title></title>\n</hea', 'firstColumn': 10, 'hiliteLength': 8, 'hiliteStart': 10, 'lastColumn': 17, 'lastLine': 5, 'message': 'Element “title” must not be empty.', 'type': 'error'}]}

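Update:

There is more to this story. I was in fact using Python 3. I had simply left out the part about using py.test with the --pdb option.

How do I know I was using Python 3?

From the output of python3 test_mytest.py, where test_mytest.py contains: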

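if __name__ == '__main__':
    import sys
    import pytest
    # Options are passed as a list, which works across pytest versions;
    # newer releases no longer accept a single string.
    sys.exit(pytest.main(['-s', '--pdb']))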

which is:

platform linux -- Python 3.4.3, pytest-2.8.3, py-1.4.31, pluggy-0.3.1
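
For anyone wanting to rule out the interpreter in the same way, here is a minimal sketch that reports the running Python version together with the encoding stdout is using. The helper name runtime_summary is hypothetical and not part of the original post; it relies only on the standard sys module.

```python
import sys


def runtime_summary():
    """Return the interpreter version and the encoding used by stdout."""
    return {
        'python': '.'.join(str(part) for part in sys.version_info[:3]),
        # May be None if stdout was replaced by an object without an encoding.
        'stdout_encoding': getattr(sys.stdout, 'encoding', None),
    }


if __name__ == '__main__':
    # The summary is ASCII-only, so it prints even on an ASCII-only stream.
    print(runtime_summary())
```

Calling runtime_summary() from inside the pdb session would show whether stdout there is limited to ASCII, which is the condition the rest of this answer deals with.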

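Once inside pdb, I still got the encoding error. I found the solution in the answer by @daveagp in this post.

He wrote a page describing his trials with this problem. Thanks, @daveagp.

After running export PYTHONIOENCODING='utf_8', I no longer got any encoding errors.

My mistake!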
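
To see the effect of that variable in isolation, the following sketch may help. It starts two fresh interpreters that each print the character named in the traceback (U+201C), one with PYTHONIOENCODING set to ascii and one with utf_8. The names CHILD_CODE and run_child are made up for this illustration, and it assumes Python 3.5+ for subprocess.run.

```python
import os
import subprocess
import sys

# Child program: print a left double quotation mark (U+201C), the
# character named in the traceback.
CHILD_CODE = "print('\\u201c')"


def run_child(io_encoding):
    """Run CHILD_CODE in a fresh interpreter with PYTHONIOENCODING set."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, '-c', CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


ascii_run = run_child('ascii')
utf8_run = run_child('utf_8')

# With ascii the child exits with an error; with utf_8 it exits cleanly.
print('ascii exit code:', ascii_run.returncode)
print('utf_8 exit code:', utf8_run.returncode)
```

The ascii child's stderr carries a UnicodeEncodeError traceback much like the one in the question, while the utf_8 child writes the character as UTF-8 bytes.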
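
One reading of the traceback, offered as an assumption rather than something the post states: the exception is an encode error, not a decode error, so the JSON was probably parsed without trouble and the failure came from pdb writing the resulting dict to an ASCII-only stdout. The sketch below uses an abbreviated copy of the response body, with the keys ordered as in the dict printed earlier in this answer, and shows that encoding its repr as ASCII fails at offset 48, the same position the question reports. It assumes Python 3.7+ so that the dict keeps the key order of the JSON text. json.dumps with ensure_ascii=True (its default) is included as one way to inspect the result regardless of the stream encoding.

```python
import json

# Abbreviated copy of the validator response body (one message only).
raw = (b'{"messages":[{"message":"The Content-Type was '
       b'\xe2\x80\x9ctext/html\xe2\x80\x9d. Using the HTML parser.",'
       b'"type":"info"}]}')

# Decoding and parsing succeed: the body is valid UTF-8 and valid JSON.
parsed = json.loads(raw.decode('utf-8'))

# Writing the repr to an ASCII-only stream is what fails; simulate that
# by encoding the repr with the ascii codec.
try:
    repr(parsed).encode('ascii')
    failure = None
except UnicodeEncodeError as exc:
    failure = exc

# ensure_ascii=True escapes non-ASCII characters as \uXXXX, so this text
# can be printed whatever the stream encoding is.
safe_text = json.dumps(parsed, ensure_ascii=True)

if failure is not None:
    print('encode failed at offset', failure.start)
print(safe_text)
```

If this reading is right, response.json() itself was never the problem; only displaying its return value was.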

【Discussion】: