【Question title】: In Python, how to encode/decode Unicode characters such as ö
【Posted】: 2014-03-22 02:04:07
【Question】:

Using Python 2.6.6 on CentOS 6.4:

import urllib2

url = 'http://www.google.com.hk/complete/search?output=toolbar&hl=en&q=how%20to%20pronounce%20e'
# HTTPCookieProcessor makes the opener keep cookies between requests
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
# Ask the server for a UTF-8 response
opener.addheaders = [('Accept-Charset', 'utf-8')]
response = opener.open(url)
page = response.read()  # raw byte string, not yet decoded
print page

Result:

...<suggestion data="how to pronounce eyjafjallaj

Python dies without any error message.

I think it dies because the next character is ö.

<toplevel>
<CompleteSuggestion>
<suggestion data="how to pronounce edinburgh"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce elle"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce edith"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce et al"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce eunice"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce english names"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce edamame"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce erudite"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce eyjafjallajökull"/>
</CompleteSuggestion>
<CompleteSuggestion>
<suggestion data="how to pronounce either"/>
</CompleteSuggestion>
</toplevel>

http://www.google.com.hk/complete/search?output=toolbar&hl=en&q=how%20to%20pronounce%20e

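This looks like a Unicode problem. I have tried many combinations of encode('utf-8') and decode('utf-8'), but it still dies. Any ideas?

PS: It seems I need to use urllib2 rather than urllib, because urllib ignores cookies, which causes other problems.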

【Comments】:

Tags: python urllib2 python-unicode utf8-decode


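【Solution 1】:

response.read() returns a byte string. Python should not die when printing a byte string, because no character conversion takes place; the bytes are written out as-is.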
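To illustrate the difference between a byte string and decoded text, here is a minimal sketch. The sample bytes are an assumption standing in for part of what response.read() might return; they are not taken from a real response. The code is written so that it should run on both Python 2.6+ and Python 3.

```python
# Sample UTF-8 bytes (assumed, for illustration): o-umlaut is the two bytes C3 B6
raw = b'eyjafjallaj\xc3\xb6kull'

# Decoding turns the bytes into characters
text = raw.decode('utf-8')

byte_count = len(raw)    # counts bytes
char_count = len(text)   # counts characters; one fewer, since o-umlaut is 2 bytes

# Encoding with the same codec gives back the original bytes
round_trip = text.encode('utf-8')
```

Writing raw to a file or a terminal involves no codec at all, whereas printing text makes Python encode it first, and that encoding step is where Unicode errors normally come from.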

You could try printing Unicode instead:

text = page.decode(response.info().getparam('charset') or 'utf-8')
print text
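The same idea can be sketched without a network connection. The helper names decode_body and encode_for_output are hypothetical, not part of urllib2, and the sample bytes are assumed. In the snippet above the charset would come from response.info().getparam('charset'); here it is passed in as a plain argument.

```python
import sys

def decode_body(raw, charset=None):
    """Decode bytes with the declared charset, falling back to UTF-8."""
    return raw.decode(charset or 'utf-8')

def encode_for_output(text, encoding=None):
    """Encode text for output; characters the codec lacks become '?'."""
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    return text.encode(encoding, 'replace')

# Assumed sample of the response body, as UTF-8 bytes
raw = b'how to pronounce eyjafjallaj\xc3\xb6kull'

text = decode_body(raw)                       # no charset declared, so UTF-8
latin = decode_body(b'\xf6', 'iso-8859-1')    # same character, other charset

ascii_safe = encode_for_output(text, 'ascii')  # o-umlaut is replaced by '?'
utf8_bytes = encode_for_output(text, 'utf-8')  # identical to the input bytes
```

Encoding with the 'replace' error handler is one way to keep output going on a terminal whose encoding cannot represent ö; whether that applies to the crash described in the question is not confirmed by the thread.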

【Comments】:
