How to decode u'\xc3\xa9cosyst\xc3\xa8mes' to utf-8 [duplicate]: answers

【Question title】: How to decode u'\xc3\xa9cosyst\xc3\xa8mes' to utf-8 [duplicate]
【Posted】: 2019-03-24 11:26:59
【Question description】:

While web scraping with BeautifulSoup, I ended up with a query string parameter whose value is represented as:

param_value = u'\xc3\xa9cosyst\xc3\xa8mes'

Reading it, I guess it should be écosystèmes.

I tried several encode/escape/decode approaches (as described here and here),

but I keep getting the following error:

UnicodeEncodeError('ascii', u'\xc3\xa9cosyst\xc3\xa8mes', 0, 2, 'ordinal not in range(128)')

I also tried the solution proposed in the duplicate:

Python 2.7.15 (default, Jul 23 2018, 21:27:06)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = u'\xc3\xa9cosyst\xc3\xa8mes'
>>> s.encode('latin-1').decode('utf-8')
u'\xe9cosyst\xe8mes'

But that brings me back to square one...

How can I get from u'\xc3\xa9cosyst\xc3\xa8mes' to u'écosystèmes'?

【Question comments】:

Related: Fixing mojibakes in UTF-8 text. What you have looks like UTF-8 that was decoded as latin-1.
u'\xe9cosyst\xe8mes' is the correct unicode string value. You should now read Understanding repr( ) function in Python.

Tags: python python-2.7 encoding escaping

【Solution 1】:

Your UTF-8 bytes were decoded as latin-1, so the solution is to encode back to latin-1 first and then decode as UTF-8.

>>> s = u'\xc3\xa9cosyst\xc3\xa8mes'
>>> s.encode('latin-1').decode('utf-8')
u'\xe9cosyst\xe8mes'
>>> print s.encode('latin-1').decode('utf-8')
écosystèmes
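
The round trip above can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the name fix_mojibake is hypothetical, and returning the input unchanged when the round trip fails is an assumption about how one might want to treat text that was never mis-decoded. It is written to run on both Python 2.7 and Python 3.

```python
def fix_mojibake(text):
    """Undo a UTF-8-decoded-as-latin-1 mix-up (hypothetical helper).

    Returns the text unchanged if it cannot be re-encoded as latin-1
    or if the resulting bytes are not valid UTF-8.
    """
    try:
        # latin-1 maps code points 0-255 straight back to bytes 0-255,
        # which recovers the original UTF-8 byte sequence.
        return text.encode('latin-1').decode('utf-8')
    except UnicodeError:
        # Covers both UnicodeEncodeError and UnicodeDecodeError.
        return text


param_value = u'\xc3\xa9cosyst\xc3\xa8mes'
fixed = fix_mojibake(param_value)

print(fixed == u'\xe9cosyst\xe8mes')        # True
print(fix_mojibake(fixed) == fixed)         # True: already-correct text is left alone here
```

Already-correct text such as u'\xe9cosyst\xe8mes' happens not to be valid UTF-8 once re-encoded as latin-1, so the helper leaves it alone in this case. That is a heuristic, not a guarantee for every possible input.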

【Discussion】:

It just gets me back to square one...

>>> s = u'\xc3\xa9cosyst\xc3\xa8mes'
>>> s.encode('latin-1')
'\xc3\xa9cosyst\xc3\xa8mes'
>>> s.encode('latin-1').decode('utf-8')
u'\xe9cosyst\xe8mes'

That is not square one; that is the solution. repr does not necessarily show the decoded text, but try to print it (on a device that can handle Unicode).
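
To see why u'\xe9cosyst\xe8mes' is already the desired value, it can help to inspect the decoded string without relying on what the terminal is able to display. The following sketch is an illustration added here, not code from the thread; it uses the standard unicodedata module, and the variable names are arbitrary.

```python
import unicodedata

fixed = u'\xc3\xa9cosyst\xc3\xa8mes'.encode('latin-1').decode('utf-8')

# '\xe9' in the repr is only escape notation for the single character U+00E9.
first = fixed[0]
print(len(fixed))               # 11 characters, down from 13 in the mojibake
print(hex(ord(first)))          # 0xe9
print(unicodedata.name(first))  # LATIN SMALL LETTER E WITH ACUTE

# The \x-escaped and \u-escaped spellings denote the same string.
print(fixed == u'\u00e9cosyst\u00e8mes')  # True
```

In other words, the escapes shown by the interactive prompt are a display choice of repr, while the string itself already holds é and è.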

【Solution 2】:

I think this will help: bytes(u'\xc3\xa9cosyst\xc3\xa8mes', 'latin-1').decode('utf-8')

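【Discussion】:

It would be simple if you could figure out how to convert the u'' string in the question into a b'' byte string; but that is obviously the crux of this question.
Like this: bytes(u'\xc3\xa9cosyst\xc3\xa8mes', 'latin-1').decode('utf-8') should work now.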
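
One caveat worth illustrating: the question is tagged python-2.7, where bytes is just an alias for str, so the two-argument call bytes(text, 'latin-1') is a Python 3 form and raises TypeError on Python 2. The sketch below, added here as an illustration with arbitrary variable names, shows that text.encode('latin-1') produces the same bytes and runs on both versions.

```python
import sys

text = u'\xc3\xa9cosyst\xc3\xa8mes'

# Portable way to get the byte string: works on Python 2.7 and Python 3.
raw = text.encode('latin-1')

if sys.version_info[0] >= 3:
    # On Python 3 the bytes() constructor accepts an encoding and
    # yields exactly the same byte string.
    assert bytes(text, 'latin-1') == raw

fixed = raw.decode('utf-8')
print(fixed == u'\xe9cosyst\xe8mes')  # True
```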