What does str.encode expect as input? Answers

【Question title】: What does str.encode expect as input?
【Posted】: 2015-02-28 12:21:01
【Question】:

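I want to use unicode rather than str for all strings in my project. I am trying to use the str.encode method, but I cannot work out from the documentation what encode actually does or what it expects as input.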

The Greek lowercase letter pi is U+03C0, which UTF-8 encodes as 0xCF 0x80. I get the following:

>>> s1 = '\xcf\x80'
>>> s1.encode('utf-8','ignore')

Traceback (most recent call last):
  File "<pyshell#61>", line 1, in <module>
    s1.encode('utf-8','ignore')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcf in position 0: ordinal not in range(128)

I also tried:

>>> s2='\x03\xc0'

>>> s2.encode('utf-8','ignore')

Traceback (most recent call last):
  File "<pyshell#62>", line 1, in <module>
    s2.encode('utf-8','ignore')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc0 in position 1: ordinal not in range(128)

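What does encode expect as input, and why does the 'ignore' option not ignore the errors? I tried 'replace', but that did not mask the errors either.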

【Question comments】:

Tags: python string python-2.7 encoding character-encoding

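【Answer 1】:

In Python 2.x, str is a byte string (text that is already encoded). Calling encode on a str makes Python first decode the bytes implicitly with the default ASCII codec, and that strict decode is what raises UnicodeDecodeError; the 'ignore' and 'replace' handlers apply only to the encoding step. To get a unicode object, decode the str instead: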

>>> s1 = '\xcf\x80'  # string literal (str)
>>> s1.decode('utf-8')
u'\u03c0'
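
The traceback in the question can be reproduced step by step. The following is a minimal sketch, assuming the usual Python 2 default encoding of ASCII; it spells out the implicit decode that happens before a str is encoded. The bytes literal (b'...') is used so the snippet also runs on Python 3, and the variable names are illustrative only.

```python
raw = b'\xcf\x80'  # UTF-8 bytes for U+03C0 (a plain str in Python 2)

# The implicit step Python 2 performs before encoding a str:
# a strict ASCII decode, which rejects byte 0xcf.
try:
    raw.decode('ascii')
    implicit_decode_failed = False
except UnicodeDecodeError:
    implicit_decode_failed = True

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode('utf-8')

print(implicit_decode_failed)
print(text == u'\u03c0')
```

Naming the codec explicitly in decode avoids the implicit ASCII step altogether.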

A unicode object, in turn, can be encoded:

>>> u1 = u'\u03c0'  # unicode literal (unicode)  U+03C0
>>> u1.encode('utf-8')
'\xcf\x80'
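
To see where the 'ignore' and 'replace' handlers from the question do take effect, here is a small sketch that encodes a unicode object to a codec that cannot represent it. The variable names are illustrative, and the literals are written so the snippet runs on both Python 2.7 and Python 3.

```python
pi = u'\u03c0'  # unicode text: GREEK SMALL LETTER PI

utf8_bytes = pi.encode('utf-8')           # the two bytes 0xCF 0x80

# ASCII cannot represent U+03C0, so the error handler decides the result.
ignored = pi.encode('ascii', 'ignore')    # character is dropped
replaced = pi.encode('ascii', 'replace')  # character becomes '?'

# Decoding the UTF-8 bytes returns the original text.
round_trip = utf8_bytes.decode('utf-8')

print(repr(utf8_bytes), repr(ignored), repr(replaced))
print(round_trip == pi)
```

The handlers only come into play when encoding starts from a unicode object, which is why they had no effect on the str values in the question.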

【Comments】: