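【Question title】: Print special characters in list in Python
【Posted】: 2017-07-09 23:12:16
【Question】:

I have a list containing special characters (for example é or a space). When I print the list, these characters are printed as their escape codes, whereas if I print the list elements individually they are printed correctly: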

#!/usr/bin/env python
# -*- coding: utf-8 -*-

my_list = ['éléphant', 'Hello World']
print(my_list)
print(my_list[0])
print(my_list[1])

The output of this code is

['\xc3\xa9l\xc3\xa9phant', 'Hello World']

éléphant

Hello World

I would expect ['éléphant', 'Hello World'] for the first output. What should I change?

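【Question comments】:

You can do something like this to encode the output of your print statement properly: >>> print repr(my_list).decode("unicode-escape").encode('latin-1') I posted it as an answer but deleted it, because I only tested it in Python 2, so I'm commenting instead.
@ViníciusAguiar I am indeed using Python 2. Your answer works well if you replace 'latin-1' with 'utf-8'.
Oh, that's great! I won't undelete it, since it looks like there are already a couple of good answers. Thanks for letting me know! =)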

Tags: python encoding character-encoding special-characters

【Solution 1】:

If possible, switch to Python 3 and you will get the expected result.

If you have to make it work in Python 2, use unicode strings:
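As a minimal sketch of what the switch buys you, assuming a Python 3 interpreter and a UTF-8 capable terminal: strings are Unicode by default there, and repr() leaves printable non-ASCII characters unescaped, so the list prints as hoped.

```python
# Python 3 only: str is a Unicode type, so no u'' prefix is needed.
my_list = ['éléphant', 'Hello World']

# print(my_list) writes str(my_list), which for a list equals repr(my_list).
list_text = repr(my_list)
print(list_text)
```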

my_list = [u'éléphant', u'Hello World']

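As written, Python interprets the first string as a sequence of bytes with the value '\xc3\xa9l\xc3\xa9phant', which only turns into Unicode code points once it is correctly decoded as UTF-8: '\xc3\xa9l\xc3\xa9phant'.decode('utf8') == u'\xe9l\xe9phant'.

If you want to print the list's repr and get "unicode" out, you have to encode it as UTF-8 manually (if that is what your terminal understands).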
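The distinction between bytes and code points can be sketched as follows. This is written for Python 3, where the Python 2 byte string corresponds to a bytes literal; the variable names are illustrative only.

```python
# In Python 2, 'éléphant' in a UTF-8 source file is stored as these raw bytes.
raw = b'\xc3\xa9l\xc3\xa9phant'

# Decoding as UTF-8 turns the bytes into Unicode code points.
text = raw.decode('utf8')

print(len(raw))   # 10 bytes: each é takes two bytes in UTF-8
print(len(text))  # 8 code points
```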

>>> print repr(my_list).decode('unicode-escape').encode('utf8')
[u'éléphant', u'Hello World']

But formatting it manually is easier:

>>> print ", ".join(my_list)
éléphant, Hello World

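【Comments】:

What exactly is repr?
repr returns a printable representation of an object, which can usually be turned back into the object with eval. When you call print my_list, Python 2 actually prints str(my_list), which for lists is equal to repr(my_list), which in turn is a printable list composed of the reprs of the individual elements. Since unicode strings are not really native strings in Python 2, what we get from repr(my_list) is "[u'\\xe9l\\xe9phant', u'Hello World']", with the Unicode code points escaped.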
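To see that a list's printable form is built from the repr of each element, here is a small sketch for Python 3. It assumes that the built-in ascii() is an acceptable stand-in for Python 2's escaping repr(); in Python 3, repr() itself no longer escapes printable non-ASCII characters.

```python
my_list = ['éléphant', 'Hello World']

# str() of a list is the same text as repr() of the list.
assert str(my_list) == repr(my_list)

# The list repr is assembled from the repr of each element.
rebuilt = '[' + ', '.join(repr(x) for x in my_list) + ']'
assert rebuilt == repr(my_list)

# ascii() escapes non-ASCII code points, similar to what Python 2's repr() did.
escaped = ascii(my_list)
print(escaped)  # ['\xe9l\xe9phant', 'Hello World']
```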

【Solution 2】:

Short answer: if you want to keep the output in this format, you have to implement it yourself:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

my_list = ['éléphant', 'Hello World']

def print_list(l):
    print("[" + ", ".join(["'%s'" % str(x) for x in l]) + "]")

print_list(my_list)

This produces the expected

['éléphant', 'Hello World']

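Note, however, that it puts every element in quotes (even numbers, for example), so if you expect the list to contain anything other than strings, you may need a more elaborate implementation.

Longer answer

The problem is that Python runs str(my_list) before printing. That, in turn, runs repr() on each element of the list.

Now, repr() on a string returns an ASCII-only representation of the string. That is, the '\xc3' you see is an actual backslash, an actual 'x', an actual 'c' and an actual '3' character.

You cannot get around this, because the problem lies in the implementation of list.__str__().

Below is a sample program that demonstrates this.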
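One possible "more elaborate implementation" is sketched below. The helper name format_list is hypothetical, and the sketch is written for Python 3, where str is the Unicode string type. It quotes only string elements and falls back to repr() for everything else; it does not escape quote characters inside the strings.

```python
def format_list(items):
    """Return a list-like display string, quoting only the str elements."""
    parts = []
    for item in items:
        if isinstance(item, str):
            # Wrap strings in quotes without escaping non-ASCII characters.
            parts.append("'%s'" % item)
        else:
            # Numbers, None, nested containers, etc. keep their usual repr.
            parts.append(repr(item))
    return "[" + ", ".join(parts) + "]"


print(format_list(['éléphant', 42, 3.5, None]))
```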

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# vi: ai sts=4 sw=4 et

import pprint

my_list = ['éléphant', 'Hello World']

# under the hood, python first runs str(my_list), before printing it
my_list_as_string = str(my_list)

# str() on a list runs repr() on each of the elements.
# However, it seems that __repr__ on a string transforms it to an 
# ASCII-only representation
print ('str(my_list) = %s' % str(my_list))
for c in my_list_as_string:
    print c
print ('len(str(my_list)) = %s' % len(str(my_list)))
print ("\n")

# Which we can confirm here, where we can see that it also adds the quotes:
print ('repr("é") == %s' % repr("é"))
for c in repr("é"):
    print c
print ('len(repr("é")) == %s' % len(repr("é")))
print ("\n")

# Even pprint fails
print ("pprint gives the same results")
pprint.pprint(my_list)

# It's useless to try to encode it, since all data is ASCII
print "Trying to encode"
print (my_list_as_string.encode ("utf8"))

This produces:

str(my_list) = ['\xc3\xa9l\xc3\xa9phant', 'Hello World']
[
'
\
x
c
3
\
x
a
9
l
\
x
c
3
\
x
a
9
p
h
a
n
t
'
,

'
H
e
l
l
o

W
o
r
l
d
'
]
len(str(my_list)) = 41


repr("é") == '\xc3\xa9'
'
\
x
c
3
\
x
a
9
'
len(repr("é")) == 10


pprint gives the same results
['\xc3\xa9l\xc3\xa9phant', 'Hello World']
Trying to encode
['\xc3\xa9l\xc3\xa9phant', 'Hello World']
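
The same check can be reproduced on Python 3 with a bytes object, which is the closest analog to a Python 2 byte string. This is only a sketch under that assumption; note the extra b prefix in the repr, which makes the length 11 rather than 10.

```python
# UTF-8 bytes for "é", the Python 3 analog of the Python 2 str "é".
encoded = 'é'.encode('utf8')

# repr() of bytes is plain ASCII: the escapes are literal characters.
text = repr(encoded)
print(text)        # b'\xc3\xa9'
print(list(text))  # one entry per literal character, backslashes included
print(len(text))   # 11: b, two quotes, and two 4-character escapes
```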

【Comments】: