[Question title]: Getting the same Unicode string length in both Python 2 and 3?
[Posted]: 2013-05-10 06:38:21
[Question]:

Ugh, Python 2 / 3 is so frustrating... Consider this example, test.py:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
if sys.version_info[0] < 3:
  text_type = unicode
  binary_type = str
  def b(x):
    return x
  def u(x):
    return unicode(x)
else:
  text_type = str
  binary_type = bytes
  import codecs
  def b(x):
    return codecs.latin_1_encode(x)[0]
  def u(x):
    return x

tstr = " ▲ "

sys.stderr.write(tstr)
sys.stderr.write("\n")
sys.stderr.write(str(len(tstr)))
sys.stderr.write("\n")

Running it:

$ python2.7 test.py 
 ▲ 
5
$ python3.2 test.py 
 ▲ 
3
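
For reference, the 5 and the 3 appear to be the same string measured in different units: in Python 2 a plain literal is a byte string, so len() counts the UTF-8 bytes, while in Python 3 it is a text string, so len() counts code points. A minimal sketch, assuming the file is saved as UTF-8 as the coding line declares (the names raw and text are only illustrative):

```python
# UTF-8 bytes of " ▲ ": a space, the three bytes of U+25B2, another space.
raw = b" \xe2\x96\xb2 "

# Decoding turns the bytes into a text string on both Python 2 and 3.
text = raw.decode("utf-8")

byte_count = len(raw)    # number of bytes
char_count = len(text)   # number of code points

print(byte_count)
print(char_count)
```

The b"..." literal is used here because it is accepted by both Python 2.7 and 3.2, whereas the u"..." literal is not accepted by 3.2.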

Great, I get two different string sizes. Hopefully wrapping the string in one of these wrappers I found around the net will help?

For tstr = text_type(" ▲ "):

$ python2.7 test.py 
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = text_type(" ▲ ")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
$ python3.2 test.py 
 ▲ 
3

For tstr = u(" ▲ "):

$ python2.7 test.py 
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = u(" ▲ ")
  File "test.py", line 11, in u
    return unicode(x)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
$ python3.2 test.py 
 ▲ 
3
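
Both Python 2.7 tracebacks above seem to have the same cause: calling unicode() on a byte string without naming an encoding falls back to the default ASCII codec, which cannot handle byte 0xe2. A small sketch that reproduces the same error explicitly on either version (raw and message are illustrative names, not part of test.py):

```python
# UTF-8 bytes of " ▲ "; byte 0xe2 sits at index 1, right after the space.
raw = b" \xe2\x96\xb2 "

try:
    # This is what the implicit conversion effectively attempts in Python 2.
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

Python 3 never hits this in the examples above because its text_type and u() receive a string that is already text, so nothing has to be decoded.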

For tstr = b(" ▲ "):

$ python2.7 test.py 
 ▲ 
5
$ python3.2 test.py 
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = b(" ▲ ")
  File "test.py", line 17, in b
    return codecs.latin_1_encode(x)[0]
UnicodeEncodeError: 'latin-1' codec can't encode character '\u25b2' in position 1: ordinal not in range(256)

For tstr = binary_type(" ▲ "):

$ python2.7 test.py 
 ▲ 
5
$ python3.2 test.py 
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = binary_type(" ▲ ")
TypeError: string argument without an encoding
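
The last two failures look like the mirror image of the earlier ones: in Python 3 the literal is already text, and turning text into bytes needs an encoding that can represent U+25B2. Latin-1 only covers code points below 256, and bytes() will not guess an encoding. A hedged sketch of that behaviour (the variable names are illustrative):

```python
# Build the text string " ▲ " in a way that works on Python 2 and 3.
text = b" \xe2\x96\xb2 ".decode("utf-8")

# Latin-1 has no representation for U+25B2, so encoding fails.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Without an explicit encoding, bytes() refuses text in Python 3 (TypeError);
# in Python 2, bytes is str and the implicit ASCII encode fails instead.
try:
    bytes(text)
    bare_bytes_ok = True
except (TypeError, UnicodeEncodeError):
    bare_bytes_ok = False

# Naming an encoding that covers the character works on both versions.
utf8_bytes = text.encode("utf-8")
print(len(utf8_bytes))
```

Either way, the byte string that comes out has length 5, so converting to bytes is not a route to a length of 3.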

Well, that sure makes things easy.

So, how do I get the same string length (in this case, 3) in both Python 2.7 and 3.2?

[Question discussion]:

    Tags: python python-2.7 python-3.x


    [Solution 1]:

    OK, it turns out that unicode() in Python 2.7 has an encoding argument, and that evidently helps:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    import sys
    if sys.version_info[0] < 3:
      text_type = unicode
      binary_type = str
      def b(x):
        return x
      def u(x):
        return unicode(x, "utf-8")
    else:
      text_type = str
      binary_type = bytes
      import codecs
      def b(x):
        return codecs.latin_1_encode(x)[0]
      def u(x):
        return x
    
    tstr = u(" ▲ ")
    
    sys.stderr.write(tstr)
    sys.stderr.write("\n")
    sys.stderr.write(str(len(tstr)))
    sys.stderr.write("\n")
    

    Running this, I get what I need:

    $ python2.7 test.py 
     ▲ 
    3
    $ python3.2 test.py 
     ▲ 
    3
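
As a possible variation on the u() wrapper above, the version check can be replaced by a type check, since bytes is an alias for str in Python 2.7 and a separate type in Python 3. This is only a sketch: to_text is a hypothetical helper name, and it assumes byte strings are UTF-8 encoded unless told otherwise.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        # Python 2 plain literals and Python 3 bytes objects land here.
        return value.decode(encoding)
    # Already text (unicode in Python 2, str in Python 3).
    return value

tstr = to_text(" ▲ ")
from_bytes = to_text(b" \xe2\x96\xb2 ")

print(len(tstr))
print(len(from_bytes))
```

One caveat: on narrow builds of Python 2, and of Python 3 before 3.3, len() counts UTF-16 code units, so characters outside the Basic Multilingual Plane count as 2. The ▲ character is not affected.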
    

    [Discussion]:
