【Question title】: Python intersection of a UTF-8 list and a UTF-8 string
【Posted】: 2015-06-02 12:58:01
【Question】:

I got this code working when the list holds ASCII letters and the string is ASCII, but I can't get this version to work.

# -*- coding: utf-8 -*-
asa = ["ā","ē","ī","ō","ū","ǖ","Ā","Ē","Ī","Ō","Ū","Ǖ",
"á","é","í","ó","ú","ǘ","Á","É","Í","Ó","Ú","Ǘ",
"ǎ","ě","ǐ","ǒ","ǔ","ǚ","Ǎ","Ě","Ǐ","Ǒ","Ǔ","Ǚ",
"à","è","ì","ò","ù","ǜ","À","È","Ì","Ò","Ù","Ǜ"]
[x.decode('utf-8') for x in asa]
print list(set(asa) & set("ō"))

【Comments】:

    Tags: python utf-8 ascii intersection


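    【Solution 1】:

    You need to put your character inside a list. A string is itself an iterable, and your non-ASCII character is stored as a 2-byte string in UTF-8 (Python sees "ō" as \xc5\x8d), so set("ō") splits it into two separate single-byte elements: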

    >>> list("ō")
    ['\xc5', '\x8d']
    >>> print list(set(asa) & set(["ō"]))
    ['\xc5\x8d']
    >>> print list(set(asa) & set(["ō"]))[0]
    ō
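
    A minimal sketch of the same point, written so that it also runs on Python 3 by encoding explicitly. The names `vowels`, `needle`, `no_match` and `match` are made up for illustration, and `vowels` is only a shortened stand-in for the `asa` list from the question:

```python
# -*- coding: utf-8 -*-
# Shortened, hypothetical stand-in for the asa list in the question.
vowels = [u"ā", u"ō", u"ū"]
encoded_vowels = [v.encode("utf-8") for v in vowels]

# The search character as UTF-8 bytes: two bytes, \xc5 and \x8d.
needle = u"ō".encode("utf-8")

# Iterating over the byte string yields one piece per byte, which is
# what set("ō") does in Python 2. Neither piece is in the list.
split_needle = set(needle[i:i + 1] for i in range(len(needle)))
no_match = set(encoded_vowels) & split_needle

# Wrapping the byte string in a list keeps it as a single element.
match = set(encoded_vowels) & set([needle])
```

    Here `no_match` is empty, while `match` holds the single two-byte string for "ō".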
    

    【Discussion】:

      【Solution 2】:

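      Once decoded, the elements of your first set have the form "ō".decode('utf-8') (type unicode), which is equivalent to u"ō".

      The second set contains byte strings such as "ō" (type str), so the two never compare equal and the intersection is empty.

      Consider: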

      >>> 'a' == u'a'
      True
      >>> 'ō' == u'ō'
      __main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
      False
      >>> list('ō')
      ['\xc5', '\x8d']
      >>> list(u'ō')
      [u'\u014d']
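
      One way to act on this is to decode both sides to text before building the sets, and to keep the decoded list rather than discarding it. The following is a sketch under assumptions, not the asker's final code: `to_text` is a hypothetical helper, and `raw_letters` is a shortened stand-in for the original list, written as explicit UTF-8 bytes so that the example behaves the same on Python 2 and Python 3:

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# UTF-8 bytes for ā, ō and ū (hypothetical, shortened input list).
raw_letters = [b"\xc4\x81", b"\xc5\x8d", b"\xc5\xab"]

# Keep the decoded result; a bare list comprehension would throw it away.
letters = [to_text(x) for x in raw_letters]
query = to_text(b"\xc5\x8d")

# Both sides are text now, so iterating query yields whole characters.
common = sorted(set(letters) & set(query))
```

      With both sides decoded, `common` contains the single character u'\u014d'.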
      

      【Discussion】:
