Unicode character comparison doesn't work as it should

【Question title】: Unicode character comparison doesn't work as it should
【Posted】: 2016-12-13 18:49:08
【Question】:

I'm currently working on a program in which one line has to compare the character i with the Unicode character "”". It looks like this:

    i != "”"

My full code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*- 


f = open('text.txt', "r")
g = open('write.txt', "w")


for word in f:
  for i in word:
    if all( [i != " ", i != "," ,i != "!", i != "?", i != ";",  
       i !=".", i != ":", i != "”", i != "”" ]):
      g.write(i.lower())
    else:
        g.write('\n')

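The idea is that a text is parsed and all characters such as periods, commas, question marks and so on are stripped out. The only problem is that the Unicode character “ is not removed from the text. Can you help me solve this? Thanks!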

For your information, I'm using Python 2.7.11+.

【Question comments】:

What is the content of text.txt?
It's just plain text that is being parsed.

Tags: python unicode compare

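【Solution 1】:

In the expression i != "”", neither i nor "”" is a Unicode string. In Python 2 both are byte strings, and because ” takes three bytes in UTF-8, a single byte i can never equal it. If you want to compare Unicode characters, and you know that text.txt is encoded as UTF-8, try this: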

for i in word.decode('utf-8'):
    if i != u"”":
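
To make the byte-versus-character point concrete, here is a minimal sketch. The variable names are made up for illustration, and the snippet uses `u"..."` literals with escapes so that it should behave the same on Python 2.7 and Python 3. It shows that the quotation mark is one code point but three UTF-8 bytes, so a byte-by-byte loop never sees the whole character, while a loop over the decoded text does.

```python
quote = u"\u201d"                # RIGHT DOUBLE QUOTATION MARK as one code point
encoded = quote.encode("utf-8")  # the bytes a UTF-8 encoded file would contain

# Slicing keeps each piece a byte string on both Python 2 and Python 3.
single_bytes = [encoded[k:k + 1] for k in range(len(encoded))]

# Comparing one byte at a time, as the original loop effectively does.
byte_match = any(b == encoded for b in single_bytes)

# Comparing one decoded character at a time, as suggested above.
char_match = any(ch == quote for ch in encoded.decode("utf-8"))

print(len(quote))    # number of code points
print(len(encoded))  # number of bytes
print(byte_match)
print(char_match)
```

Only the decoded comparison finds the character, which is why the loop over `word.decode('utf-8')` behaves differently from the loop over the raw line.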

Not directly related to your question, but using in may be easier than using all():

if i not in u" ,!?;.:”":

Here is a tested sample program:

#!/usr/bin/env python
# -*- coding: utf-8 -*- 


f = open('text.txt', "r")
g = open('write.txt', "w")


for word in f:
  for i in word.decode('utf-8'):
    if i not in u" ,!?;.:”":
      g.write(i.lower())
    else:
      g.write('\n')

Input text.txt:

hello.zippy”
goodbye

Output write.txt:

hello
zippy

goodbye

【Discussion】:

I get an error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)
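
A likely cause of that error, offered as an assumption since the commenter's input file is not shown: the text contains ’ (U+2019), which is not in the separator list, so it reaches g.write() as a unicode object. In Python 2, writing a unicode object to a file opened with plain open() encodes it implicitly with the ASCII codec. The sketch below reproduces the same failure with an explicit encode call and shows that encoding to UTF-8 succeeds; the variable names are illustrative only.

```python
text = u"\u2019"  # RIGHT SINGLE QUOTATION MARK, the character named in the error

# The ASCII codec cannot represent this character.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly to UTF-8 yields bytes that can be written to a file.
utf8_bytes = text.encode("utf-8")

print(ascii_ok)
print(repr(utf8_bytes))
```

Under that assumption, writing `i.lower().encode('utf-8')` instead of `i.lower()` in the sample program would avoid the implicit ASCII step on Python 2.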

【Solution 2】:

Rob's answer is not complete. I had to put this at the beginning of the file:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

Now everything works like a charm! :D
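
As a hedged alternative, not part of either answer: reload(sys) and sys.setdefaultencoding() change implicit conversions for the whole process and do not exist in Python 3. A sketch that avoids them is to open both files with io.open and an explicit encoding, so that decoding and encoding happen at the file boundary. The function name split_words, the SEPARATORS constant and the temporary demo files are made up for illustration, and the separator set assumes that both the left and the right double quotation mark should be removed.

```python
import io
import os
import tempfile

# Assumed separator set: the characters from the answer plus the left double quote.
SEPARATORS = u" ,!?;.:\u201c\u201d"


def split_words(src_path, dst_path, encoding="utf-8"):
    """Copy src to dst in lowercase, writing a newline for every separator."""
    with io.open(src_path, "r", encoding=encoding) as src:
        with io.open(dst_path, "w", encoding=encoding) as dst:
            for line in src:
                for ch in line:
                    dst.write(u"\n" if ch in SEPARATORS else ch.lower())


# Small demonstration on temporary files created only for this example.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "text.txt")
dst_path = os.path.join(workdir, "write.txt")

with io.open(src_path, "w", encoding="utf-8") as f:
    f.write(u"hello.zippy\u201d\nit\u2019s\n")

split_words(src_path, dst_path)

with io.open(dst_path, "r", encoding="utf-8") as f:
    result = f.read()

print(repr(result))
```

Because io.open returns text on read and accepts text on write, the loop body never handles raw bytes, and characters such as ’ that are not separators pass through unchanged.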
