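【Posted】: 2017-02-17 16:07:12
【Question】:
I have the following function in Python. It takes a string as an argument and returns the same string with accented characters replaced by their ASCII equivalents (e.g. "alçapão" -> "alcapao"):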
def filt(word):
    dic = { u'á':'a',u'ã':'a',u'â':'a' } # the whole dictionary is too big, it is just a sample
    new = ''
    for l in word:
        new = new + dic.get(l, l)
    return new
It is supposed to "filter" every string in a list that I read from a file:
lines = []
with open("to-filter.txt","r") as f:
    for line in f:
        lines.append(line.strip())
lines = [filt(l) for l in lines]
But I get this:
filt.py:9: UnicodeWarning: Unicode equal comparison failed to convert
both arguments to Unicode - interpreting them as being unequal
new = new + dic.get(l, l)
The filtered strings still contain sequences like '\xc3\xb4' instead of ASCII characters. What should I do?
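A likely cause, assuming Python 2.7: open(..., "r") returns byte strings, so the loop in filt walks over single bytes ('\xc3', then '\xb4', which together are the UTF-8 encoding of 'ô') and compares each byte with the unicode keys of the dictionary. That comparison is what raises the UnicodeWarning, and no key ever matches. Below is a minimal sketch of one way around it, assuming the file is UTF-8 encoded. The names DIC, read_lines and strip_accents are made up for illustration and are not part of the original code; strip_accents is an alternative technique (Unicode decomposition), not the dictionary approach from the question.

```python
# -*- coding: utf-8 -*-
import io
import unicodedata

# Sample mapping only; the question says the real dictionary is much larger.
DIC = {u'á': u'a', u'ã': u'a', u'â': u'a'}


def filt(word):
    # word is expected to be a unicode string, so each l is a whole
    # character rather than one byte of a UTF-8 sequence.
    return u''.join(DIC.get(l, l) for l in word)


def strip_accents(word):
    # Alternative without a hand-written table (hypothetical helper):
    # decompose each character, then drop the combining marks.
    decomposed = unicodedata.normalize('NFKD', word)
    return u''.join(c for c in decomposed if not unicodedata.combining(c))


def read_lines(path):
    # io.open decodes the bytes to unicode on both Python 2.7 and 3.
    # The UTF-8 encoding is an assumption about the input file.
    with io.open(path, 'r', encoding='utf-8') as f:
        return [line.strip() for line in f]
```

With these in place, lines = [filt(l) for l in read_lines("to-filter.txt")] should compare unicode with unicode. Note that strip_accents only helps for characters that decompose into a base letter plus combining marks; letters such as 'ø' or 'ß' would still need an explicit table entry.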
【Comments】:
-
Which version of Python? There are major differences in how different versions handle UTF-8.
-
2.7.12 (the version that ships with Ubuntu)
Tags: python python-2.7 utf-8 character-encoding