Method to solve "UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 16: ordinal not in range(128)" is not working (answers)

【Question title】: method to solve "UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 16: ordinal not in range(128)" is not working
【Posted】: 2015-10-23 08:18:39
【Question】:

I get an error when writing a string to a file:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 16: ordinal not in range(128)
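For reference, u'\u2013' is the EN DASH character (code point 8211). ASCII only covers code points 0-127, so a strict ASCII encode cannot represent it. A small sketch, written to run on both Python 2.7 and Python 3, showing how the choice of error handler changes the outcome:

```python
import unicodedata

dash = u'\u2013'  # the character named in the error message

# Identify the character; its code point is far above ASCII's limit of 127.
print(unicodedata.name(dash))  # EN DASH
print(ord(dash))               # 8211

# The default 'strict' handler raises the same UnicodeEncodeError.
try:
    dash.encode('ascii')
except UnicodeEncodeError as exc:
    print('strict: %s' % exc)

# 'ignore' silently drops the character; 'replace' substitutes '?'.
print(repr(dash.encode('ascii', 'ignore')))
print(repr(dash.encode('ascii', 'replace')))

# UTF-8 can represent it, as three bytes.
print(repr(dash.encode('utf-8')))
```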

The thing is, I have already cleaned the string, so I don't know why this isn't working.

This is the code I use to clean the string:

import string

replace_punctuation = string.maketrans(string.punctuation, ' '*len(string.punctuation))

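def clean_text(text):
    try:
        # Byte string (str): decode the UTF-8 bytes to unicode.
        text = text.decode('utf-8', 'ignore')
    except UnicodeError:
        # Already unicode: Python 2 first encodes it implicitly as ASCII,
        # which fails for non-ASCII input, so round-trip through UTF-8 instead.
        text = text.encode('utf-8', 'ignore')
        text = text.decode('utf-8', 'ignore')
    # Drop non-ASCII characters, lowercase, and map punctuation to spaces.
    text = text.encode('ascii', 'ignore').lower().translate(replace_punctuation)
    # Collapse runs of whitespace into single spaces.
    text = " ".join(text.split())
    return text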

text = "some text with special characters"
text = clean_text(text)
# outfile is an output file opened earlier (not shown)
outfile.write(text)  # this line raises the error
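string.maketrans and the separate str/unicode types are Python 2 only. Purely for comparison, a rough Python 3 port of the same clean_text steps might look like this; the name clean_text_py3 and its handling of bytes-or-str input are assumptions, not part of the original code:

```python
import string

# Map every ASCII punctuation character to a space (Python 3's str.maketrans).
REPLACE_PUNCTUATION = str.maketrans(string.punctuation, ' ' * len(string.punctuation))


def clean_text_py3(text):
    """Hypothetical Python 3 port of clean_text: returns lowercase ASCII str."""
    # Accept raw bytes as well as str; undecodable bytes are dropped.
    if isinstance(text, bytes):
        text = text.decode('utf-8', 'ignore')
    # Drop every non-ASCII character, then turn the bytes back into str.
    text = text.encode('ascii', 'ignore').decode('ascii')
    # Lowercase, replace punctuation with spaces, and collapse whitespace runs.
    text = text.lower().translate(REPLACE_PUNCTUATION)
    return ' '.join(text.split())


print(clean_text_py3('Pages 10\u201320: Caf\u00e9 Menu!'))  # pages 1020 caf menu
```

Note how lossy the 'ignore' approach is: the en dash disappears entirely, so "10–20" becomes "1020".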

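Any idea what I'm missing when cleaning the string?

I know this question gets asked a lot. My problem is that the most common solution to it,

text.encode('utf-8')

doesn't work for me.

I also tried

text.encode('utf-8', 'ignore')

but that didn't work either.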
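The question doesn't show how outfile was opened, and that is worth checking: write() on a text file encodes each character with that file's encoding, so if the encoding is ASCII, any non-ASCII character that still reaches write() triggers this same error, no matter what encode() calls were made on the string elsewhere. In Python 2, a file from the built-in open() behaves this way when it is handed a unicode object. A minimal sketch reproducing the failure with io.open (runs on Python 2.7 and 3); the scratch file name demo_ascii.txt is hypothetical:

```python
import io
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo_ascii.txt')

# A text file whose encoding is ASCII must encode every character it is given.
with io.open(path, 'w', encoding='ascii') as outfile:
    try:
        outfile.write(u'range 10\u201320')
        failed = False
    except UnicodeEncodeError as exc:
        failed = True
        print('write() failed: %s' % exc)
```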

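【Question comments】:

You want to write encoded text to the file, not a decoded unicode object. If you call encode('ascii') on a unicode object containing Ä, it fails (with the default 'strict' error handling) because Ä is not in ASCII. Also: don't worry, everyone goes through this in Python. Great talk: youtube.com/watch?v=sgHbC6udIqc
@SebastianWozny: Thanks. So what's the way to fix this? Remove the encode with ascii ignore? PS: watching that video now. :)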
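One way to act on the commenter's advice is to keep the text as unicode while processing it and encode exactly once, on output, to an encoding that can represent it. A minimal sketch using io.open (available in Python 2.6+ and 3) with a hypothetical scratch file demo_utf8.txt; it shows both a UTF-8 text-mode file and an explicit .encode('utf-8') written to a binary file:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo_utf8.txt')  # hypothetical scratch file
text = u'range 10\u201320, caf\u00e9'  # unicode text that is not pure ASCII

# Option 1: a text-mode file that encodes to UTF-8 on write.
with io.open(path, 'w', encoding='utf-8') as outfile:
    outfile.write(text)

# Option 2: encode explicitly and append the resulting bytes in binary mode.
with io.open(path, 'ab') as outfile:
    outfile.write(text.encode('utf-8'))

# Reading back with the same encoding recovers the original characters.
with io.open(path, 'r', encoding='utf-8') as infile:
    round_trip = infile.read()

print(round_trip == text * 2)  # True
```

Unlike encode('ascii', 'ignore'), this keeps the en dash and the accented letter instead of silently dropping them.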

Tags: python string python-2.7

【Solution 1】:

Not sure what error it throws for you. I just tested this with Python 2.7.10 on Windows:

# -*- coding: utf8 -*-
import string

replace_punctuation = string.maketrans(string.punctuation, ' '*len(string.punctuation))

def clean_text(text):
    try:
        text = text.decode('utf-8', 'ignore')
    except:
        text = text.encode('utf-8', 'ignore')
        text = text.decode('utf-8', 'ignore')
    text = text.encode('ascii', 'ignore').lower().translate(replace_punctuation)
    text = " ".join(text.split())
    return text

text = "some text öäööäwith special characters"
text = clean_text(text)
print text
with open('test.txt','w') as outfile:
    outfile.write(text)

>>>
some text with special characters

The file contains the same text.

This could be another option:

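from string import ascii_letters, whitespace, punctuation, digits
# Build the set of allowed characters once instead of scanning a chain per character
allowed = set(ascii_letters + whitespace + punctuation + digits)
text = ''.join(c for c in text if c in allowed).lower().translate(replace_punctuation)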

This filters out every character that is not in ASCII, so you don't have to worry about Unicode.
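A Python 3 sketch of the same filtering idea, using a precomputed set for the membership test and str.maketrans in place of string.maketrans; the names ALLOWED, PUNCT_TO_SPACE and ascii_only are hypothetical:

```python
import string

# Characters to keep: ASCII letters, digits, punctuation and whitespace.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + string.whitespace)
# Turn each punctuation character into a space, as replace_punctuation does.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))


def ascii_only(text):
    """Hypothetical helper: keep ASCII characters only, lowercase, strip punctuation."""
    kept = ''.join(c for c in text if c in ALLOWED)
    return kept.lower().translate(PUNCT_TO_SPACE)


print(ascii_only('some text \u00f6\u00e4\u00f6\u00f6\u00e4with special characters'))
# some text with special characters
```

Like the one-liner above, this does not collapse repeated spaces; add " ".join(result.split()) if that matters.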

【Comments】: