Handling Unicode in Python 2.7 when saving a string to a file: answers

【Question title】: Handling Unicode in python 2.7 when saving string to a file
【Posted】: 2016-07-16 19:17:29
【Question】:

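Handling Unicode is the one thing I consistently struggle with when programming in Python. It has caused many problems in my past projects, and I always end up brute-forcing different encodings until something works. (If there is a beginner-friendly tutorial, a pointer to it would be very helpful.)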

For example, I have this code:

# -*- coding: utf-8 -*-
string = "Åland Islands"
with open("1.txt","w")as f:
    f.write(string.decode("utf-8"))

Running it fails with:

   return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc5 in position 0: invalid continuation byte

I have tried many encodings to solve this problem, without success.

【Comments】:

string = u"Åland Islands", followed by f.write(s), should work.
That doesn't work: `File "C:\Python27\learn\unicode\test.py", line 2 SyntaxError: Non-ASCII character '\xc5' in file C:\Python27\learn\unicode\test.py on line 2, but no encoding declared; see python.org/dev/peps/pep-0263 for details`

【Solution 1】:

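The coding line only tells the Python interpreter how it should interpret the bytes of the source file. It does not mean the script actually contains UTF-8 encoded text. In fact, the error message indicates that the file was saved as ISO-8859-1 (Latin-1) encoded text: 0xc5 is the Latin-1 encoding of Å, whereas its UTF-8 encoding is the two bytes 0xc3 0x85.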
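As a rough illustration of the point about bytes, the sketch below encodes the same string both ways and then decodes the Latin-1 bytes as UTF-8. The variable names are made up for this example, and the character is written as an escape so the result does not depend on how the example file itself is saved.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only; all variable names here are hypothetical.

# Use an escape for the non-ASCII character so this example behaves the
# same no matter which encoding the editor uses to save the file.
text = u"\u00c5land Islands"

latin1_bytes = text.encode("latin-1")  # begins with the single byte 0xc5
utf8_bytes = text.encode("utf-8")      # begins with the two bytes 0xc3 0x85

# Decoding Latin-1 bytes as if they were UTF-8 fails, as in the question:
# 0xc5 announces a two-byte sequence, but the next byte ("l") is not a
# valid continuation byte.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding with the encoding the bytes were really written in works.
round_trip = latin1_bytes.decode("latin-1")
```

Here `decode_failed` ends up true and `round_trip` equals `text`, which matches the explanation above: the bytes are fine, they are just not UTF-8.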

You need to make sure your editor really saves the file as UTF-8 encoded text, so that the coding line does not lie to the interpreter.
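If changing the editor setting is not convenient, one possible way to check and repair a script is sketched below. The helper names `is_valid_utf8` and `resave_latin1_as_utf8` are hypothetical, and the repair step assumes the file really is Latin-1; if it was saved in some other encoding, that name would have to be substituted. The demo runs on a temporary file rather than a real script.

```python
# Illustrative sketch only; the helper names are hypothetical.
import io
import os
import tempfile


def is_valid_utf8(path):
    """Return True if the raw bytes of the file decode cleanly as UTF-8."""
    with io.open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def resave_latin1_as_utf8(path):
    """Rewrite a file ASSUMED to be Latin-1 as UTF-8, in place."""
    # newline="" keeps the original line endings untouched.
    with io.open(path, "r", encoding="latin-1", newline="") as f:
        content = f.read()
    with io.open(path, "w", encoding="utf-8", newline="") as f:
        f.write(content)


# Demo: create a script saved as Latin-1 that claims to be UTF-8.
demo_path = os.path.join(tempfile.mkdtemp(), "test.py")
with io.open(demo_path, "wb") as f:
    f.write(b'# -*- coding: utf-8 -*-\nstring = "\xc5land Islands"\n')

valid_before = is_valid_utf8(demo_path)
resave_latin1_as_utf8(demo_path)
valid_after = is_valid_utf8(demo_path)
```

It is worth keeping a backup before running something like this on real files, since the Latin-1 decode step never fails and will silently produce wrong text if the assumption about the original encoding is incorrect.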

【Discussion】: