Python reading from file encoding problem: answers

【Question title】: Python reading from file encoding problem
【Posted】: 2019-03-19 13:38:55
【Question description】:

When I read some files like this:

list_of_files = glob.glob('./*.txt') # create the list of files
for file_name in list_of_files:
    FI = open(file_name, 'r', encoding='cp1252')

I get this error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1260: character maps to <undefined>

When I switch to this:

list_of_files = glob.glob('./*.txt') # create the list of files
for file_name in list_of_files:
    FI = open(file_name, 'r', encoding="utf-8")

I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1459: invalid start byte

I have read that I should open the file as binary, but I don't know how to do that. Here is my function:

def readingAndAddToList():
    list_of_files = glob.glob('./*.txt') # create the list of files
    for file_name in list_of_files:
        FI = open(file_name, 'r', encoding="utf-8")
        stext = textProcessing(FI.read())# split returns a list of words delimited by sequences of whitespace (including tabs, newlines, etc, like re's \s)
        secondaryWord_list = stext.split()
        word_list.extend(secondaryWord_list) # Add words to main list
        print("Lungimea fisierului ",FI.name," este de", len(secondaryWord_list), "caractere")
        sortingAndNumberOfApparitions(secondaryWord_list)
        FI.close()

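The beginning of my function is the important part, because the error occurs in the reading step.

【Comments】:

Could you try open(file_name, 'r', errors='ignore')? Does it give you the output you need?
Can you share the problematic file? Or find the offending character that causes the exception?
@Hello.World It works that way, because I don't need characters like “'” at all.
@Adrian Yes, I did think that was the problem. I didn't have a sample of your file, so I suggested you try it! Glad to help anyway! :)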
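
For anyone comparing options, here is a minimal sketch of what the errors argument suggested above does. The sample bytes and the temporary file are made up for illustration and are not the asker's data. The encoding is passed explicitly here because, without it, open() uses the platform default, which differs between systems.

```python
import os
import tempfile

# Hypothetical sample: byte 0x92 is a right single quotation mark in cp1252,
# but it is not a valid start byte in UTF-8.
sample = b"it\x92s fine"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "wb") as fh:
        fh.write(sample)

    # errors="ignore" silently drops every byte that cannot be decoded.
    with open(path, "r", encoding="utf-8", errors="ignore") as fh:
        ignored = fh.read()

    # errors="replace" substitutes U+FFFD, so the loss stays visible.
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        replaced = fh.read()

print(repr(ignored))
print(repr(replaced))
```

Note that errors='ignore' discards data, which is acceptable here only because the asker does not need those characters. For a word count where apostrophes matter, errors='replace' or a correct encoding would probably be a safer choice.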

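Tags: python encoding utf-8 character-encoding cp1252

【Solution 1】:

If you are on Windows, open the file in Notepad and save it with the desired encoding. On Linux, do the same in a text editor. Your program should then be able to read it.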
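
If there are many files, re-saving each one by hand may be tedious. The following is only a sketch of doing the same thing in Python, and it also shows the "open as binary" approach the question asks about. The helper name read_text_with_fallback, the list of encodings, and the sample bytes are assumptions for illustration; the files may in fact use a different encoding, in which case the decoded text would be wrong even though no error is raised.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: read raw bytes, then try each encoding in order."""
    with open(path, "rb") as fh:  # binary mode: no decoding happens here
        raw = fh.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the final encoding and mark bad bytes as U+FFFD.
    return raw.decode(encodings[-1], errors="replace")


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "wb") as fh:
        fh.write(b"it\x92s fine")  # made-up cp1252 bytes, invalid as UTF-8

    text = read_text_with_fallback(path)

    # Re-save as UTF-8, the scripted equivalent of "save as" in an editor.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    with open(path, "r", encoding="utf-8") as fh:
        round_trip = fh.read()

print(repr(text))
```

In the asker's function, FI.read() could then be replaced by a call to such a helper with file_name as the argument. Trying UTF-8 first is a deliberate choice: text that is really cp1252 usually fails UTF-8 decoding, whereas almost any byte sequence decodes under cp1252 without an error.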

【Discussion】: