Parsing inside a directory: Python 2.7 vs. 3.2 (answers)

【Question title】: Parsing inside a directory problem Python 2.7 vs. 3.2
【Posted】: 2011-08-29 05:59:46
【Question】:

I'm trying to do some basic file parsing inside a directory in Python 3. This code runs fine in Python 2.7, but I can't figure out what the problem is in Python 3.2.

import sys, os, re

filelist = os.listdir('/Users/sbrown/Desktop/Test') 
os.chdir('/Users/sbrown/Desktop/Test') 
for file in filelist:
    infile = open(file, mode='r') 
    filestring = infile.read() 
    infile.close() 
    pattern = re.compile('exit') 
    filestring = pattern.sub('so long', filestring) 
    outfile = open(file, mode='w') 
    outfile.write(filestring)
    outfile.close()  # call the method; a bare outfile.close does nothing
exit

This is the error that is thrown:

Traceback (most recent call last):
  File "/Users/bunsen/Desktop/parser.py", line 9, in <module>
    filestring = infile.read()
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 3131: ordinal not in range(128)

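The files I'm parsing are all text files. I tried specifying the encoding as utf-8 in the open() arguments, but that didn't work. Any ideas? Thanks in advance!

If I specify the encoding as utf-8, the following error is thrown: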

Traceback (most recent call last):
  File "/Users/sbrown/Desktop/parser.py", line 9, in <module>
    filestring = infile.read()
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 3131: ordinal not in range(128)

【Question discussion】:

Tags: encoding io python-3.x

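【Solution 1】:

You aren't specifying an encoding when you open the file. You need to do that in Python 3, because in Python 3 a text-mode file returns decoded Unicode strings.

You then tried UTF-8 and it didn't work, so clearly that is not the encoding used. Only you know which encoding it is, but my guess is cp1252: 0x80 is the euro sign in that code page, so failures on 0x80 are common when you have European Windows users. :-)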
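As a small illustration of the point about 0x80, the sketch below decodes the same bytes with three candidate encodings. The sample bytes and the helper name try_decode are made up for this example; they are not from the question.

```python
# Hypothetical sample: some text containing the single byte 0x80.
raw = b"price: \x80 5"

def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# 0x80 is outside ASCII and is not a valid start byte in UTF-8,
# but cp1252 maps it to the euro sign (U+20AC).
results = {enc: try_decode(raw, enc) for enc in ("ascii", "utf-8", "cp1252")}
for enc, text in results.items():
    print(enc, "->", repr(text))
```

A successful cp1252 decode does not prove the file really is cp1252; it only shows that this guess is consistent with the byte that caused the error.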

For compatibility with both Python 2.7 and 3.1, I suggest using the io library to open files. It is what Python 3 uses by default, and it is also available in Python 2.6 and later:

import io
infile = io.open(filelist[0], mode='rt', encoding='cp1252')
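Applied to the whole loop from the question, this could look roughly like the sketch below. The function name replace_in_files is hypothetical, cp1252 is still only a guess at the real encoding, and str.replace stands in for the original re.compile('exit').sub since the pattern is a plain string.

```python
import io
import os

def replace_in_files(directory, old, new, encoding="cp1252"):
    """Replace `old` with `new` in every regular file directly inside `directory`.

    Returns the names of the files that were rewritten.
    """
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)  # no os.chdir needed
        if not os.path.isfile(path):
            continue  # skip subdirectories
        # newline="" keeps the file's existing line endings untouched.
        with io.open(path, mode="rt", encoding=encoding, newline="") as infile:
            text = infile.read()
        updated = text.replace(old, new)
        if updated != text:
            with io.open(path, mode="wt", encoding=encoding, newline="") as outfile:
                outfile.write(updated)
            changed.append(name)
    return changed
```

For the question's directory, a call would be replace_in_files('/Users/sbrown/Desktop/Test', 'exit', 'so long'). The with blocks close each file even if reading or writing raises.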

【Discussion】:

【Solution 2】:

Does this work?

import codecs
infile = codecs.open(filelist[0], encoding='UTF-8')
infile.read()

【Discussion】:

【Solution 3】:

Test:

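import os

folder = '/Users/sbrown/Desktop/Test'
filelist = os.listdir(folder)
# os.listdir returns bare names, so join them with the folder before opening
infile = open(os.path.join(folder, filelist[0]), mode='r')
print(infile.encoding)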

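Make sure the file is actually being read as utf-8. If it isn't, check that you haven't done something bad to codecs. Could you also post the traceback from your test with utf-8 forced?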
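To see which encoding open() picks when none is given, and to confirm that an explicit encoding argument really takes effect, a check along these lines may help. It is only a sketch for Python 3: it uses a throwaway temporary file rather than the asker's data, and the variable names are illustrative.

```python
import locale
import os
import tempfile

# In Python 3, open() without an encoding falls back to the locale's preferred one.
default_encoding = locale.getpreferredencoding(False)
print("locale default:", default_encoding)

# Create an empty scratch file so the check does not depend on real data.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, mode="r") as f:
        print("open() without encoding:", f.encoding)
    with open(path, mode="r", encoding="utf-8") as f:
        forced = f.encoding
        print("open() with encoding='utf-8':", forced)
finally:
    os.remove(path)
```

If the last line still reports an ASCII encoding, the encoding argument is not reaching the open() call that actually runs.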

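【Discussion】:

Thanks for the help, Evpok. The default encoding is US-ASCII. I've also added to my question the error I get when I force utf-8 encoding. Curse you, Python 3!
Wow, the same trace, how strange! Which encoding does print(infile.encoding) return?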