In-place (multiple) replacements in a file: answers

【Question title】: Inplace (multiple) replacements in a file
【Posted】: 2018-04-03 15:28:32
【Question description】:

I am trying to perform some replacements in a file:

'\t' --> '◊'
'⁞' --> '\t'

This question recommends the following program:

import fileinput

with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        line = line.replace('\t','◊')
        print(line.replace('⁞','\t'), end='')

I am not allowed to comment there, but when I run this code I get an error message:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 10: character maps to <undefined>

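I have fixed this kind of error before by adding encoding='utf-8'. The problem is that fileinput.FileInput() does not accept an encoding argument.

Question: how can I get rid of this error?

I would be happiest with the solution above, provided it can be made to work and its speed is comparable to the method below. It appears to do the in-place replacement it is supposed to do.

I have also tried: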

replacements = {'\t':'◊', '⁞':'\t'}
with open(filename, encoding='utf-8') as inFile:
    contents = inFile.read()
with open(filename, mode='w', encoding='utf-8') as outFile:
    for i in replacements.keys():
        contents = contents.replace(i, replacements[i])
    outFile.write(contents)

This is relatively fast, but very greedy in terms of memory.
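The memory cost comes from reading the whole file into one string. A minimal sketch of a line-by-line alternative follows, assuming the file is UTF-8 and that a temporary file in the same directory is acceptable. The helper name translate_file and the TABLE constant are hypothetical, not from the original post. It uses str.maketrans so both substitutions happen in a single pass, which gives the same result as applying the tab replacement first.

```python
import os
import tempfile

# Single-pass translation table: both substitutions are applied at once,
# so a tab produced from '⁞' is never turned into '◊' afterwards.
TABLE = str.maketrans({'\t': '◊', '⁞': '\t'})


def translate_file(path, table=TABLE, encoding='utf-8'):
    """Hypothetical helper: rewrite `path` line by line via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so that os.replace
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        # newline='' keeps the original line endings untouched.
        with os.fdopen(fd, 'w', encoding=encoding, newline='') as dst, \
             open(path, encoding=encoding, newline='') as src:
            for line in src:
                dst.write(line.translate(table))
        # Swap the rewritten file into place only after a complete write.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original file untouched and clean up on any failure.
        os.unlink(tmp_path)
        raise
```

Only one line is held in memory at a time. Note that the replacement file gets the default permissions of a temporary file, so this sketch does not preserve the original file mode.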

For UNIX users, I need something that does the following:

sed -i 's/\t/◊/g' 'file.csv'
sed -i 's/⁞/\t/g' 'file.csv'

This turns out to be rather slow.

【Question discussion】:

Tags: python

【Solution 1】:

Normally, with FileInput, you can specify the encoding you want by passing fileinput.hook_encoded as the openhook parameter:

import fileinput

with fileinput.FileInput(filename, openhook=fileinput.hook_encoded('utf-8')) as file:
    for line in file:
        ...  # process each line; it arrives already decoded as UTF-8

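However, this does not work with inplace=True. In that case you can treat the file as binary and decode/encode the strings yourself. For reading, specifying mode='rb' is enough; it gives you bytes lines instead of str. Writing is a bit more involved, because print always works with str, or converts whatever it is given to str, so passing bytes will not work as expected. However, you can write binary data to sys.stdout directly, and that works: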

import sys
import fileinput

filename = '...'
with fileinput.FileInput(filename, mode='rb', inplace=True, backup='.bak') as file:
    for line in file:
        line = line.decode('utf-8')
        line = line.replace('\t', '◊')
        line = line.replace('⁞', '\t')
        sys.stdout.buffer.write(line.encode('utf-8'))

【Discussion】:

And it is about 2x faster than the working method in my post (in my case). The amount of RAM used is negligible. This is great, @jdehesa! Thank you!