Answers: Remove very last character in file

【Question title】: Remove very last character in file
【Posted】: 2013-09-22 08:01:43
【Question】:

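After looking all over the Internet, I've come to this.

Let's say I have already made a text file that reads: Hello World

Well, I want to remove the very last character (in this case d) from this text file.

So now the text file should look like this: Hello Worl

But I have no idea how to do this.

All I want, more or less, is a single backspace function for text files on my hard drive.

This needs to work on Linux, as that's what I'm using.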

【Question discussion】:

Tags: python file text

【Solution 1】:

Use fileobject.seek() to seek 1 position from the end, then use file.truncate() to remove the remainder of the file:

import os

with open(filename, 'rb+') as filehandle:
    filehandle.seek(-1, os.SEEK_END)
    filehandle.truncate()

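This works fine for single-byte encodings. If you have a multi-byte encoding (such as UTF-16 or UTF-32), you need to seek back enough bytes from the end to account for a single code point.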
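For UTF-32 a code point is always 4 bytes, so seeking -4 instead of -1 is enough. UTF-16 needs one extra check: characters outside the Basic Multilingual Plane (emoji, for example) are stored as a 4-byte surrogate pair, and the last 2-byte unit of such a pair is a low surrogate in the range 0xDC00-0xDFFF. A minimal sketch under those assumptions; the helper name remove_last_utf16_char and its byteorder parameter are hypothetical, and the file is assumed to hold well-formed UTF-16:

```python
import os


def remove_last_utf16_char(path, byteorder="little"):
    """Hypothetical helper: drop the last UTF-16 code point from a file.

    Assumes the file holds well-formed UTF-16 in the given byte order.
    """
    with open(path, "rb+") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        if size < 2:
            return  # no complete 2-byte unit to remove
        f.seek(-2, os.SEEK_END)
        unit = int.from_bytes(f.read(2), byteorder)
        # A low (trailing) surrogate means the last character is a 4-byte
        # surrogate pair; otherwise it is a single 2-byte unit.
        width = 4 if 0xDC00 <= unit <= 0xDFFF and size >= 4 else 2
        f.truncate(size - width)
```

A byte order mark at the start of the file does not matter here, since only the end of the file is inspected.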

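For variable-width encodings, whether you can use this technique at all depends on the codec. For UTF-8, you need to find the first byte (scanning from the end) for which bytevalue & 0xC0 != 0x80 is true, and truncate from that point on. That ensures you don't truncate in the middle of a multi-byte UTF-8 code point: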

import os

with open(filename, 'rb+') as filehandle:
    # move to the last byte, then scan backward until a non-continuation byte is found
    filehandle.seek(-1, os.SEEK_END)
    # read(1) returns a bytes object, so take its integer value with ord()
    while ord(filehandle.read(1)) & 0xC0 == 0x80:
        # we just read 1 byte, which moved the file position forward,
        # skip back 2 bytes to move to the byte before the current.
        filehandle.seek(-2, os.SEEK_CUR)

    # last read byte is our truncation point, move back to it.
    filehandle.seek(-1, os.SEEK_CUR)
    filehandle.truncate()

Note that UTF-8 is a superset of ASCII, so the above works for ASCII-encoded files too.
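The backward scan is easier to see on an in-memory bytes object than on a file. In UTF-8 every continuation byte has the bit pattern 10xxxxxx, which is exactly what b & 0xC0 == 0x80 tests for, so the first byte from the end that fails the test is where the last character starts. A minimal sketch; last_utf8_char_start is a hypothetical helper, not part of any library:

```python
def last_utf8_char_start(data: bytes) -> int:
    """Return the index where the last UTF-8 character in `data` begins.

    Hypothetical helper; assumes `data` is non-empty, valid UTF-8.
    """
    pos = len(data) - 1
    # Walk backward over continuation bytes (bit pattern 10xxxxxx).
    while pos > 0 and data[pos] & 0xC0 == 0x80:
        pos -= 1
    return pos


text = "Hello Wörld€".encode("utf-8")  # "€" is 3 bytes in UTF-8
cut = last_utf8_char_start(text)
print(text[:cut].decode("utf-8"))  # Hello Wörld
```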

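【Discussion】:

According to [1], "SEEK_END or 2: seek to the end of the stream; offset must be zero (all other values are unsupported)." [1]: docs.python.org/3/library/…
@zvyn: You are looking at the wrong documentation. See io.IOBase.seek(). The file is opened in binary mode, not text mode. In text mode the offset depends on the text encoding, which can use a variable number of bytes per character; that is why TextIOBase.seek() doesn't support seeking backwards. In binary mode, however, we seek by bytes, and a negative offset from the end is perfectly legal.
For large files (i.e. > 10 GB), this seems to take a very long time. There must be some file reading or copying going on. The truncate command worked better for me, but maybe I'm doing something wrong.
@shrewmouse: I have used this on very, very large files without problems. Without details about your OS or filesystem, I can't help you debug why you are having trouble.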
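A small experiment makes the text-mode versus binary-mode distinction above concrete: the same negative end-relative seek is rejected on a text-mode file object but accepted on a binary one. This is a sketch run against a throwaway temporary file; the file name demo.txt is arbitrary:

```python
import io
import os
import tempfile

text_mode_rejected = False

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Hello World")

    # Text mode: a nonzero end-relative seek raises io.UnsupportedOperation.
    with open(path, "r+", encoding="utf-8") as f:
        try:
            f.seek(-1, os.SEEK_END)
        except io.UnsupportedOperation:
            text_mode_rejected = True

    # Binary mode: offsets are plain byte counts, so -1 from the end is fine.
    with open(path, "rb+") as f:
        f.seek(-1, os.SEEK_END)
        f.truncate()

    with open(path, encoding="utf-8") as f:
        result = f.read()

print(text_mode_rejected, result)  # True Hello Worl
```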

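【Solution 2】:

The accepted answer by Martijn is simple and it works, but it does not account for text files with:

UTF-8 encoding containing non-English characters (UTF-8 is the default encoding for text files in Python 3 on most Linux systems)
a single newline character at the end of the file (this is the default in Linux editors such as vim or gedit)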

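If the text file contains non-English characters, removing only the last byte does not work: in UTF-8 a character such as a Cyrillic letter takes more than one byte, so cutting off a single byte leaves a broken, partial character behind.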
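A quick way to see the problem, using the same Bulgarian sample text as the example further down: every Cyrillic letter is two bytes in UTF-8, so dropping one byte produces invalid UTF-8, while dropping the whole two-byte letter does not. A sketch; decodes is a hypothetical helper:

```python
text = "Здравей свят"
data = text.encode("utf-8")
print(len(text), len(data))  # 12 23: each Cyrillic letter takes two bytes


def decodes(raw: bytes) -> bool:
    """Return True if `raw` is valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(decodes(data[:-1]))  # False: half of the final "т" is left behind
print(data[:-2].decode("utf-8"))  # Здравей свя
```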

Below is an example that solves both problems and also allows removing more than one character from the end of the file:

import os


def truncate_utf8_chars(filename, count, ignore_newlines=True):
    """
    Truncates last `count` characters of a text file encoded in UTF-8.
    :param filename: The path to the text file to read
    :param count: Number of UTF-8 characters to remove from the end of the file
    :param ignore_newlines: Set to true, if the newline character at the end of the file should be ignored
    """
    with open(filename, 'rb+') as f:
        last_char = None

        size = os.fstat(f.fileno()).st_size

        offset = 1
        chars = 0
        while offset <= size:
            f.seek(-offset, os.SEEK_END)
            b = ord(f.read(1))

            if ignore_newlines:
                if b == 0x0D or b == 0x0A:
                    offset += 1
                    continue

            if b & 0b10000000 == 0 or b & 0b11000000 == 0b11000000:
                # This is the first byte of a UTF8 character
                chars += 1
                if chars == count:
                    # When `count` number of characters have been found, move current position back
                    # with one byte (to include the byte just checked) and truncate the file
                    f.seek(-1, os.SEEK_CUR)
                    f.truncate()
                    return
            offset += 1

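How it works:

Reads the last bytes of a UTF-8 encoded text file one at a time in binary mode, without reading the rest of the file
Iterates backwards over the bytes, looking for the start of each UTF-8 character (an ASCII byte 0xxxxxxx or a lead byte 11xxxxxx), skipping newline bytes if ignore_newlines is set
Once count characters (other than newlines) have been found, truncates the file at the start of the last one found; any newlines after it are removed as well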
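The same backward scan can be tried without touching a file. A minimal in-memory sketch that mirrors the loop in truncate_utf8_chars (drop_last_utf8_chars is a hypothetical name). Like the file version, it skips newline bytes while counting, but a trailing newline is still cut off, because everything from the truncation point onward is dropped:

```python
def drop_last_utf8_chars(data: bytes, count: int, ignore_newlines: bool = True) -> bytes:
    """Return `data` without its last `count` UTF-8 characters.

    Mirrors truncate_utf8_chars(): newline bytes are not counted when
    ignore_newlines is True, but they are still dropped with the tail.
    """
    chars = 0
    for pos in range(len(data) - 1, -1, -1):
        b = data[pos]
        if ignore_newlines and b in (0x0D, 0x0A):
            continue
        # An ASCII byte (0xxxxxxx) or a lead byte (11xxxxxx) starts a character.
        if b & 0x80 == 0 or b & 0xC0 == 0xC0:
            chars += 1
            if chars == count:
                return data[:pos]
    return data  # fewer than `count` characters: unchanged, like the file version


print(drop_last_utf8_chars("Здравей свят\n".encode("utf-8"), 1).decode("utf-8"))  # Здравей свя
```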

Sample text file, bg.txt:

Здравей свят

Usage:

filename = 'bg.txt'
print('Before truncate:', open(filename).read())
truncate_utf8_chars(filename, 1)
print('After truncate:', open(filename).read())

Output:

Before truncate: Здравей свят
After truncate: Здравей свя

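This works with both UTF-8 and ASCII encoded files.

【Discussion】:

【Solution 3】:

If you are not reading the file in binary mode (that is, the file is open in text mode), I can suggest the following. Open the file with 'r+' rather than 'w': 'w' truncates the file to zero length as soon as it is opened.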

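f.seek(0, os.SEEK_END)             # text mode only allows offset 0 relative to the end
f.seek(f.tell() - 1, os.SEEK_SET)  # step back one byte (assumes a single-byte last char)
f.truncate()                       # f.write('') alone would not remove anything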

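In the code above, because you don't have 'b' (binary) access, f.seek() only accepts positions returned by f.tell() (or 0), so you first seek to the end and take f.tell() from there. Subtracting 1 puts the cursor at the start of the last character, which works when that character is a single byte, and truncating at that point removes it.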
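Wrapped as a function and tried on a throwaway file, the text-mode approach looks like this. A sketch: remove_last_char_text_mode is a hypothetical name, and it assumes the last character is a single byte (ASCII), since f.tell() - 1 is treated as a byte position:

```python
import os
import tempfile


def remove_last_char_text_mode(path):
    """Hypothetical helper: the text-mode approach above, as a function.

    Assumes the file's last character is a single byte (e.g. ASCII), because
    f.tell() - 1 is treated as a byte position.
    """
    with open(path, "r+", encoding="utf-8") as f:  # "w" would empty the file on open
        f.seek(0, os.SEEK_END)
        f.seek(f.tell() - 1, os.SEEK_SET)
        f.truncate()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Hello World")
    remove_last_char_text_mode(path)
    with open(path, encoding="utf-8") as f:
        result = f.read()

print(result)  # Hello Worl
```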

【Discussion】:

Or, more cleanly, use f.truncate() instead of the final f.write('').

【Solution 4】:

with open(urfile, 'rb+') as f:
    f.seek(0, 2)                # end of file
    size = f.tell()             # the size...
    f.truncate(size - 1)        # truncate at that size minus however many bytes you want to drop

Be sure to use binary mode on Windows: in text mode, line-ending translation (\r\n vs. \n) may give an illegal or incorrect character count.
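The same idea as a small reusable function, with one guard added: on an empty file, size - 1 is -1 and truncate() fails. A sketch; the helper name drop_last_bytes and its n parameter are hypothetical. Opening with 'rb+' also means no newline translation happens on any platform:

```python
import os


def drop_last_bytes(path, n=1):
    """Hypothetical helper: remove the last `n` bytes of a file in place.

    The new size is clamped at zero, so an empty or too-short file ends up
    empty instead of truncate() being called with a negative size.
    """
    with open(path, "rb+") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        f.truncate(max(0, size - n))


# e.g. drop_last_bytes("myfile.txt") removes the final byte
```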

【Discussion】:

【Solution 5】:

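with open('file.txt', 'r+') as f:  # 'r+', not 'w': 'w' empties the file on open
    f.seek(0, 2)              # seek to end of file; f.seek(0, os.SEEK_END) is legal
    f.seek(f.tell() - 2, 0)  # seek to the second-to-last char of file; f.seek(f.tell()-2, os.SEEK_SET) is legal
    f.truncate()              # drops the last two chars (e.g. the last char plus a trailing newline)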

Whether you need -2 or -1 depends on the file's last character, which may be a newline (\n) or anything else.
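That choice can be made automatically by checking the last byte first. A sketch, assuming single-byte characters and Unix line endings; remove_last_visible_char is a hypothetical name. Like the answer, it removes a trailing newline together with the character before it:

```python
import os


def remove_last_visible_char(path):
    """Hypothetical helper: drop the last character, stepping over a trailing newline.

    Assumes single-byte characters and Unix line endings. As in the answer,
    a trailing newline is removed together with the character before it.
    """
    with open(path, "rb+") as f:
        size = f.seek(0, os.SEEK_END)
        if size == 0:
            return  # nothing to remove
        f.seek(-1, os.SEEK_END)
        ends_with_newline = f.read(1) == b"\n"
        # -2 when the file ends with a newline (char + newline), otherwise -1
        step = 2 if ends_with_newline and size >= 2 else 1
        f.truncate(size - step)
```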

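【Discussion】:

Yes, but you haven't read the whole answer. Look at the section marked Solution, at the last code snippet. What is the first thing that code does?
Ah!! Got it, it has to be f.seek(f.tell() - 2, 0)
And, more importantly, seek to the end first.

【Solution 6】:

Here is a dirty way (erase and recreate)... I don't recommend using it, but it can be done: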

import os

x = open("file").read()
os.remove("file")
open("file", "w").write(x[:-1])  # must reopen in write mode
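If you do go the rewrite route, a somewhat safer variant (a swapped-in technique, not what this answer does) writes the shortened text to a temporary file in the same directory and then replaces the original with os.replace(), so the original is never deleted before its replacement exists. A sketch, assuming UTF-8 text; rewrite_without_last_char is a hypothetical name:

```python
import os
import tempfile


def rewrite_without_last_char(path, encoding="utf-8"):
    """Hypothetical helper: rewrite `path` without its last character.

    The new contents go to a temporary file in the same directory, which then
    replaces the original in one step, so the original is never deleted first.
    """
    with open(path, encoding=encoding, newline="") as src:
        text = src.read()
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding=encoding, newline="") as dst:
            dst.write(text[:-1])
        os.replace(tmp_path, path)  # atomic rename on POSIX (same filesystem)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

One trade-off: mkstemp() creates the file readable and writable only by its owner, so the original file's permissions are not kept unless you copy them over first (for example with shutil.copymode(path, tmp_path)).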

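【Discussion】:

Opening files manually is not recommended; the with open syntax is better.

【Solution 7】:

On a Linux system (or under Cygwin on Windows), you can use the standard truncate command, which can shrink or grow a file.

To reduce a file by 1G, the command is truncate -s -1G filename (without the minus sign, truncate -s 1G would set the size to exactly 1G). Likewise, truncate -s -1 filename removes just the last byte. In the following example, I reduce a file named update.iso by 1G.

Note that this operation took less than 5 seconds.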

chris@SR-ENG-P18 /cygdrive/c/Projects
$ stat update.iso
  File: update.iso
  Size: 30802968576     Blocks: 30081024   IO Block: 65536  regular file
Device: ee6ddbceh/4000177102d   Inode: 19421773395035112  Links: 1
Access: (0664/-rw-rw-r--)  Uid: (1052727/   chris)   Gid: (1049089/Domain Users)
Access: 2020-06-12 07:39:00.572940600 -0400
Modify: 2020-06-12 07:39:00.572940600 -0400
Change: 2020-06-12 07:39:00.572940600 -0400
 Birth: 2020-06-11 13:31:21.170568000 -0400

chris@SR-ENG-P18 /cygdrive/c/Projects
$ truncate -s -1G update.iso

chris@SR-ENG-P18 /cygdrive/c/Projects
$ stat update.iso
  File: update.iso
  Size: 29729226752     Blocks: 29032448   IO Block: 65536  regular file
Device: ee6ddbceh/4000177102d   Inode: 19421773395035112  Links: 1
Access: (0664/-rw-rw-r--)  Uid: (1052727/   chris)   Gid: (1049089/Domain Users)
Access: 2020-06-12 07:42:38.335782800 -0400
Modify: 2020-06-12 07:42:38.335782800 -0400
Change: 2020-06-12 07:42:38.335782800 -0400
 Birth: 2020-06-11 13:31:21.170568000 -0400

The stat command tells you a lot about the file, including its size; here the size shrank by exactly 1073741824 bytes (1 GiB).
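From Python, os.truncate() does the same kind of job on a path without opening the file yourself: it only changes the file's length and does not read or copy any data. A sketch; shrink_file is a hypothetical name, and clamping the new size at zero is this sketch's own choice:

```python
import os


def shrink_file(path, nbytes):
    """Hypothetical helper, roughly like `truncate -s -<nbytes> path`.

    Only shrinks, and clamps the new size at zero. Returns (old_size, new_size).
    """
    old_size = os.stat(path).st_size
    os.truncate(path, max(0, old_size - nbytes))
    return old_size, os.stat(path).st_size


# e.g. shrink_file("update.iso", 1024 ** 3) to drop 1 GiB,
# or shrink_file("myfile.txt", 1) to drop only the last byte.
```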

【Discussion】:

【Solution 8】:

This may not be optimal, but if the approaches above don't work, you can do this:

with open('myfile.txt', 'r') as file:
    data = file.read()[:-1]
with open('myfile.txt', 'w') as file:
    file.write(data)

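The code first opens the file and copies its contents, except for the last character, into the string data. Then the file is truncated to zero length (i.e. emptied), and the contents of data are written back to the file under the same name. This is basically the same as vins ms's answer, except that it doesn't use the os module and uses the safer with open syntax. It is probably not advisable for large text files. (I wrote this because none of the above methods worked well for me in Python 3.8.)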
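One caveat: in default text mode, Python converts \r\n to \n when reading and writes \n back as the platform's line separator, so on Linux this rewrite would also silently turn Windows line endings into Unix ones. A sketch of the same read-then-rewrite approach with newline translation switched off (newline=""), assuming UTF-8 content; remove_last_char_rewrite is a hypothetical name. With translation off, a trailing \r\n counts as two characters, so only its \n is removed:

```python
def remove_last_char_rewrite(path, encoding="utf-8"):
    """Hypothetical helper: the read-then-rewrite approach without newline translation."""
    # newline="" disables newline translation on both read and write.
    with open(path, "r", encoding=encoding, newline="") as f:
        data = f.read()[:-1]
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(data)
```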

【Discussion】: