Python 3.73 inserting into bytearray = "object cannot be re-sized"

【Question title】: Python 3.73 inserting into bytearray = "object cannot be re-sized"
【Posted】: 2020-09-29 05:53:35
【Question】:

I am working with a bytearray built from file data. I open the file in 'r+b' mode, so it can be updated as binary.

The Python 3.7 docs explain that each match produced by a regex finditer() provides m.start() and m.end(), which identify the start and end of the match.

In the question "Insert bytearray into bytearray Python", the answer says that an insert into a bytearray can be done with slicing. But when I try this, I get the following error: BufferError: Existing exports of data: object cannot be re-sized.

Here is an example:

    pat = re.compile(rb'0.?\d* [nN]')   # regex, binary "0[.*] n"
    with open(file, mode='r+b') as f:   # updateable, binary
        d = bytearray(f.read())         # read file data as d [as bytes]
        it = pat.finditer(d)            # find pattern in data as iterable
        for match in it:                # for each match,
            m = match.group()           # bytes of the match string to binary m
            ...
            val = b'0123456789 n'
            ...
            d[match.start():match.end()] = bytearray(val)

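In the file, the match is 0 n, and I am trying to replace it with 0123456789 n, so 9 bytes would be inserted. This code can change the file successfully as long as the size does not increase. What am I doing wrong? Here is output showing every non-size-increasing operation succeeding, followed by the failure when digits are inserted: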

    *** Changing b'0.0032 n' to b'0.0640 n'
    len(d): 10435, match.start(): 607, match.end(): 615, len(bytearray(val)): 8
    *** Found: "0.0126 n"; set to [0.252] or custom:
    *** Changing b'0.0126 n' to b'0.2520 n'
    len(d): 10435, match.start(): 758, match.end(): 766, len(bytearray(val)): 8
    *** Found: "0 n"; set to [0.1] or custom:
    *** Changing b'0 n' to b'0.1 n'
    len(d): 10435, match.start(): 806, match.end(): 809, len(bytearray(val)): 5
    Traceback (most recent call last):
      File "fixV1.py", line 190, in <module>
        main(sys.argv)
      File "fixV1.py", line 136, in main
        nchanges += search(midfile)     # perform search, returning count
      File "fixV1.py", line 71, in search
        d[match.start():match.end()] = bytearray(val)
    BufferError: Existing exports of data: object cannot be re-sized

【Comments】:

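What are the values of len(d), match.start(), match.end() and len(bytearray(val))?
What does this part of the regex, 0.?\d*, mean to you?
The regex 0.?\d* [nN] means "the data starts with a 0, has an optional . and 0 or more digits, then a " " (space) character, and an n or N." It seems to match correctly in all cases.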

Tags: arrays python-3.x regex file write

【Answer 1】:

This is a simple case, much like modifying an iterable while iterating over it:

it = pat.finditer(d) creates a buffer from the bytearray object. This in turn "locks" the size of the bytearray object.
d[match.start():match.end()] = bytearray(val) then tries to change the size of the "locked" bytearray object.

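Just as changing the size of a collection while iterating over it causes trouble (a dict, for example, raises RuntimeError), trying to change a bytearray's size while a buffer over it is still in use fails as well.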
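The behaviour described above can be reproduced in a few lines. This is a minimal sketch, assuming CPython; the sample data and the simplified pattern are made up for illustration and are not taken from the question's file.

```python
import re

data = bytearray(b"a 0 n b")          # made-up sample data
pattern = re.compile(rb"0 [nN]")      # simplified pattern for the demo

matches = pattern.finditer(data)      # the live iterator holds a buffer export on `data`
match = next(matches)

# Same-length replacement: no resize is needed, so it is allowed
data[match.start():match.end()] = b"1 n"
same_size_ok = bytes(data)

# Longer replacement: needs a resize, which the live export forbids
try:
    data[match.start():match.end()] = b"0.1 n"
    resize_error = None
except BufferError as exc:
    resize_error = str(exc)

print(same_size_ok)
print(resize_error)
```

This matches the output in the question: the same-size changes go through, and only the replacement that would grow the bytearray raises BufferError.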

Instead, you can give finditer() a copy of the object.
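One way to apply this is sketched below. It assumes a fixed replacement value, and the helper name replace_matches is hypothetical. Note one detail that goes beyond the answer: once the bytearray changes size, offsets taken from the copy no longer line up with later positions, so this sketch collects the matches first and applies them from last to first.

```python
import re

def replace_matches(data, pattern, new_value):
    """Replace every match of `pattern` in the bytearray `data`, in place."""
    # Search an immutable snapshot so no buffer export is held on `data`
    matches = list(pattern.finditer(bytes(data)))
    # Work backwards so a resize never shifts offsets that are still to be used
    for match in reversed(matches):
        data[match.start():match.end()] = new_value
    return len(matches)

d = bytearray(b"x 0 n y 0 N z")       # made-up sample data
count = replace_matches(d, re.compile(rb"0 [nN]"), b"0.1 n")
print(count, bytes(d))
```

If each match needs its own replacement value, the same loop works as long as the replacements are still applied in reverse order.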

For more on buffers and how Python works behind the scenes, see the Python docs.

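Also, keep in mind that you are not actually modifying the file. You need to write the data back to the file, or use memory mapped files. If you are looking for efficiency, I suggest the latter.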
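Writing the data back could look roughly like the sketch below. It uses a temporary file created only for the demonstration, and it swaps in pattern.sub() to build the new contents rather than slice assignment, so it shows the write-back step rather than the code from the question.

```python
import os
import re
import tempfile

pattern = re.compile(rb"0 [nN]")      # simplified pattern for the demo

# Hypothetical sample file, created only for this demonstration
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a 0 n b")
    path = tmp.name

with open(path, mode="r+b") as f:
    original = f.read()
    updated = pattern.sub(b"0.1 n", original)   # builds a new bytes object
    f.seek(0)        # rewind to the start before writing
    f.write(updated)
    f.truncate()     # drop leftover bytes if the new data is shorter

with open(path, "rb") as f:
    result = f.read()
os.remove(path)
print(result)
```

The seek(0) matters because read() leaves the file position at the end, and truncate() only matters when the new contents are shorter than the old ones.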

【Comments】:

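The idea was (at first) to open the file, read it (into [d]), modify it by adding bytes, and then write it back. But this fails at the slice insert, not at the file save. The mmap docs say "# note that new content must have the same size". I guess the question becomes: can [d] be created unlocked?
@rdtsc The length can be changed. Yes, it fails on the slice insert because the object is locked. The only way to do this is to create a copy of the data, either eagerly or lazily. Try it = pat.finditer(d.copy()) and you will find that it works.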