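【Posted】:2021-01-28 22:54:49
【Question】:
The company I work for distributes document assembly software that uses the python-docx library. The software runs a function on every generated document that opens the document and performs a simple search-and-replace for characters that were not escaped properly (i.e. "&amp;" -> "&").
For reference, the actual document assembly uses python-docx-template. However, the error occurs after the document has already been assembled, and it is triggered by the search-and-replace function, which uses only python-docx.
Recently we have had several cases where documents fail to generate in client deployments. They throw an error on this line, where the document object is instantiated: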
doc = Document(docx=Path(doc_path))
We have seen two errors:
raise BadZipFile("Bad magic number for file header")
and
raise EOFError
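The software is widely used and we have never run into this problem before. We cannot reproduce it in our test environment. The error only started appearing in the past week, but it has already shown up for several clients after an update. The software will fail to generate a particular document several times, then succeed after a few attempts.
We have only seen this happen with one document, but all documents go through the same search-and-replace function and, as I said, the error is only intermittent for the problem document.
The code for this search-and-replace function has not changed, and I can't think of any other meaningful difference in our document assembly process that would explain it.
I have had a lot of trouble finding information about what in the python-docx library could cause this. Does it indicate that the generated document is corrupted? If anyone can shed light on possible causes, that would be very helpful!
Here are the stack traces for the two errors:
Bad magic number...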
File "/home/user/app/application/document_assembly/core_da.py", line 524, in translate_ampersands
doc = Document(docx=Path(doc_path))
File "/home/user/app-venv/lib/python3.6/site-packages/docx/api.py", line 25, in Document
document_part = Package.open(docx).main_document_part
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/package.py", line 116, in open
pkg_reader = PackageReader.from_file(pkg_file)
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 36, in from_file
phys_reader, pkg_srels, content_types
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 69, in _load_serialized_parts
for partname, blob, reltype, srels in part_walker:
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 104, in _walk_phys_parts
part_srels = PackageReader._srels_for(phys_reader, partname)
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 83, in _srels_for
rels_xml = phys_reader.rels_xml_for(source_uri)
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/phys_pkg.py", line 129, in rels_xml_for
rels_xml = self.blob_for(source_uri.rels_uri)
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/phys_pkg.py", line 108, in blob_for
return self._zipf.read(pack_uri.membername)
File "/usr/lib/python3.6/zipfile.py", line 1337, in read
with self.open(name, "r", pwd) as fp:
File "/usr/lib/python3.6/zipfile.py", line 1396, in open
raise BadZipFile("Bad magic number for file header")
zipfile.BadZipFile: Bad magic number for file header
EOFError
File "/home/user/app/application/document_assembly/core_da.py", line 524, in translate_ampersands
doc = Document(docx=Path(doc_path))
File "/home/user/app-venv/lib/python3.6/site-packages/docx/api.py", line 25, in Document
document_part = Package.open(docx).main_document_part
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/package.py", line 116, in open
pkg_reader = PackageReader.from_file(pkg_file)
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 36, in from_file
phys_reader, pkg_srels, content_types
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 69, in _load_serialized_parts
for partname, blob, reltype, srels in part_walker:
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 110, in _walk_phys_parts
for partname, blob, reltype, srels in next_walker:
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/pkgreader.py", line 105, in _walk_phys_parts
blob = phys_reader.blob_for(partname)
File "/home/user/app-venv/lib/python3.6/site-packages/docx/opc/phys_pkg.py", line 108, in blob_for
return self._zipf.read(pack_uri.membername)
File "/usr/lib/python3.6/zipfile.py", line 1338, in read
return fp.read()
File "/usr/lib/python3.6/zipfile.py", line 858, in read
buf += self._read1(self.MAX_N)
File "/usr/lib/python3.6/zipfile.py", line 940, in _read1
data += self._read2(n - len(data))
File "/usr/lib/python3.6/zipfile.py", line 975, in _read2
raise EOFError
EOFError
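Both traces end inside `ZipFile.read`, so the failing read can in principle be reproduced without python-docx. The following is a minimal sketch, not part of the original code: `find_unreadable_members` is a hypothetical helper name, and it only assumes that the .docx is an ordinary zip archive on disk. It reads every member and reports which ones fail and with what exception.

```python
import zipfile
import zlib


def find_unreadable_members(docx_path):
    """Return a list of (member_name, exception) for parts that cannot be read.

    Hypothetical diagnostic helper; an empty list means every member was
    read (and CRC-checked) successfully.
    """
    try:
        archive = zipfile.ZipFile(str(docx_path))
    except (zipfile.BadZipFile, OSError) as exc:
        # The archive's central directory could not be read at all.
        return [("<archive>", exc)]

    failures = []
    with archive:
        for info in archive.infolist():
            try:
                # Same call that python-docx ends up making in blob_for().
                archive.read(info.filename)
            except (zipfile.BadZipFile, EOFError, zlib.error, OSError) as exc:
                failures.append((info.filename, exc))
    return failures
```

Running this against a copy of the document taken at the moment of failure would show whether the file on disk is actually damaged or incomplete at that point, independently of python-docx.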
【Discussion】:
-
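The fact that this happens randomly suggests that your search-and-replace code may be mistaking some binary data (a timestamp, for example) for a & character worth replacing when it should not be.
-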
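This happens before the search-and-replace logic runs. The error is thrown when the code tries to create the document object at the very start of the function.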
-
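Another source of randomness is on-disk filesystem corruption, not in I/O but in data blocks that have been allocated to two different files. You could try forcing an fsck after unmounting; by default fsck usually only checks the clean flag and does not actually verify anything. Another idea is that there is a locale difference and somehow a Unicode BOM (byte order mark) is being added to the start of the file. If you put import pdb;pdb.set_trace() in zipfile.py (or a local copy) just before the raise, you will drop into the debugger and can investigate which file was opened and what was read.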
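The BOM idea above can be checked without a debugger by looking at the first bytes of the file. This is a rough sketch under stated assumptions: `describe_file_start` is a hypothetical name, and the only facts relied on are that a non-empty zip archive normally begins with the local file header signature `PK\x03\x04` and that a UTF-8 BOM is the byte sequence `EF BB BF`.

```python
ZIP_LOCAL_HEADER = b"PK\x03\x04"  # signature at the start of a non-empty zip
UTF8_BOM = b"\xef\xbb\xbf"        # UTF-8 byte order mark


def describe_file_start(path, length=8):
    """Classify the first bytes of a file (hypothetical diagnostic helper).

    Returns (label, head) where label is one of
    "zip", "utf8-bom", "empty", or "unknown".
    """
    with open(str(path), "rb") as handle:
        head = handle.read(length)

    if not head:
        label = "empty"
    elif head.startswith(ZIP_LOCAL_HEADER):
        label = "zip"
    elif head.startswith(UTF8_BOM):
        label = "utf8-bom"
    else:
        label = "unknown"
    return label, head
```

Logging the returned pair just before the `Document(...)` call would record what the file looked like on the attempts that fail. Note that a "zip" label only says the file starts correctly; a truncated or partly overwritten archive would still pass this check.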
Tags: python-3.x zipfile python-docx