不幸的是,OP 没有提供示例文档来重现该问题。因此,我不得不猜测。
我假设问题是基于没有立即链接到页面对象但从其父对象继承的对象。
在这种情况下,使用PDDocument.addPage 是错误的选择,因为此方法仅将给定的页面对象添加到目标文档页面树中,而不考虑继承的内容。
应该使用PDDocument.importPage,它被记录为:
/**
* This will import and copy the contents from another location. Currently the content stream is stored in a scratch
* file. The scratch file is associated with the document. If you are adding a page to this document from another
* document and want to copy the contents to this document's scratch file then use this method otherwise just use
* the {@link #addPage} method.
*
* Unlike {@link #addPage}, this method does a deep copy. If your page has annotations, and if
* these link to pages not in the target document, then the target document might become huge.
* What you need to do is to delete page references of such annotations. See
* <a href="http://stackoverflow.com/a/35477351/535646">here</a> for how to do this.
*
* @param page The page to import.
* @return The page that was imported.
*
* @throws IOException If there is an error copying the page.
*/
public PDPage importPage(PDPage page) throws IOException
实际上,即使这种方法也可能不够用,因为它没有考虑所有继承的属性,但是查看Splitter 实用程序类,人们会觉得必须做什么:
PDPage imported = getDestinationDocument().importPage(page);
imported.setCropBox(page.getCropBox());
imported.setMediaBox(page.getMediaBox());
// only the resources of the page will be copied
imported.setResources(page.getResources());
imported.setRotation(page.getRotation());
// remove page links to avoid copying not needed resources
processAnnotations(imported);
使用辅助方法
private void processAnnotations(PDPage imported) throws IOException
{
List<PDAnnotation> annotations = imported.getAnnotations();
for (PDAnnotation annotation : annotations)
{
if (annotation instanceof PDAnnotationLink)
{
PDAnnotationLink link = (PDAnnotationLink)annotation;
PDDestination destination = link.getDestination();
if (destination == null && link.getAction() != null)
{
PDAction action = link.getAction();
if (action instanceof PDActionGoTo)
{
destination = ((PDActionGoTo)action).getDestination();
}
}
if (destination instanceof PDPageDestination)
{
// TODO preserve links to pages within the splitted result
((PDPageDestination) destination).setPage(null);
}
}
// TODO preserve links to pages within the splitted result
annotation.setPage(null);
}
}
当您尝试将单个 PDF 拆分为多个,例如将 10 页文档拆分为 10 个单页文档时,您可能希望按原样使用此 Splitter 实用程序类。
测试
为了测试这些方法,我使用了 PDF Clown 示例输出 AnnotationSample.Standard.pdf 的输出,因为该库严重依赖于页面树值的继承。因此,我使用PDDocument.addPage、PDDocument.importPage 或Splitter 将其唯一页面的内容复制到一个新文档中,如下所示:
PDDocument source = PDDocument.load(resource);
PDDocument output = new PDDocument();
PDPage page = source.getPages().get(0);
output.addPage(page);
output.save(new File(RESULT_FOLDER, "PageAddedFromAnnotationSample.Standard.pdf"));
output.close();
(CopyPages.java 测试testWithAddPage)
PDDocument source = PDDocument.load(resource);
PDDocument output = new PDDocument();
PDPage page = source.getPages().get(0);
output.importPage(page);
output.save(new File(RESULT_FOLDER, "PageImportedFromAnnotationSample.Standard.pdf"));
output.close();
(CopyPages.java 测试testWithImportPage)
PDDocument source = PDDocument.load(resource);
Splitter splitter = new Splitter();
List<PDDocument> results = splitter.split(source);
Assert.assertEquals("Expected exactly one result document from splitting a single page document.", 1, results.size());
PDDocument output = results.get(0);
output.save(new File(RESULT_FOLDER, "PageSplitFromAnnotationSample.Standard.pdf"));
output.close();
(CopyPages.java 测试testWithSplitter)
只有最终测试忠实地复制了页面。