How to extract the byte[] array from a PdfDocument

【Question title】: How to extract the byte[] array from a PdfDocument
【Posted】: 2019-08-15 19:40:25
【Question description】:

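After a lot of research, I still can't find a way to extract the byte[] from a PdfDocument object. How can I do this?

I tried using a FileInputStream, but I don't actually have a "physical path" for the PdfDocument, because I am creating it programmatically. Also, I'm not very familiar with byte[].

Can someone help me with this?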

    PdfDocument pdfDocumentWithoutSplit = getPdfUtils().generatePdfDocumentByMedia(shippingLabel);

    for (int i = 1; i < pdfDocumentWithoutSplit.getNumberOfPages() + 1; i++) {
        final ByteArrayOutputStream pdfByteArray = new ByteArrayOutputStream();
        final PdfDocument pdfDocument = new PdfDocument(new PdfWriter(pdfByteArray));

        pdfDocument.movePage(pdfDocumentWithoutSplit.getPage(i), i);
        pdfByteArray.close();
        // now here I need to get the bytes of each pdfDocument somehow
    }

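Cheers

【Comments】:

Can you add code? And be more explicit about "I am creating one programmatically": what is your goal? If you are creating the PDF, that means you hold the text in some variable, most likely a String, so you can extract the byte array from the String. If you want to extract the byte array from the PdfDocument format, you can create a temp PDF
I have actually implemented splitting the pages of a physical PDF into PdfDocuments (1 page, 1 PdfDocument), and now I need to get the bytes of these PdfDocuments, none of which has a physical path. I have added a snippet to the code in my question.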

Tags: java arrays pdf inputstream itext7

【Solution 1】:

    final ByteArrayOutputStream baos = new ByteArrayOutputStream();
    final PdfDocument pdfDocument = new PdfDocument(new PdfWriter(baos));
    pdfDocument.movePage(pdfDocumentWithoutSplit.getPage(i), i);
    pdfDocument.close();
    // should close the PdfWriter, and hence the ByteArrayOutputStream
    baos.close();
    byte[] bytes = baos.toByteArray();

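Closing things flushes any data still buffered in memory and fills the ByteArrayOutputStream, so call toByteArray() only after pdfDocument.close().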
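The flush-on-close behaviour can be seen without iText at all. The sketch below uses only the JDK and treats a `BufferedOutputStream` as a stand-in for whatever buffering the PDF writer does internally; that stand-in is an assumption for illustration, not a description of iText's implementation. The class name `FlushOnCloseDemo` and the method `sizesBeforeAndAfterClose` are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FlushOnCloseDemo {

    /**
     * Writes the payload through a buffering wrapper (a stand-in for a PDF writer)
     * and returns how many bytes the ByteArrayOutputStream holds
     * before and after the wrapper is closed.
     */
    static int[] sizesBeforeAndAfterClose(byte[] payload) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            // 8192-byte buffer: small payloads stay in the wrapper until flush/close.
            OutputStream buffered = new BufferedOutputStream(baos, 8192);

            buffered.write(payload);
            int before = baos.toByteArray().length; // data is still in the wrapper's buffer

            buffered.close();                       // flushes the buffer into baos
            int after = baos.toByteArray().length;  // toByteArray() remains valid after close

            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "sample payload".getBytes(StandardCharsets.US_ASCII);
        int[] sizes = sizesBeforeAndAfterClose(payload);
        System.out.println("before close: " + sizes[0] + ", after close: " + sizes[1]);
    }
}
```

Reading the bytes before the outer object is closed yields an incomplete array, which is why the order of `close()` and `toByteArray()` matters in the answer above. Closing a `ByteArrayOutputStream` itself has no effect, so its contents can still be read afterwards.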

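【Comments】:

baos.toByteArray() returns only 15 bytes for a large document (impossible), and this is also thrown: [PdfReader] Error occurred while reading cross reference table. Cross reference table will be rebuilt. com.itextpdf.io.IOException: PDF startxref not found. Any hints?
@Saliffanag PdfDocument.movePage is documented as moving a page to a new position within the same document! You are trying to use it to move a page from pdfDocumentWithoutSplit to pdfDocument. That obviously won't work; in particular, it is likely to throw some exception. Are you by any chance catching and ignoring exceptions?
@Joop Eggen Thank you very much, it really helped me.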

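【Solution 2】:

Everything in a PDF should be handled as a string. First, you need to search for the physical path (you can use a regular expression or similar string processing to search for it, depending on how you generate the path and the language used). Then use a PDF reader (since it is not a plain-text document) to search the PDF for the string that looks like your byte array. Finally, you need to convert the string to an array by extracting the data in it and using a split or array-generating method. Good luck.

【Comments】: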