PDF Merging by ItextSharp: Answers

【Question title】: PDF Merging by ItextSharp
【Posted】: 2013-03-09 10:54:25
【Question】:

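How can I merge multiple PDF pages into one using iTextSharp, in a way that also supports pages containing form elements such as textboxes, checkboxes, etc.?

I have googled a lot and tried many approaches, but none of them worked well.

【Comments】:

Do you mean merging multiple PDF pages into one page, or into one PDF? Are the source pages in the same PDF?
I want to do it in one PDF.
So the pages of one or more source documents, including their form elements, should be copied as separate pages into one target PDF? In that case, the answer by @Jonathan that he mentions in his answer here looks like what you need.
Yes, I also tried Jonathan's code, which uses the PdfSmartCopy class to copy everything, but it does not work in some cases.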
stackoverflow.com/questions/6029142/…

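Tags: c# itextsharp

【Answer 1】:

See my answer here: Merging Memory Streams. It gives an example of how to merge PDFs with iTextSharp.

To update the form field names, add this code, which uses the stamper to rename the form fields.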

/// <summary>
/// Merges PDF files from a byte list, renaming form fields so they stay unique per file
/// </summary>
/// <param name="files">list of files to merge</param>
/// <returns>memory stream containing combined pdf</returns>
public MemoryStream MergePdfForms(List<byte[]> files)
{
    if (files.Count > 1)
    {
        MemoryStream msOutput = new MemoryStream();
        Document doc = new Document();
        PdfSmartCopy pCopy = new PdfSmartCopy(doc, msOutput);
        pCopy.PdfVersion = PdfWriter.VERSION_1_7;
        // Keep msOutput open after the document is closed so the caller can read it.
        pCopy.CloseStream = false;

        doc.Open();

        for (int k = 0; k < files.Count; k++)
        {
            // Rename the fields of file k once, stamping the result into a temporary stream.
            MemoryStream msTemp = new MemoryStream();
            PdfReader pdfTemplate = new PdfReader(files[k]);
            PdfStamper stamper = new PdfStamper(pdfTemplate, msTemp);

            // Copy the keys first: renaming while enumerating would modify the collection.
            string[] names = new string[stamper.AcroFields.Fields.Keys.Count];
            stamper.AcroFields.Fields.Keys.CopyTo(names, 0);
            foreach (string name in names)
            {
                stamper.AcroFields.RenameField(name, name + "_file" + k.ToString());
            }

            stamper.Close();

            // Copy every page of the renamed file into the output.
            PdfReader pdfFile = new PdfReader(msTemp.ToArray());
            for (int i = 1; i <= pdfFile.NumberOfPages; i++)
            {
                pCopy.AddPage(pCopy.GetImportedPage(pdfFile, i));
            }
            pCopy.FreeReader(pdfFile);
            pdfFile.Close();
        }

        // Closing the document also closes the writer.
        doc.Close();

        msOutput.Position = 0;
        return msOutput;
    }
    else if (files.Count == 1)
    {
        return new MemoryStream(files[0]);
    }

    return null;
}
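
A minimal usage sketch for a method like the one above, assuming the inputs are files on disk. The class name MergeUsage, the method MergeFilesToDisk, and the merge delegate parameter are hypothetical names introduced only for this illustration; pass MergePdfForms (or any method with the same signature) as the delegate.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class MergeUsage
{
    // Reads each input file into memory, merges them with the supplied
    // delegate, and writes the combined PDF to outputPath.
    // 'merge' is expected to be a method shaped like MergePdfForms.
    public static void MergeFilesToDisk(
        IEnumerable<string> inputPaths,
        string outputPath,
        Func<List<byte[]>, MemoryStream> merge)
    {
        List<byte[]> files = inputPaths.Select(p => File.ReadAllBytes(p)).ToList();

        // MergePdfForms returns null when the list is empty.
        MemoryStream merged = merge(files);
        if (merged == null)
        {
            throw new ArgumentException("No input files were supplied.", "inputPaths");
        }

        // ToArray returns the whole buffer regardless of the stream position,
        // and it still works if the stream has already been closed.
        File.WriteAllBytes(outputPath, merged.ToArray());
    }
}
```

Taking the merge step as a delegate keeps the sketch independent of which class MergePdfForms actually lives in.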

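【Comments】:

If you use a single PDF page with form fields (I only tried with one page; it may work with many), the final processed PDF comes out fine. But if you merge the same page more than 3 or 4 times, the first page contains all the form field values while the rest are blank :(
@mns OK, what is happening is that the form fields have the same names, and when you merge several copies of the same page it does not like them all being identical. What you can do is update their names, for example to the original name + _p1 for page 1, and so on. See the updated answer.
@mns I put it together from other code snippets I have and have not tested it; I hope it works for you.
@Jonathan, you are absolutely right. Thanks, Jonathan, for giving the best answer in the whole community :)
If the field problems persist for some PDFs, you may want to try the PdfCopyFields class instead of PdfSmartCopy or PdfCopy...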
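
The PdfCopyFields suggestion in the last comment could look roughly like the sketch below. It assumes iTextSharp 5.x, where PdfCopyFields offers a Stream constructor plus AddDocument and Close (later 5.x releases mark the class obsolete). The class name PdfCopyFieldsSketch and the method MergeWithCopyFields are hypothetical names for this illustration, and the sketch has not been run against the PDFs discussed here.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

public static class PdfCopyFieldsSketch
{
    // Merges whole documents, including their AcroForm fields, with PdfCopyFields.
    // Assumes 'files' contains at least one valid PDF.
    public static byte[] MergeWithCopyFields(List<byte[]> files)
    {
        MemoryStream output = new MemoryStream();
        PdfCopyFields copier = new PdfCopyFields(output);

        foreach (byte[] file in files)
        {
            // The readers are left open on purpose: the merged document
            // is only written when Close is called below.
            copier.AddDocument(new PdfReader(file));
        }

        copier.Close();

        // ToArray still works after the underlying stream has been closed.
        return output.ToArray();
    }
}
```

As far as I understand, fields that share a name across the inputs end up as one field with a shared value in the result, so the renaming step from the answer may still be needed when each copy should hold its own values.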

【Answer 2】:

Here is my simplified version of Jonathan's merge code, with namespaces added and the stamping removed.

public System.IO.MemoryStream MergePdfForms(System.Collections.Generic.List<byte[]> files)
{
    if (files.Count > 1) {
        using (System.IO.MemoryStream msOutput = new System.IO.MemoryStream()) {
            using (iTextSharp.text.Document doc = new iTextSharp.text.Document()) {
                using (iTextSharp.text.pdf.PdfSmartCopy pCopy = new iTextSharp.text.pdf.PdfSmartCopy(doc, msOutput) { PdfVersion = iTextSharp.text.pdf.PdfWriter.VERSION_1_7 }) {
                    doc.Open();
                    foreach (byte[] oFile in files) {
                        using (iTextSharp.text.pdf.PdfReader pdfFile = new iTextSharp.text.pdf.PdfReader(oFile)) {
                            for (int i = 1; i <= pdfFile.NumberOfPages; i++) {
                                pCopy.AddPage(pCopy.GetImportedPage(pdfFile, i));
                            }
                            // Release the reader only after all of its pages have been copied.
                            pCopy.FreeReader(pdfFile);
                        }
                    }
                }
            }

            // msOutput is closed at this point and is disposed on return;
            // ToArray still works, so hand back a fresh, readable stream.
            return new System.IO.MemoryStream(msOutput.ToArray());
        }
    } else if (files.Count == 1) {
        return new System.IO.MemoryStream(files[0]);
    }

    return null;
}

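【Comments】:

【Answer 3】:

To merge PDFs, see "Merging two pdf pages into one using itextsharp".

【Comments】:

Those asp.net forum posts unfortunately use PdfWriter and GetImportedPage without any post-processing of annotations; as a result the form elements are lost, even though @mns explicitly said that forms are important.

【Answer 4】:

Below is my PDF merging code. Thanks to Jonathan for the suggestion about renaming fields, which solved the problem I had when merging PDF pages with form fields.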

private static void CombineAndSavePdf(string savePath, List<string> lstPdfFiles)
{
    using (Stream outputPdfStream = new FileStream(savePath, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        Document document = new Document();
        PdfSmartCopy copy = new PdfSmartCopy(document, outputPdfStream);
        document.Open();
        int fileIndex = 0;
        foreach (string file in lstPdfFiles)
        {
            // Rename the fields once per file, stamping into a temporary stream
            // rather than into the output stream that PdfSmartCopy writes to.
            MemoryStream stamped = new MemoryStream();
            PdfReader reader = new PdfReader(file);
            PdfStamper stamper = new PdfStamper(reader, stamped);
            string[] fieldNames = new string[stamper.AcroFields.Fields.Keys.Count];
            stamper.AcroFields.Fields.Keys.CopyTo(fieldNames, 0);
            foreach (string name in fieldNames)
            {
                // Suffix with the file index so fields from different files do not collide.
                stamper.AcroFields.RenameField(name, name + "_file" + fileIndex.ToString());
            }
            stamper.Close();

            // Copy every page of the renamed document into the output.
            reader = new PdfReader(stamped.ToArray());
            for (int pageNum = 1; pageNum <= reader.NumberOfPages; pageNum++)
            {
                copy.AddPage(copy.GetImportedPage(reader, pageNum));
            }
            copy.FreeReader(reader);
            reader.Close();
            fileIndex++;
        }
        document.Close();
    }
}

【Comments】:

This code has a bug if the PDFs being merged have more than one page.