【Question title】: Merge PDF files with TOC element
【Posted】: 2021-09-22 07:09:54
【Question description】:

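I'm merging PDF files with GemBox.Pdf, as shown here. This works well, and I can easily add outlines.

I did something similar before, merging Word files with GemBox.Document, as shown here.

My problem now is that GemBox.Pdf has no TOC element. I want a table of contents to be generated automatically when I merge multiple PDF files into one.

Am I missing something, or does PDF really have no such element?
Do I need to recreate it myself, and if so, how?
I can add bookmarks, but I don't know how to add links that point to them.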

【Question discussion】:

    Tags: c# pdf tableofcontents gembox-pdf


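    【Solution 1】:

    PDF files have no such element, so we need to create this content ourselves.

    One way to do that is to create text elements, outlines, and link annotations, position them appropriately, and set the link destinations to the outlines.

    However, that can be quite a bit of work, so it may be easier to create the desired TOC element with GemBox.Document, save it as a PDF file, and then import it into the resulting PDF.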

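    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using GemBox.Document;
    using GemBox.Pdf;
    using GemBox.Pdf.Objects;
    
    // Both components need a license key before first use; "FREE-LIMITED-KEY" selects the free mode.
    GemBox.Document.ComponentInfo.SetLicense("FREE-LIMITED-KEY");
    GemBox.Pdf.ComponentInfo.SetLicense("FREE-LIMITED-KEY");
    
    // Source data for creating TOC entries with specified text and associated PDF files.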
    var pdfEntries = new[]
    {
        new { Title = "First Document Title", Pdf = PdfDocument.Load("input1.pdf") },
        new { Title = "Second Document Title", Pdf = PdfDocument.Load("input2.pdf") },
        new { Title = "Third Document Title", Pdf = PdfDocument.Load("input3.pdf") },
    };
    
    /***************************************************************/
    /* Create new document with TOC element using GemBox.Document. */
    /***************************************************************/
    
    // Create new document.
    var tocDocument = new DocumentModel();
    var section = new Section(tocDocument);
    tocDocument.Sections.Add(section);
    
    // Create and add TOC element.
    var toc = new TableOfEntries(tocDocument, FieldType.TOC);
    section.Blocks.Add(toc);
    section.Blocks.Add(new Paragraph(tocDocument, new SpecialCharacter(tocDocument, SpecialCharacterType.PageBreak)));
    
    // Create heading style.
    // By default, when updating TOC element a TOC entry is created for each paragraph that has heading style.
    var heading1Style = (ParagraphStyle)tocDocument.Styles.GetOrAdd(StyleTemplateType.Heading1);
    
    // Add heading and empty (placeholder) pages.
    // The number of placeholder pages added for each entry matches the page count of the actual PDF file,
    // so that the TOC entries get the correct page numbers.
    int totalPageCount = 0;
    foreach (var pdfEntry in pdfEntries)
    {
        section.Blocks.Add(new Paragraph(tocDocument, pdfEntry.Title) { ParagraphFormat = { Style = heading1Style } });
        section.Blocks.Add(new Paragraph(tocDocument, new SpecialCharacter(tocDocument, SpecialCharacterType.PageBreak)));
    
        int currentPageCount = pdfEntry.Pdf.Pages.Count;
        totalPageCount += currentPageCount;
    
        while (--currentPageCount > 0)
            section.Blocks.Add(new Paragraph(tocDocument, new SpecialCharacter(tocDocument, SpecialCharacterType.PageBreak)));
    }
    
    // Remove the page break after the last placeholder page, which would otherwise add an extra empty page.
    section.Blocks.RemoveAt(section.Blocks.Count - 1);
    
    // Update TOC element and save the document as PDF stream.
    toc.Update();
    var pdfStream = new MemoryStream();
    tocDocument.Save(pdfStream, new GemBox.Document.PdfSaveOptions());
    
    /***************************************************************/
    /* Merge PDF files into PDF with TOC element using GemBox.Pdf. */
    /***************************************************************/
    
    // Load a PDF stream using GemBox.Pdf.
    var pdfDocument = PdfDocument.Load(pdfStream);
    var rootDictionary = (PdfDictionary)((PdfIndirectObject)pdfDocument.GetDictionary()[PdfName.Create("Root")]).Value;
    var pagesDictionary = (PdfDictionary)((PdfIndirectObject)rootDictionary[PdfName.Create("Pages")]).Value;
    var kidsArray = (PdfArray)pagesDictionary[PdfName.Create("Kids")];
    var pageIds = kidsArray.Cast<PdfIndirectObject>().Select(obj => obj.Id).ToArray();
    
    // Remove empty (placeholder) pages.
    while (totalPageCount-- > 0)
        pdfDocument.Pages.RemoveAt(pdfDocument.Pages.Count - 1);
    
    // Add pages from PDF files.
    foreach (var pdfEntry in pdfEntries)
        foreach (var page in pdfEntry.Pdf.Pages)
            pdfDocument.Pages.AddClone(page);
    
    /*****************************************************************************/
    /* Update TOC links from placeholder pages to actual pages using GemBox.Pdf. */
    /*****************************************************************************/
    
    // Map the ID of each original page's indirect object (TOC or empty placeholder page)
    // to the indirect object that now occupies the same position in the 'Kids' array.
    var pageCloneMap = new Dictionary<PdfIndirectObjectIdentifier, PdfIndirectObject>();
    for (int i = 0; i < kidsArray.Count; ++i)
        pageCloneMap.Add(pageIds[i], (PdfIndirectObject)kidsArray[i]);
    
    foreach (var entry in pageCloneMap)
    {
        // If page was updated, it means that we passed TOC pages, so break from the loop.
        if (entry.Key != entry.Value.Id)
            break;
    
        // For each TOC page, get its 'Annots' entry.
        // For each link annotation from the 'Annots' get the 'Dest' entry.
        // Update the first item in the 'Dest' array so that it no longer points to a removed page.
        if (((PdfDictionary)entry.Value.Value).TryGetValue(PdfName.Create("Annots"), out PdfBasicObject annotsObj))
            foreach (PdfIndirectObject annotObj in (PdfArray)annotsObj)
                if (((PdfDictionary)annotObj.Value).TryGetValue(PdfName.Create("Dest"), out PdfBasicObject destObj))
                {
                    var destArray = (PdfArray)destObj;
                    destArray[0] = pageCloneMap[((PdfIndirectObject)destArray[0]).Id];
                }
    }
    
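    // Save the resulting PDF file, then release the output and source documents.
    pdfDocument.Save("Result.pdf");
    pdfDocument.Close();
    foreach (var pdfEntry in pdfEntries)
        pdfEntry.Pdf.Close();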
    

    This way, you can easily customize the TOC element using TOC switches and styles. For more information, see the Table Of Content example in GemBox.Document.
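
The placeholder arithmetic used in the answer can be checked without either library. The following is a minimal sketch that is not part of the original answer: `TocPageMath` and `StartPages` are hypothetical names, and it assumes the TOC occupies a known number of pages at the front of the merged file. It computes the 1-based page number that each TOC entry should display.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of GemBox): predicts the page numbers
// that the generated TOC entries should display.
public static class TocPageMath
{
    // tocPageCount: pages occupied by the TOC itself (assumed to be known).
    // pageCounts: page count of each source PDF, in merge order.
    // Returns the 1-based page on which each source PDF starts.
    public static int[] StartPages(int tocPageCount, IReadOnlyList<int> pageCounts)
    {
        if (tocPageCount < 0)
            throw new ArgumentOutOfRangeException(nameof(tocPageCount));

        var starts = new int[pageCounts.Count];
        int nextPage = tocPageCount + 1;
        for (int i = 0; i < pageCounts.Count; i++)
        {
            starts[i] = nextPage;        // the heading lands on the first placeholder page
            nextPage += pageCounts[i];   // skip over this document's pages
        }
        return starts;
    }
}
```

For example, with a one-page TOC and source files of 3, 1, and 2 pages, `TocPageMath.StartPages(1, new[] { 3, 1, 2 })` yields 2, 5, and 6. Comparing these values with the numbers shown in the generated TOC is a quick way to confirm that the placeholder pages were counted correctly.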
