Python PDF: how to add a bookmark URL instead of a page number

【Question title】: Python PDF how to add bookmark url instead of page number
【Posted】: 2021-08-04 18:03:40
【Question】:

I'm using Python 3.6 and PyPDF2 to create bookmarks in a PDF. Instead of bookmarking a page in the PDF, I want to add a URL (e.g. https://stackoverflow.com) as the bookmark.

Something like this?

output.addBookmark('TEST', 'https://stackoverflow.com', parent=None)

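I don't think PyPDF2 supports something like this, or does it? Is there another library that supports this? For reference, here is my current code, which bookmarks a page: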

from PyPDF2 import PdfFileReader, PdfFileWriter

output = PdfFileWriter()
input = PdfFileReader(open('test.pdf', 'rb'))
output.addPage(input.getPage(0))
output.addBookmark('TEST', 0, parent=None)  # add bookmark
outputStream = open('output.pdf', 'wb')
output.write(outputStream)
outputStream.close()

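【Comments】:

@KJ The code I posted works fine, but I don't want to link to a section inside the PDF. I want it to link to a web page.
@KJ I assumed this would be easy to do, since doing it by hand is easy but tedious. If you want to see the result I'm trying to get, the bookmarks in my sample PDF link to web pages. Sample PDF: drive.google.com/file/d/1ld-CwHfA2VpWeqCGl8Id0K8opsNTi51V/…

Tags: python pdf pypdf2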

【Answer 1】:

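I may be a bit late, but hear me out. I had to solve the same problem, and since there was zero information about it anywhere, I looked into it and managed to find an answer. This is not my most efficient or best code, but it is definitely some of the code I'm proudest of, because nobody had done this before.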

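The problem is that there is no single Python PDF library that solves every PDF-related problem. Each library handles things differently, and some can do things that others cannot. That is why I had to use two libraries for the two functions below: PyPDF2 adds the bookmarks to the PDF, and pdfrw changes those bookmarks so that their action opens a URL.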

In short, we create one new PDF with the bookmarks added, and then another new PDF in which a bookmark's action has been changed to point to a URL.
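To make the second step more concrete, here is a minimal sketch of what "changing the bookmark action" means. It assumes the bookmark carries a 'go to page' action (type /GoTo with a /D destination) that is turned into an 'open web link' action (type /URI with a /URI address). Plain Python dicts stand in for the PDF objects here; `to_uri_action` and the sample destination are hypothetical and are not part of PyPDF2 or pdfrw.

```python
def to_uri_action(action: dict, url: str) -> dict:
    """Return a copy of a bookmark action rewritten as a URI action."""
    new_action = dict(action)
    new_action.pop("/D", None)   # a URI action has no page destination
    new_action["/S"] = "/URI"    # action type: open a web link
    new_action["/URI"] = url     # the address the link opens
    return new_action


# Hypothetical 'go to page' action, modelled as a plain dict.
goto_action = {"/S": "/GoTo", "/D": ["page 13", "/Fit"]}
uri_action = to_uri_action(goto_action, "https://stackoverflow.com")
print(uri_action)
```

The AddWebLinkToBookmark function in the code below performs the same three changes, but directly on the pdfrw objects of the real file.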

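One thing worth mentioning: for some reason (at least for me), when you add several bookmarks with PyPDF2 they all end up chained together, each one reachable only from the previous bookmark through its /Next key. That is why there is a while loop: we use it to collect all the bookmarks into a list, from which we can then pick the one we want.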
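The chain walk described above can be sketched in isolation. This is only an illustration under the assumption that each bookmark points to the following one through a '/Next' key; plain dicts stand in for the pdfrw objects, and `collect_chain` is a hypothetical helper name, not a library function.

```python
def collect_chain(first: dict) -> list:
    """Follow '/Next' links from `first` and return every bookmark visited."""
    chain = [first]
    current = first
    # Stop when there is no next item or the next item is not a bookmark.
    while "/Next" in current and "/Title" in current["/Next"]:
        current = current["/Next"]
        chain.append(current)
    return chain


# Hypothetical stand-ins for three bookmarks, built back to front.
third = {"/Title": "Revenge Of The Sith"}
second = {"/Title": "Attack of the Clones", "/Next": third}
first = {"/Title": "The Phantom Menace", "/Next": second}

titles = [item["/Title"] for item in collect_chain(first)]
print(titles)
```

Collecting the items into a list first keeps the later lookup by title a simple filter over that list.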

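If your PDF already has bookmarks and you did not add them with PyPDF2, it is enough to iterate over the metaObjects dictionary and pick the values that contain a /Title key, which makes the code considerably shorter. I have included that variant as commented-out code.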
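As a rough sketch of that shorter variant: the lookup compares each object's /Title with the wanted title wrapped in parentheses and with spaces written as \040, which is the form the answer's code matches against. Plain dicts stand in for the pdfrw objects, and `encode_title` and `find_by_title` are hypothetical helper names introduced only for this example.

```python
def encode_title(title: str) -> str:
    """Wrap the title in parentheses and write each space as \\040."""
    return f"({title})".replace(" ", "\\040")


def find_by_title(objects: dict, title: str):
    """Return the first object whose '/Title' matches, or None."""
    wanted = encode_title(title)
    for obj in objects.values():
        if obj.get("/Title") == wanted:
            return obj
    return None


# Hypothetical object table: keys are object ids, values are PDF objects.
objects = {
    1: {"/Type": "/Page"},
    2: {"/Title": encode_title("Attack of the Clones")},
    3: {"/Title": encode_title("Revenge Of The Sith")},
}

match = find_by_title(objects, "Revenge Of The Sith")
missing = find_by_title(objects, "Rogue One")
```

Returning None for a missing title lets the caller print a message and stop, as the commented-out code in the answer does.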

Here is an example of how to use the code:

inputPdf = r"C:\......\first.pdf"
bookmarkedPdf = r"C:\......\second.pdf"
pdfWithWeblink = r"C:\......\final.pdf"

bookmarks = [
    {"Title": "The Phantom Menace", "Page": 5}, 
    {"Title": "Attack of the Clones", "Page": 10},
    {"Title": "Revenge Of The Sith", "Page": 13},
    {"Title": "A New Hope", "Page": 18},
    {"Title": "The Empire Strikes Back", "Page": 26},
    {"Title": "Return of the Jedi", "Page": 32}
]

AddBookmarks(inputPdf, bookmarkedPdf, bookmarks)
AddWebLinkToBookmark(bookmarkedPdf, pdfWithWeblink, "Revenge Of The Sith", "https://stackoverflow.com")

Code:

from PyPDF2 import PdfFileWriter, PdfFileReader
import pdfrw

def AddBookmarks(inputPdfPath: str, outputPdfPath: str, headers: list) -> str:
    """ Adds bookmarks to a PDF and returns the path of the written file. """

    output = PdfFileWriter()

    # Keep the source file open until the output is written: PyPDF2 reads page data lazily.
    with open(inputPdfPath, 'rb') as inputStream:
        reader = PdfFileReader(inputStream)

        for i in range(reader.getNumPages()):
            output.addPage(reader.getPage(i))
            for header in headers:
                # "Page" is 1-based, while PyPDF2 page indices are 0-based.
                if header["Page"] - 1 == i:
                    output.addBookmark(header["Title"], i, parent=None)

        # Show the bookmarks panel when the PDF is opened.
        output.setPageMode("/UseOutlines")

        with open(outputPdfPath, 'wb') as outputStream:
            output.write(outputStream)

    return outputPdfPath

def AddWebLinkToBookmark(inputPdfPath: str, outputPdfPath: str, bookmarkTitle: str, url: str) -> None:
    """ Changes the bookmark action to opening a web url. """

    # Reading the Pdf with pdfrw and collecting its meta objects. The bookmarks are among these.
    pdf = pdfrw.PdfReader(inputPdfPath, decompress=True)
    metaObjects = pdf.indirect_objects

    # If you did not add the bookmarks with PyPDF2 previously, use this part for getting the bookmarkToChange variable:
    # bookmarkToChange = None
    # for _, annotation in metaObjects.items():
    #     if '/Title' in annotation:
    #         if annotation["/Title"] == f"({bookmarkTitle})".replace(" ", "\\040"):
    #             bookmarkToChange = annotation
    # if bookmarkToChange is None:
    #     print(f"There is no bookmark called '{bookmarkTitle}' in this pdf.")
    #     return

    try:
        # Selecting the first, top parent bookmark.
        bookmark = [annotation for _, annotation in metaObjects.items() if '/Title' in annotation][0]
    except IndexError:
        print("There are no bookmarks in this pdf.")
        return

    # Each bookmark is linked from the previous one. Starting at the first bookmark, follow the '/Next' key to reach the rest.
    bookmarkAnnotations = [bookmark]
    while "/Next" in bookmark:
        if "/Title" not in bookmark["/Next"]:
            break
        bookmark = bookmark["/Next"]
        bookmarkAnnotations.append(bookmark)

    try:
        # Selecting the bookmark we want to add the url to.
        bookmarkToChange = [annotation for annotation in bookmarkAnnotations if annotation["/Title"] == f"({bookmarkTitle})".replace(" ", "\\040")][0]
    except IndexError:
        print(f"There is no bookmark called '{bookmarkTitle}' in this pdf.")
        return

    # Changing the internal PDF commands to point to a url instead of a page.
    bookmarkToChange.A.D = None                                                 # Deletes the page information the 'Go to page' action is pointing to.
    bookmarkToChange.A.S = pdfrw.PdfName("URI")                                 # Changes the 'Go to page' action to an 'Open a web link' action.
    bookmarkToChange.A.URI = pdfrw.objects.pdfstring.PdfString(f"({url})")      # Specifies the url for the 'Open a web link' action.

    # Saving the end result into a new file.
    pdfrw.PdfWriter().write(outputPdfPath, pdf)

【Comments】: