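I may be a bit late, but hear me out. I had to solve the same problem, and since I found zero information about it anywhere, I dug into it myself and managed to find an answer. This is not my most efficient or best code, but it is definitely among the code I am proudest of, because as far as I could find nobody had done this before.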
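The problem is that no single Python PDF library solves every PDF-related problem. Each one approaches things differently, and some can do things the others cannot. Because of this, I had to use two libraries for the two functions below. PyPDF2 is here to add the bookmarks to the PDF. pdfrw is here to change those bookmarks so that their action opens a URL.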
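In short, we create a new PDF with the bookmarks added, and then another new PDF in which the action of a chosen bookmark has been changed to point to a URL.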
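One thing worth mentioning: for some reason (at least for me), when PyPDF2 adds several bookmarks, each one ends up reachable only from the previous one through its /Next key, so they looked to me like children of the previous bookmark (in PDF terms, /Next is the link to the next bookmark at the same level). That is why there is a while loop: it collects all the bookmarks into a list, from which we can then pick the one we want.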
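The chain-walking idea behind that while loop can be shown without any PDF library. Below is a minimal sketch that uses plain dictionaries as stand-ins for the outline objects; `collect_chain` and the sample data are hypothetical names for illustration, and only the `/Title` and `/Next` keys come from the code in this answer.

```python
def collect_chain(first: dict) -> list:
    """Follow '/Next' links from the first bookmark and return all of them in order."""
    chain = [first]
    current = first
    # Stop when there is no '/Next' entry or the next object is not a bookmark.
    while "/Next" in current and "/Title" in current["/Next"]:
        current = current["/Next"]
        chain.append(current)
    return chain


# Plain dicts standing in for the PDF outline objects (assumption for illustration).
third = {"/Title": "(Revenge\\040Of\\040The\\040Sith)"}
second = {"/Title": "(Attack\\040of\\040the\\040Clones)", "/Next": third}
first = {"/Title": "(The\\040Phantom\\040Menace)", "/Next": second}

titles = [item["/Title"] for item in collect_chain(first)]
```

Starting from the first bookmark, the list ends up holding every bookmark in document order, which is what the real function needs before it can search by title.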
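If your PDF already has bookmarks and you did not add them with PyPDF2, it is enough to iterate over the metaObjects dictionary and pick the value that contains a /Title key with the title you want. That makes the code significantly shorter. I have included this variant in the code as a comment.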
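The lookup by title depends on how the title is stored in the objects the code compares against: wrapped in parentheses, with spaces written as `\040`. Here is a small sketch of that encoding and of the search, again with plain dictionaries instead of real pdfrw objects. `encode_title` and `find_bookmark` are hypothetical helper names, and the encoding covers only what this answer's code assumes; it is not a general PDF string encoder.

```python
from typing import Optional


def encode_title(title: str) -> str:
    """Encode a title the way this answer's code expects to find it in the file."""
    return f"({title})".replace(" ", "\\040")


def find_bookmark(objects: dict, title: str) -> Optional[dict]:
    """Return the first object whose '/Title' equals the encoded title, else None."""
    wanted = encode_title(title)
    for obj in objects.values():
        if isinstance(obj, dict) and obj.get("/Title") == wanted:
            return obj
    return None


# Stand-in for the object dictionary; the keys are arbitrary in this sketch.
objects = {
    (1, 0): {"/Type": "/Catalog"},
    (2, 0): {"/Title": "(A\\040New\\040hope)"},
    (3, 0): {"/Title": "(Return\\040of\\040the\\040Jedi)"},
}
match = find_bookmark(objects, "Return of the Jedi")
```

Note that the comparison is exact, so a title that differs in capitalization (for example "A New Hope" versus "A New hope") would not match.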
Here is an example of how to use the code:
inputPdf = r"C:\......\first.pdf"
bookmarkedPdf = r"C:\......\second.pdf"
pdfWithWeblink = r"C:\......\final.pdf"
bookmarks = [
    {"Title": "The Phantom Menace", "Page": 5},
    {"Title": "Attack of the Clones", "Page": 10},
    {"Title": "Revenge Of The Sith", "Page": 13},
    {"Title": "A New hope", "Page": 18},
    {"Title": "The Empire Strikes Back", "Page": 26},
    {"Title": "Return of the Jedi", "Page": 32}
]
AddBookmarks(inputPdf, bookmarkedPdf, bookmarks)
AddWebLinkToBookmark(bookmarkedPdf, pdfWithWeblink, "Revenge Of The Sith", "https://stackoverflow.com")
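Because AddBookmarks only adds a bookmark when its 1-based "Page" matches a page that actually exists in the document, an entry with an out-of-range page is skipped silently. A hedged sketch of a pre-check you could run first follows; `find_invalid_bookmarks` is a hypothetical helper, not part of PyPDF2 or pdfrw, and the page count is passed in by hand here rather than read from a file.

```python
def find_invalid_bookmarks(bookmarks: list, num_pages: int) -> list:
    """Return the entries that lack a title or point outside pages 1..num_pages."""
    invalid = []
    for entry in bookmarks:
        title = entry.get("Title")
        page = entry.get("Page")
        if not isinstance(title, str) or not title:
            invalid.append(entry)
        elif not isinstance(page, int) or not 1 <= page <= num_pages:
            invalid.append(entry)
    return invalid


sample = [
    {"Title": "The Phantom Menace", "Page": 5},
    {"Title": "Return of the Jedi", "Page": 32},
    {"Title": "Missing page"},
]
# Assumed page count, for illustration only.
problems = find_invalid_bookmarks(sample, num_pages=30)
```

With an assumed 30-page document, the second entry (page 32) and the third entry (no page at all) are reported, while the first passes.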
The code:
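# Note: PdfFileWriter, PdfFileReader, addBookmark etc. are the older PyPDF2 API (before version 3.0).
from PyPDF2 import PdfFileWriter, PdfFileReader
import pdfrw


def AddBookmarks(inputPdfPath: str, outputPdfPath: str, headers: list) -> str:
    """ Adds bookmarks to a PDF. Each header is a dict with a "Title" and a 1-based "Page". """
    output = PdfFileWriter()
    # The input file has to stay open until the output is written, because pages are read lazily.
    with open(inputPdfPath, 'rb') as inputStream:
        reader = PdfFileReader(inputStream)
        for i in range(reader.getNumPages()):
            output.addPage(reader.getPage(i))
            # Add every bookmark that points at the page we just copied.
            for header in headers:
                if header["Page"] - 1 == i:
                    output.addBookmark(header["Title"], header["Page"] - 1, parent=None)
        # Ask the viewer to show the bookmark panel when the file is opened.
        output.setPageMode("/UseOutlines")
        with open(outputPdfPath, 'wb') as outputStream:
            output.write(outputStream)
    return outputPdfPath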


def AddWebLinkToBookmark(inputPdfPath: str, outputPdfPath: str, bookmarkTitle: str, url: str) -> None:
    """ Changes the bookmark action to opening a web url. """

    # Reading the Pdf with pdfrw and collecting its meta objects. The bookmarks are among these.
    pdf = pdfrw.PdfReader(inputPdfPath, decompress=True)
    metaObjects = pdf.indirect_objects

    # If you did not add the bookmarks with PyPDF2 previously, use this part for getting the bookmarkToChange variable:
    # bookmarkToChange = None
    # for _, annotation in metaObjects.items():
    #     if '/Title' in annotation:
    #         if annotation["/Title"] == f"({bookmarkTitle})".replace(" ", "\\040"):
    #             bookmarkToChange = annotation
    # if bookmarkToChange is None:
    #     print(f"There is no bookmark called '{bookmarkTitle}' in this pdf.")
    #     return

    try:
        # Selecting the first bookmark, the one the chain starts from.
        bookmark = [annotation for _, annotation in metaObjects.items() if '/Title' in annotation][0]
    except IndexError:
        print("There are no bookmarks in this pdf.")
        return

    # Each following bookmark can be reached from the previous one with the '/Next' key.
    bookmarkAnnotations = [bookmark]
    while "/Next" in bookmark:
        if "/Title" not in bookmark["/Next"]:
            break
        bookmark = bookmark["/Next"]
        bookmarkAnnotations.append(bookmark)

    try:
        # Selecting the bookmark we want to add the url to.
        # The stored title is wrapped in parentheses and has its spaces written as \040.
        bookmarkToChange = [annotation for annotation in bookmarkAnnotations if annotation["/Title"] == f"({bookmarkTitle})".replace(" ", "\\040")][0]
    except IndexError:
        print(f"There is no bookmark called '{bookmarkTitle}' in this pdf.")
        return

    # Changing the internal PDF commands to point to a url instead of a page.
    # This assumes the bookmark has an /A action dictionary, which is how PyPDF2 writes it;
    # a bookmark that uses /Dest instead has no /A entry, and these lines would fail on it.
    bookmarkToChange.A.D = None  # Deletes the page information the 'Go to page' action is pointing to.
    bookmarkToChange.A.S = pdfrw.PdfName("URI")  # Changes the 'Go to page' action to an 'Open a web link' action.
    bookmarkToChange.A.URI = pdfrw.objects.pdfstring.PdfString(f"({url})")  # Specifies the url for the 'Open a web link' action.

    # Saving the end result into a new file.
    pdfrw.PdfWriter().write(outputPdfPath, pdf)
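
At the PDF level, the three assignments near the end of the function turn a 'go to page' action into a URI action: the /D entry is dropped, /S becomes /URI, and a /URI entry holds the address as a parenthesized string. The sketch below shows that transformation on a plain dictionary rather than on a pdfrw object. `to_uri_action` and `pdf_literal` are hypothetical names, and the escaping of backslashes and parentheses is an addition that the code above does not perform.

```python
def pdf_literal(text: str) -> str:
    """Wrap text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def to_uri_action(action: dict, url: str) -> dict:
    """Return a copy of a GoTo-style action rewritten as a URI action."""
    # Drop the destination, since a URI action has no page to jump to.
    updated = {key: value for key, value in action.items() if key != "/D"}
    updated["/S"] = "/URI"
    updated["/URI"] = pdf_literal(url)
    return updated


# Stand-in for the action dictionary of a bookmark (assumption for illustration).
goto_action = {"/S": "/GoTo", "/D": ["page-object", "/Fit"]}
uri_action = to_uri_action(goto_action, "https://stackoverflow.com")
```

The escaping only matters for addresses that contain parentheses or backslashes; for a plain address like the one in the usage example, the result is the same string the function above writes.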