PyPDF2: append a PDF from the 2nd page (answers)

【Question title】: PyPDF2 append a PDF from the 2nd page
【Posted】: 2016-04-16 10:59:34
【Question description】:

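I'm learning to program with the book "Automate the Boring Stuff", but I've hit a roadblock in chapter 13: "merge multiple PDFs, but omit the title page from all but the first one".

In the book they do this by looping over the pages of each PDF, but while looking through the PyPDF2 module I found that the "pages" option looks like a cleaner solution. However, I'm struggling to get it to work.

Please don't judge whether the code is Pythonic or not. I haven't learned classes yet ;-) After finishing this book, I plan to move on to classes, objects, decorators, *args and **kwargs ;-)

I need help with the last line of my code snippet.

My code: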

for fn_PdfObjects in range(len(l_fn_PdfObjects)):
    if fn_PdfObjects != 0:
        break
    else:
        ## watermark the first sheet
        addWatermark(l_fn_PdfObjects[fn_PdfObjects])
        watermarkedPage = PyPDF2.PdfFileReader(open('watermarkedCover.pdf', 'rb'))
        #   the 'position = ' is the page in the destination PDF it will receive
        tempMergerFile.merge(position=fn_PdfObjects, fileobj=watermarkedPage)
        tempMergerFile.merge(position=fn_PdfObjects+1, fileobj=l_fn_PdfObjects[fn_PdfObjects],pages='0:')

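Looking at the module, I found this (source: https://pythonhosted.org/PyPDF2/PdfFileMerger.html):

merge(position, fileobj, bookmark=None, pages=None, import_bookmarks=True)

pages: can be a Page Range or a (start, stop[, step]) tuple to merge only the specified range of pages from the source document into the output document.

I also found this about page ranges, but whatever I try, I can't get it to work (source: https://github.com/mstamy2/PyPDF2/blob/master/PyPDF2/pagerange.py):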

class PageRange(object):
    """
    A slice-like representation of a range of page indices,
        i.e. page numbers, only starting at zero.
    The syntax is like what you would put between brackets [ ].
    The slice is one of the few Python types that can't be subclassed,
    but this class converts to and from slices, and allows similar use.
      o  PageRange(str) parses a string representing a page range.
      o  PageRange(slice) directly "imports" a slice.
      o  to_slice() gives the equivalent slice.
      o  str() and repr() allow printing.
      o  indices(n) is like slice.indices(n).
    """

    def __init__(self, arg):
        """
        Initialize with either a slice -- giving the equivalent page range,
        or a PageRange object -- making a copy,
        or a string like
            "int", "[int]:[int]" or "[int]:[int]:[int]",
            where the brackets indicate optional ints.
        {page_range_help}
        Note the difference between this notation and arguments to slice():
            slice(3) means the first three pages;
            PageRange("3") means the range of only the fourth page.
            However PageRange(slice(3)) means the first three pages.
        """

The error I get is: TypeError: "pages" must be a tuple of (start, stop[, step])

Traceback (most recent call last):
  File "combining_select_pages_from_many_pdfs.py", line 112, in <module>
    main()
  File "combining_select_pages_from_many_pdfs.py", line 104, in main
    newPdfFile = mergePdfFiles(l_PdfObjects)
  File "combining_select_pages_from_many_pdfs.py", line 63, in mergePdfFiles
    tempMergerFile.merge(position=fn_PdfObjects+1, fileobj=l_fn_PdfObjects[fn_PdfObjects],pages=[0])
  File "/home/sybie/.local/lib/python3.5/site-packages/PyPDF2/merger.py", line 143, in merge
    raise TypeError('"pages" must be a tuple of (start, stop[, step])')

Here is what I could find:

# Find the range of pages to merge.
if pages == None:
    pages = (0, pdfr.getNumPages())
elif isinstance(pages, PageRange):
    pages = pages.indices(pdfr.getNumPages())
elif not isinstance(pages, tuple):
    raise TypeError('"pages" must be a tuple of (start, stop[, step])')

Source: https://github.com/mstamy2/PyPDF2/blob/master/PyPDF2/merger.py#L137
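
The quoted check explains the error: a plain string such as '0:' or a list such as [0] is neither a PageRange nor a tuple, so merge() raises TypeError. Below is a minimal sketch of that logic outside PyPDF2. PageRangeStandIn and resolve_pages are hypothetical names used only for illustration, and the page count is passed in directly instead of being read from a PDF.

```python
class PageRangeStandIn:
    """Hypothetical stand-in for PyPDF2's PageRange; it only wraps a slice."""

    def __init__(self, page_slice):
        self._slice = page_slice

    def indices(self, n):
        # Same contract as slice.indices(n): returns (start, stop, step).
        return self._slice.indices(n)


def resolve_pages(pages, num_pages):
    """Mirror the quoted check: accept None, a page range object, or a tuple."""
    if pages is None:
        return (0, num_pages)
    if isinstance(pages, PageRangeStandIn):
        return pages.indices(num_pages)
    if not isinstance(pages, tuple):
        raise TypeError('"pages" must be a tuple of (start, stop[, step])')
    return pages


# Try the values from the question alongside values that pass the check.
for candidate in ("0:", [0], (1, 4), PageRangeStandIn(slice(1, None)), None):
    try:
        print(type(candidate).__name__, "->", resolve_pages(candidate, 4))
    except TypeError as exc:
        print(type(candidate).__name__, "-> TypeError:", exc)
```

The string and the list fall through to the final branch and are rejected, while the tuple and the page range object are accepted.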

Thanks in advance for any help!

【Comments】:

Tags: python pdf merge pypdf2

【Solution 1】:

I solved it by doing this:

pages=(1,l_fn_PdfObjects[fn_PdfObjects].numPages)

In other words, I made it a tuple. If anyone can still tell me how page ranges work, I'd be grateful!
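
As a rough answer to the page-range question, going only by the PageRange docstring quoted in the question: a page range string follows slice syntax, so "1:" means everything from index 1 onward. The sketch below uses built-in slices rather than PyPDF2. parse_range is a hypothetical helper that mimics the documented string forms, and the real parser may differ in details.

```python
def parse_range(text):
    """Hypothetical stand-in for PageRange(str): build a slice from
    "int", "[int]:[int]" or "[int]:[int]:[int]"."""
    if ":" not in text:
        # A bare "3" selects only the page at index 3 (the fourth page).
        start = int(text)
        # Index -1 needs an open end, because slice(-1, 0) would be empty.
        stop = start + 1 if start != -1 else None
        return slice(start, stop)
    # Empty fields between colons become None, exactly as in [start:stop:step].
    parts = [int(p) if p.strip() else None for p in text.split(":")]
    return slice(*parts)


num_pages = 5  # hypothetical page count of the source PDF
all_pages = list(range(num_pages))

print(all_pages[parse_range("1:")])          # every page except the first
print(parse_range("1:").indices(num_pages))  # same range as a (start, stop, step) tuple
print(all_pages[parse_range("3")])           # only the fourth page
print(all_pages[slice(3)])                   # slice(3): the first three pages
```

Judging by the isinstance(pages, PageRange) branch in the merge() source quoted in the question, passing pages=PageRange('1:') instead of the bare string '1:' should be accepted and select the same pages as the tuple above.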

【Comments】:

【Solution 2】:

It looks like you have to use the parse_filename_page_ranges function. Something like this:

import traceback
from sys import exit, stderr

from PyPDF2 import PdfFileMerger, parse_filename_page_ranges

# records_pdf, inv_pdf and destinationfile are file paths defined elsewhere.
# A filename followed by a range uses only those pages; a filename alone uses all pages.
args = [records_pdf, '0:1', inv_pdf, records_pdf, '1:']
filename_page_ranges = parse_filename_page_ranges(args)

merger = PdfFileMerger()
in_fs = dict()  # open each source file only once
try:
    for (filename, page_range) in filename_page_ranges:
        if filename not in in_fs:
            in_fs[filename] = open(filename, "rb")
        merger.append(in_fs[filename], pages=page_range)
except Exception:
    print(traceback.format_exc(), file=stderr)
    print("Error while reading " + filename, file=stderr)
    exit(1)

with open(destinationfile, "wb") as output:
    merger.write(output)
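
To show what the (filename, page_range) pairs in the loop above look like, here is a minimal standard-library sketch of the pairing step. pair_files_with_ranges and RANGE_RE are hypothetical names, the file names are placeholders, and the real parse_filename_page_ranges may recognise ranges differently and returns PageRange objects rather than strings.

```python
import re

# Matches "int", "[int]:[int]" or "[int]:[int]:[int]", the forms listed in the PageRange docstring.
RANGE_RE = re.compile(r"^-?\d+$|^(-?\d+)?:(-?\d+)?(:(-?\d+)?)?$")


def pair_files_with_ranges(args):
    """Group a flat list of filenames and range strings into (filename, range) pairs."""
    pairs = []
    filename = None
    had_range = False
    for arg in list(args) + [None]:  # the trailing None marks the end of the list
        if arg is not None and RANGE_RE.match(arg):
            if filename is None:
                raise ValueError("the first argument must be a filename")
            pairs.append((filename, arg))
            had_range = True
        else:
            # A filename with no range of its own stands for all of its pages.
            if filename is not None and not had_range:
                pairs.append((filename, ":"))
            filename = arg
            had_range = False
    return pairs


print(pair_files_with_ranges(["records.pdf", "0:1", "inv.pdf", "records.pdf", "1:"]))
```

With this input the first page of records.pdf comes first, then all of inv.pdf, then the remaining pages of records.pdf.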

【Comments】: