When printing a PDF with puppeteer, how can I get the page an element is printed on?

【Question title】: When printing PDF with puppeteer how can I get what page my element is printed on?
【Posted】: 2021-05-28 10:33:18
【Question】:

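I'm printing a PDF document with puppeteer and want to build a table of contents for all the images and tables in it, but I need to find out the final page number each of them lands on. Is there any way to do that?

Calculating this from a fixed page height sounds complicated, because elements can move between pages when CSS rules forbid page breaks inside them.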


Tags: javascript pdf puppeteer

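【Solution 1】:

Found a solution.

In the document that gets printed to PDF, every element that needs to appear in the TOC has a unique ID and is preceded by an empty anchor that references it.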

<a href="#section_123"></a>
<div id="section_123">Section</div>

This way the generated PDF keeps these IDs.

Then we bring in pdfjs-dist. All of these empty links are written into the PDF as destinations.
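
The anchor-plus-element pattern above can be produced by a small helper when the HTML is assembled before printing. This is only a minimal sketch: `withTocAnchor` is a hypothetical name, not part of puppeteer or pdfjs-dist, and the ID check is an assumption made to keep the generated markup safe.

```javascript
// Hypothetical helper (not from puppeteer or pdfjs-dist): emits an empty
// anchor pointing at `id`, followed by the element that carries that id.
function withTocAnchor(id, innerHtml, tag = 'div') {
    // Assumption: ids are restricted to simple characters so they can be
    // placed in attributes without escaping.
    if (!/^[A-Za-z][\w-]*$/.test(id)) {
        throw new Error(`Unsupported id for a TOC anchor: ${id}`);
    }
    return `<a href="#${id}"></a>\n<${tag} id="${id}">${innerHtml}</${tag}>`;
}

// Build the body fragments that should later show up in the TOC.
const html = [
    withTocAnchor('section_123', 'Section'),
    withTocAnchor('component_345', 'Component'),
].join('\n');

console.log(html);
```

The resulting string could then be loaded into a page (for example with puppeteer's `page.setContent`) and printed with `page.pdf`.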

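// npm install pdfjs-dist
// Assumption: the Node-friendly legacy build is used. The exact path depends on
// the installed pdfjs-dist version (newer releases ship pdf.mjs instead of pdf.js).
const pdfjs = require('pdfjs-dist/legacy/build/pdf.js');

// src: a URL/path to the PDF, or a Uint8Array holding its bytes
async function getLocalLinkPages(src) {
    const doc = await pdfjs.getDocument(src).promise;
    // destinations represent all the empty links, keyed by element ID
    const destinations = await doc.getDestinations();

    return Promise.all(
        Object.entries(destinations).map(async ([destination, [ref]]) => {
            // ref uniquely identifies the page. It looks like { num: 10, gen: 0 } for example,
            // but we don't have to bother and can just use doc.getPageIndex
            // getPageIndex is zero-based, so add 1 to get the printed page number
            const page = (await doc.getPageIndex(ref)) + 1;
            return {destination, page};
        })
    );
}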

The result looks like this:

[
    {
        "destination": "section_123",
        "page": 4
    },
    {
        "destination": "component_345",
        "page": 5
    }
]
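
To get from this result to the table of contents the question asks for, the entries can be joined with their captions and ordered by page. The following is a rough sketch under assumptions: `buildToc` and the `titles` lookup (destination ID to caption) are hypothetical and would have to come from wherever the images and tables are defined.

```javascript
// Hypothetical helper: turns the {destination, page} list into TOC rows.
// `titles` is an assumed lookup from destination id to a caption.
function buildToc(linkPages, titles) {
    return linkPages
        // keep only destinations that have a caption
        .filter(({destination}) =>
            Object.prototype.hasOwnProperty.call(titles, destination))
        // filter() returned a new array, so sorting does not mutate the input
        .sort((a, b) => a.page - b.page)
        .map(({destination, page}) => ({
            title: titles[destination],
            href: `#${destination}`,
            page,
        }));
}

const linkPages = [
    {destination: 'component_345', page: 5},
    {destination: 'section_123', page: 4},
];
const titles = {section_123: 'Section', component_345: 'Component'};

const toc = buildToc(linkPages, titles);
for (const {title, page} of toc) {
    console.log(`${title} (page ${page})`);
}
```

Inserting the TOC into the document can itself shift content onto other pages, so a second render pass may be needed to confirm the numbers; that step is outside this sketch.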
