Converting multiple HTML files to PDF using pdfkit in Python

【Question title】: Converting multiple HTML files into PDF using pdfkit in Python
【Posted】: 2018-04-29 22:07:10
【Question】:

I am trying to convert multiple HTML files into a single PDF using pdfkit. Here is my code:

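import time

import pdfkit
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.linkedin.com/in/jaypratappandey/')
time.sleep(40)  # give the page time to load before reading its source
soup = BeautifulSoup(driver.page_source, 'lxml')

# Write each matched section to its own HTML file. The with-blocks close
# (and flush) the files before pdfkit reads them.
with open('tophtmlfile.html', 'w') as top:
    for name in soup.select('.pv-top-card-section__body'):
        top.write("%s" % name)

with open('htmlfile.html', 'w') as f:
    for item in soup.select('.pv-oc.ember-view'):
        f.write("%s" % item)

# Passing a list of input files is the call that raises the error
pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'jayprofile.pdf')

driver.quit()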

This code gives the following error:

Traceback (most recent call last):
  File "lkdndata.py", line 23, in <module>
    pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'ankurprofile.pdf')
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/api.py", line 49, in from_file
    return r.to_pdf(output_path)
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/pdfkit.py", line 156, in to_pdf
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Error: This version of wkhtmltopdf is build against an unpatched version of QT, and does not support more then one input document.
Exit with code 1, due to unknown error.

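【Comments】:

Tags: python python-3.x pdf web-scraping

【Solution 1】:

I had the same error. It is most likely caused by an inconsistent Qt installation: as the error message says, your wkhtmltopdf was built against a version of Qt that lacks the patches it needs, and no compatible Qt version is available. Try running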

wkhtmltopdf

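in your terminal and check whether the output mentions "Reduced Functionality".

If it does, my assumption is correct, and your safest option is to compile wkhtmltopdf from source.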
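
The same check can be scripted. The following is only a minimal sketch: it assumes `wkhtmltopdf` is on the PATH and that an unpatched build prints the phrase "Reduced Functionality" in its usage text, as described above. The helper names `looks_reduced` and `wkhtmltopdf_usage_text` are hypothetical and not part of pdfkit or wkhtmltopdf.

```python
import shutil
import subprocess


def looks_reduced(usage_text):
    """Return True if the usage text mentions the reduced-functionality notice."""
    return "reduced functionality" in usage_text.lower()


def wkhtmltopdf_usage_text(binary="wkhtmltopdf"):
    """Run the binary with no arguments and return everything it prints.

    Returns None if the binary is not on the PATH or does not respond.
    """
    exe = shutil.which(binary)
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # usage text may go to either stream
            universal_newlines=True,
            timeout=30,
        )
    except subprocess.TimeoutExpired:
        return None
    # A non-zero exit code is expected here, so it is deliberately not checked.
    return result.stdout


if __name__ == "__main__":
    text = wkhtmltopdf_usage_text()
    if text is None:
        print("wkhtmltopdf not found (or it did not respond)")
    elif looks_reduced(text):
        print("Reduced-functionality build: multiple input documents are not supported")
    else:
        print("No reduced-functionality notice found")
```

If the notice is present, passing a list of files to `pdfkit.from_file` will keep failing with the error from the question until wkhtmltopdf itself is replaced.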

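【Comments】:

【Solution 2】:

The solution I found is to first merge the HTML files into one and then convert that file with pdfkit. In your case, save tophtmlfile.html and htmlfile.html together in the same directory and replace the path below with the path to that directory.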

import os

import pdfkit

# path to folder containing html files
path = "/home/ec2-user/data-science-processes/src/results/"

def multiple_html_to_pdf(path):
    """ converts multiple html files to a single pdf
    args: path to directory containing html files
    """
    parts = []
    # os.listdir() returns names in arbitrary order; sort so the result is stable
    for file in sorted(os.listdir(path)):
        if file.endswith(".html"):
            print(file)
            # collect the content of each html file
            with open(os.path.join(path, file), 'r') as f:
                parts.append(f.read())
    # wrap the collected content in a single html document
    merged_html = '<html><head></head><body>' + ''.join(parts) + '</body></html>'
    # save merged html, then convert that same file
    with open('merged.html', 'w') as f:
        f.write(merged_html)
    pdfkit.from_file('merged.html', 'Report.pdf')

multiple_html_to_pdf(path)
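
If the order of the merged content matters (for example, the top card before the rest of the profile), an explicit file list can replace the directory scan. This is a sketch rather than the answerer's code: `merge_html_files` and `html_files_to_pdf` are hypothetical helper names, the file names in the usage comment are taken from the question, and it assumes each file holds an HTML fragment rather than a complete document. It passes the merged markup to `pdfkit.from_string`, so no intermediate file is written.

```python
def merge_html_files(paths, encoding="utf-8"):
    """Read the files in the given order and wrap their content in one HTML document."""
    parts = []
    for p in paths:
        with open(p, "r", encoding=encoding) as f:
            parts.append(f.read())
    return (
        '<html><head><meta charset="utf-8"></head><body>'
        + "\n".join(parts)
        + "</body></html>"
    )


def html_files_to_pdf(paths, output_path):
    """Merge the files and render the result as a single PDF."""
    import pdfkit  # imported here so the merge step works without pdfkit installed

    # wkhtmltopdf receives exactly one input document, which avoids the
    # "does not support more then one input document" error.
    pdfkit.from_string(merge_html_files(paths), output_path)


# Example call with the file names from the question:
# html_files_to_pdf(["tophtmlfile.html", "htmlfile.html"], "jayprofile.pdf")
```

Because the markup is rendered from a string, relative links to images or stylesheets inside the fragments may not resolve the way they would when converting from a file.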

【Comments】: