【Question title】: Converting multiple HTML files into a PDF using pdfkit in Python
【Posted】: 2018-04-29 22:07:10
【Question description】:

I tried to convert multiple HTML files into a single PDF using pdfkit. Here is my code:

from bs4 import BeautifulSoup
from selenium import webdriver
import pdfkit
import time

driver = webdriver.Chrome()
driver.get('https://www.linkedin.com/in/jaypratappandey/')
time.sleep(40)  # wait for the page to finish loading
soup = BeautifulSoup(driver.page_source, 'lxml')

# Write the top card and the remaining sections to separate HTML files.
# The with-block closes (and flushes) both files before pdfkit reads them.
with open('tophtmlfile.html', 'w') as top, open('htmlfile.html', 'w') as f:
    for name in soup.select('.pv-top-card-section__body'):
        top.write("%s" % name)

    for item in soup.select('.pv-oc.ember-view'):
        f.write("%s" % item)

pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'jayprofile.pdf')

driver.quit()

This code gives the following error:

Traceback (most recent call last):
  File "lkdndata.py", line 23, in <module>
    pdfkit.from_file(['tophtmlfile.html', 'htmlfile.html'], 'ankurprofile.pdf')
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/api.py", line 49, in from_file
    return r.to_pdf(output_path)
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/pdfkit.py", line 156, in to_pdf
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Error: This version of wkhtmltopdf is build against an unpatched version of QT, and does not support more then one input document.
Exit with code 1, due to unknown error.

【Question discussion】:

    Tags: python python-3.x pdf web-scraping


    【Solution 1】:

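    I had the same error. It is most likely caused by an inconsistent Qt installation: the wkhtmltopdf build on your machine is not linked against a compatible (patched) version of Qt. Try running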

    wkhtmltopdf
    

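    in your terminal and check whether the output contains "Reduced Functionality".

    If it does, my assumption is correct, and your safest bet is to compile wkhtmltopdf from source.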
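
The check described above can also be scripted. This is only a minimal sketch: the function names are hypothetical, and it assumes that the "Reduced Functionality" notice shows up in the text printed by `wkhtmltopdf --help`, which may vary between builds.

```python
import subprocess


def has_reduced_functionality(help_text: str) -> bool:
    """Return True if the help text mentions 'Reduced Functionality'."""
    return "reduced functionality" in help_text.lower()


def wkhtmltopdf_help_text(binary: str = "wkhtmltopdf") -> str:
    """Run `<binary> --help` and return stdout and stderr as one string.

    Assumption: the binary is on PATH; FileNotFoundError is raised otherwise.
    """
    result = subprocess.run(
        [binary, "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

Calling `has_reduced_functionality(wkhtmltopdf_help_text())` would then tell you whether the installed build is probably the unpatched one that rejects more than one input document.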

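    【Discussion】:

      【Solution 2】:

      The solution I found was to first merge the HTML files into one and then convert that single file with pdfkit. So in your case, save the tophtml and html files together in the same directory and replace the path below with the path of that directory.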

      import os
      import pdfkit

      # path to folder containing html files
      path = "/home/ec2-user/data-science-processes/src/results/"

      def multiple_html_to_pdf(path):
          """Convert multiple html files to a single pdf.

          args: path to directory containing html files
          """
          merged_html = '<html><head></head><body></body></html>'
          # sort the names so the merge order is deterministic
          for file in sorted(os.listdir(path)):
              if file.endswith(".html"):
                  print(file)
                  # append this file's html just before the closing body tag
                  with open(os.path.join(path, file), 'r') as f:
                      html = f.read()
                  merged_html = merged_html.replace('</body></html>', html + '</body></html>')
          # save the merged html, then convert that same file
          with open('merged.html', 'w') as f:
              f.write(merged_html)
          pdfkit.from_file('merged.html', 'Report.pdf')

      multiple_html_to_pdf(path)
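
For the two files in the question, where the order matters (top card first), the same merge idea can be applied to an explicit list of files instead of a whole directory. The following is a rough sketch, not the answerer's code: `merge_html_files` and `html_files_to_pdf` are hypothetical helper names, and it assumes the input files hold HTML fragments (as written by the question's script) rather than complete documents.

```python
def merge_html_files(paths):
    """Concatenate the given HTML files, in the order listed, inside one <body>."""
    parts = []
    for p in paths:
        with open(p, "r", encoding="utf-8") as f:
            parts.append(f.read())
    head = '<html><head><meta charset="utf-8"></head><body>'
    return head + "\n".join(parts) + "</body></html>"


def html_files_to_pdf(paths, output_pdf):
    """Merge the files and hand wkhtmltopdf a single input document."""
    import pdfkit  # imported here so the merge step works without pdfkit installed

    pdfkit.from_string(merge_html_files(paths), output_pdf)
```

With the question's file names this would be called as `html_files_to_pdf(['tophtmlfile.html', 'htmlfile.html'], 'jayprofile.pdf')`. Because wkhtmltopdf only ever sees one document, the unpatched-Qt limitation on multiple inputs should not be triggered.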
      

      【Discussion】:
