在java中将docx转换为pdf答案

【问题标题】：Converting docx into pdf in java在java中将docx转换为pdf
【发布时间】：2017-09-07 21:14:50
【问题描述】：

我正在尝试将包含表格和图像的docx 文件转换为pdf 格式文件。

我一直在到处寻找，但没有得到正确的解决方案，请求给出正确和正确的解决方案：

这是我尝试过的：

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class TestCon {

    public static void main(String[] args) {
        TestCon cwoWord = new TestCon();
        System.out.println("Start");
        cwoWord.ConvertToPDF("D:\\Test.docx", "D:\\Test1.pdf");
    }

    public void ConvertToPDF(String docPath, String pdfPath) {
        try {
            InputStream doc = new FileInputStream(new File(docPath));
            XWPFDocument document = new XWPFDocument(doc);
            PdfOptions options = PdfOptions.create();
            OutputStream out = new FileOutputStream(new File(pdfPath));
            PdfConverter.getInstance().convert(document, out, options);
            System.out.println("Done");
        } catch (FileNotFoundException ex) {
            System.out.println(ex.getMessage());
        } catch (IOException ex) {

            System.out.println(ex.getMessage());
        }
    }

}

例外：

Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.poi.util.POILogger.log(ILjava/lang/Object;)V from class org.apache.poi.openxml4j.opc.PackageRelationshipCollection
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:313)
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:162)
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:130)
at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:559)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:112)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:83)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:128)
at org.apache.poi.openxml4j.opc.ZipPackagePart.<init>(ZipPackagePart.java:78)
at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:239)
at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:665)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:121)
at test.TestCon.ConvertToPDF(TestCon.java:31)
at test.TestCon.main(TestCon.java:25)

我的要求是创建一个 java 代码，将现有的 docx 转换为具有正确格式和对齐方式的 pdf。

请提出建议。

使用的罐子：

【问题讨论】：

How to convert MS doc to pdf的可能重复
如果您坚持使用 ApachePOI，这里还有另一个答案：stackoverflow.com/questions/6201736/…
Bruno 只能在您的问题是关于 iText 的情况下提供帮助。但是您的问题不是关于 iText，而是关于 Apache POI。无论如何，如果您标记尚未发表评论的人，那么当您标记他们时，他们将不会收到通知。 Stack Overflow 这样做是为了防止标签垃圾邮件，这是对你所做的事情的描述。
如果你不断改变球门柱，没有人能帮助你。在过去 15 分钟内，您多次编辑帖子并多次更改库。
在您的众多编辑之一中，您的例外中有com.lowagie。（我可以看到编辑历史）这意味着您使用的是 iText 的 ancient 版本，2.1.7 或更早版本，至少有 8 年历史。由于您似乎信任 Bruno Lowagie 的专业知识，因此您应该熟悉他对仍在使用这种旧 iText 版本的人的看法。

标签： java pdf ms-word apache-poi

【解决方案1】：

您缺少一些库。

我可以通过添加以下库来运行您的代码：

阿帕奇 POI 3.15 org.apache.poi.xwpf.converter.core-1.0.6.jar org.apache.poi.xwpf.converter.pdf-1.0.6.jar fr.opensagres.xdocreport.itext.extension-2.0.0.jar itext-2.1.7.jar ooxml-schemas-1.3.jar

我已成功转换了一个 6 页长的 Word 文档 (.docx)，其中包含表格、图像和各种格式。

【讨论】：

请注意，尽管有包名，但这并不是使用 POI 进行转换。只有 ooxml-schemas-1.3.jar 来自 Apache POI。其余的来自opensagres 和itext 项目。
@jmarkmurphy 我的方法不是重新发明轮子，只是让有问题的代码能够工作。
您的回答没有错，只是不想让任何人误以为 org.apache.poi.xwpf.converter.* 软件包是 Apache POI 的一部分。
谢谢你，先生，它帮助了我
@BakedInhalf 位置是mvnrepository.com/artifact/org.apache.poi/ooxml-schemas/1.3

【解决方案2】：

除了 VivekRatanSinha answer，我想发布完整的代码和所需的 jars 以供将来需要它的人使用。

代码：

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordConvertPDF {
    public static void main(String[] args) {
        WordConvertPDF cwoWord = new WordConvertPDF();
        cwoWord.ConvertToPDF("D:/Test.docx", "D:/Test.pdf");
    }

    public void ConvertToPDF(String docPath, String pdfPath) {
        try {
            InputStream doc = new FileInputStream(new File(docPath));
            XWPFDocument document = new XWPFDocument(doc);
            PdfOptions options = PdfOptions.create();
            OutputStream out = new FileOutputStream(new File(pdfPath));
            PdfConverter.getInstance().convert(document, out, options);
        } catch (IOException ex) {
            System.out.println(ex.getMessage());
        }
    }
}

和 JARS：

享受:)

【讨论】：

@user1999397 你能提供完整的示例吗，因为这些库相互依赖
请添加 pom 而不是 jars 图片。
我按照你的代码和lib得到错误：没有找到有效的条目或内容，这不是一个有效的OOXML（Office Open XML）文件。

【解决方案3】：

我使用这个代码。

private byte[] toPdf(ByteArrayOutputStream docx) {
    InputStream isFromFirstData = new ByteArrayInputStream(docx.toByteArray());

    XWPFDocument document = new XWPFDocument(isFromFirstData);
    PdfOptions options = PdfOptions.create();

    //make new file in c:\temp\
    OutputStream out = new FileOutputStream(new File("c:\\tmp\\HelloWord.pdf"));
    PdfConverter.getInstance().convert(document, out, options);

    //return byte array for return in http request.
    ByteArrayOutputStream pdf = new ByteArrayOutputStream();
    PdfConverter.getInstance().convert(document, pdf, options);

    document.write(pdf);
    document.close();
    return pdf.toByteArray();
}

【讨论】：

【解决方案4】：

你需要添加这些maven依赖

  <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>4.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>4.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>4.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml-schemas</artifactId>
        <version>4.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-excelant</artifactId>
        <version>4.0.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-examples</artifactId>
        <version>4.0.1</version>
    </dependency>

    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>org.apache.poi.xwpf.converter.core</artifactId>
        <version>1.0.6</version>
    </dependency>

    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>org.apache.poi.xwpf.converter.pdf</artifactId>
        <version>1.0.6</version>
    </dependency>

【讨论】：

【解决方案5】：

我做了很多研究，发现 Documents4j 是最好的将 docx 转换为 pdf 的免费 API。 Alignment、font一切documents4j都做得很好。

Maven 依赖项：

<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-local</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-transformer-msoffice-word</artifactId>
    <version>1.0.3</version>
</dependency>

使用以下代码将 docx 转换为 pdf。

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

public class Document4jApp {

    public static void main(String[] args) {

        File inputWord = new File("Tests.docx");
        File outputFile = new File("Test_out.pdf");
        try  {
            InputStream docxInputStream = new FileInputStream(inputWord);
            OutputStream outputStream = new FileOutputStream(outputFile);
            IConverter converter = LocalConverter.builder().build();
            converter.convert(docxInputStream).as(DocumentType.DOCX).to(outputStream).as(DocumentType.PDF).execute();
            outputStream.close();
            System.out.println("success");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

【讨论】：

documents4j 的问题是它需要安装 MS 组件，因此它几乎无法在 Linux 上运行。 github.com/documents4j/documents4j/issues/41
记得在execute();之后拨打converter.shutDown();
@povis，是的，您是正确的documents4j 不在Linux 环境中工作。对 windows 和 linux 环境都使用 docx4j stackoverflow.com/questions/3022376/…
这是唯一适用于我的 docx->pdf 转换的 java 开源库。我已经尝试过这 3 个库：apache POI、fr.opensagres（使用 apache POI 进行转换）和 docx4j。谢谢@Sathia。这是在 Windows 机器上测试的。
@brebDev，感谢您的赞赏。

【解决方案6】：

只有一个依赖需要手动添加（其他的应该自动拉入）。目前最新版本为2.0.2。

分级：

dependencies {
    // What you should already have
    implementation "org.apache.poi:poi-ooxml:latest.release"


    // ADD THE BELOW LINE TO DEPENDENCIES BLOCK IN build.gradle
    implementation 'fr.opensagres.xdocreport:fr.opensagres.poi.xwpf.converter.pdf:2.0.2'
}

仅限 Maven：

<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
    <version>2.0.2</version>
</dependency>

【讨论】：

【解决方案7】：

我将提供 3 种方法将 docx 转换为 pdf：

使用 itext 和 opensagres 和 apache poi

代码：

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ConvertDocToPdfitext {

  public static void main(String[] args) {
    System.out.println( "Starting conversion!!!" );
    ConvertDocToPdfitext cwoWord = new ConvertDocToPdfitext();
    cwoWord.ConvertToPDF("C:/Users/avijit.shaw/Desktop/testing/docx/Account Opening Prototype Details.docx", "C:/Users/avijit.shaw/Desktop/testing/docx/Test-1.pdf");
    System.out.println( "Ending conversion!!!" );
  }

  public void ConvertToPDF(String docPath, String pdfPath) {
    try {
        InputStream doc = new FileInputStream(new File(docPath));
        XWPFDocument document = new XWPFDocument(doc);
        PdfOptions options = PdfOptions.create();
        OutputStream out = new FileOutputStream(new File(pdfPath));
        PdfConverter.getInstance().convert(document, out, options);
    } catch (IOException ex) {
        System.out.println(ex.getMessage());
    }
  }
}

依赖关系：使用 Maven 解决依赖关系。

fr.opensagres.poi.xwpf.converter.core 的新版本 2.0.2 与 apache poi 4.0.1 和 itext 2.17 一起运行。您只需要在 Maven 中添加以下依赖项，然后 Maven 将自动下载所有依赖项。（更新了您的 Maven 项目，因此它下载了所有这些库及其所有依赖项）

<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
    <version>2.0.2</version>
</dependency>

使用 Documents4j

注意：您需要在运行此代码的机器上安装 MS Office。

代码：

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

public class Document4jApp {

  public static void main(String[] args) {

      File inputWord = new File("C:/Users/avijit.shaw/Desktop/testing/docx/Account Opening Prototype Details.docx");
      File outputFile = new File("Test_out.pdf");
      try  {
          InputStream docxInputStream = new FileInputStream(inputWord);
          OutputStream outputStream = new FileOutputStream(outputFile);
          IConverter converter = LocalConverter.builder().build();         
          converter.convert(docxInputStream).as(DocumentType.DOCX).to(outputStream).as(DocumentType.PDF).execute();
          outputStream.close();
          System.out.println("success");
      } catch (Exception e) {
          e.printStackTrace();
      }
  }
}

依赖关系：使用 Maven 解决依赖关系。

<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-local</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-transformer-msoffice-word</artifactId>
    <version>1.0.3</version>
</dependency>

使用 openoffice nuoil

注意：您需要在运行此代码的机器上安装 OpenOffice。 代码：


import java.io.File;
import com.sun.star.beans.PropertyValue;
import com.sun.star.comp.helper.BootstrapException;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.frame.XDesktop;
import com.sun.star.frame.XStorable;
import com.sun.star.lang.XComponent;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.uno.Exception;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;

import ooo.connector.BootstrapSocketConnector;

public class App {
  public static void main(String[] args) throws Exception, BootstrapException {
      System.out.println("Stating conversion!!!");
      // Initialise
      String oooExeFolder = "C:\\Program Files (x86)\\OpenOffice 4\\program"; //Provide path on which OpenOffice is installed
      XComponentContext xContext = BootstrapSocketConnector.bootstrap(oooExeFolder);
      XMultiComponentFactory xMCF = xContext.getServiceManager();
  
      Object oDesktop = xMCF.createInstanceWithContext("com.sun.star.frame.Desktop", xContext);
  
      XDesktop xDesktop = (XDesktop) UnoRuntime.queryInterface(XDesktop.class, oDesktop);
  
      // Load the Document
      String workingDir = "C:/Users/avijit.shaw/Desktop/testing/docx/"; //Provide directory path of docx file to be converted
      String myTemplate = workingDir + "Account Opening Prototype Details.docx"; // Name of docx file to be converted
  
      if (!new File(myTemplate).canRead()) {
        throw new RuntimeException("Cannot load template:" + new File(myTemplate));
      }
  
      XComponentLoader xCompLoader = (XComponentLoader) UnoRuntime
          .queryInterface(com.sun.star.frame.XComponentLoader.class, xDesktop);
  
      String sUrl = "file:///" + myTemplate;
  
      PropertyValue[] propertyValues = new PropertyValue[0];
  
      propertyValues = new PropertyValue[1];
      propertyValues[0] = new PropertyValue();
      propertyValues[0].Name = "Hidden";
  

          propertyValues[0].Value = new Boolean(true);
  
      XComponent xComp = xCompLoader.loadComponentFromURL(sUrl, "_blank", 0, propertyValues);
  
      // save as a PDF
      XStorable xStorable = (XStorable) UnoRuntime.queryInterface(XStorable.class, xComp);
  
      propertyValues = new PropertyValue[2];
      // Setting the flag for overwriting
      propertyValues[0] = new PropertyValue();
      propertyValues[0].Name = "Overwrite";
      propertyValues[0].Value = new Boolean(true);
      // Setting the filter name
      propertyValues[1] = new PropertyValue();
      propertyValues[1].Name = "FilterName";
      propertyValues[1].Value = "writer_pdf_Export";
  
      // Appending the favoured extension to the origin document name
      String myResult = workingDir + "letterOutput.pdf"; // Name of pdf file to be output
      xStorable.storeToURL("file:///" + myResult, propertyValues);
  
      System.out.println("Saved " + myResult);
  
      // shutdown
      xDesktop.terminate();
  }
}

依赖关系：使用 Maven 解决依赖关系。

<!-- https://mvnrepository.com/artifact/org.openoffice/unoil -->
    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>unoil</artifactId>
        <version>3.2.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.openoffice/juh -->
    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>juh</artifactId>
        <version>3.2.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.openoffice/bootstrap-connector -->
    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>bootstrap-connector</artifactId>
        <version>0.1.1</version>
    </dependency>

【讨论】：

使用 Documents4j 时记得在execute(); 之后调用converter.shutDown();。
在 rtl 语言中，opensagres 没有合适的输出，但 Documents4j 没问题。

【解决方案8】：

如果您的文档非常丰富，并且您的选择是在 Linux/Unix 上进行转换，那么 all three main options suggested in the thread 实施起来可能“有点”痛苦。

我可能建议的一个解决方案是使用Gotenberg：一个由 Docker 驱动的无状态 API，用于将 HTML、Markdown 和 Office 文档转换为 PDF。

启动容器$ docker run --rm -p 3000:3000 thecodingmachine/gotenberg:6
向容器发出请求。以下是如何使用curl：

$ curl --request POST \
    --url http://localhost:3000/convert/office \
    --header 'Content-Type: multipart/form-data' \
    --form files=@document.docx \
    --form files=@document2.docx \
    -o result.pdf

将其部署到您的基础架构（例如，作为单独的微服务），然后从您的 Java 服务发出简单的 HTTP 请求。在响应中获取您的 PDF 文件，然后用它做您想做的事情。

经过测试，就像魅力一样！

【讨论】：

【解决方案9】：

Docx4j 是开源，是用于将 Docx 转换为 pdf 且没有任何对齐或字体问题的最佳 API。

Maven 依赖项：

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-Internal</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-MOXy</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-export-fo</artifactId>
    <version>8.0.0</version>
</dependency>

代码：

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;

import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

public class DocToPDF {

    public static void main(String[] args) {
        
        try {
            InputStream templateInputStream = new FileInputStream("D:\\\\Workspace\\\\New\\\\Sample.docx");
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(templateInputStream);
            MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();

            String outputfilepath = "D:\\\\Workspace\\\\New\\\\Sample.pdf";
            FileOutputStream os = new FileOutputStream(outputfilepath);
            Docx4J.toPDF(wordMLPackage,os);
            os.flush();
            os.close();
        } catch (Throwable e) {

            e.printStackTrace();
        } 
    }

}

【讨论】：

【解决方案10】：

我有一个非常复杂的文档，Microsoft Graph Api 帮助生成了正确的 pdf，其格式与输入 Docx 中的完全相同

以下链接有助于完成应用注册，以设置 API 密钥/秘密以访问 Graph Api https://medium.com/medialesson/convert-files-to-pdf-using-microsoft-graph-azure-functions-20bc84d2adc4

https://devzigma.com/java/upload-files-to-sharepoint-using-java/

docx 文档上传后，请查看此内容以便能够下载相同 docx 的 pdf 版本：https://docs.microsoft.com/en-us/graph/api/driveitem-get-content-format?view=graph-rest-1.0&tabs=java

【讨论】：