【Question title】: While converting doc to html, graphics or shapes are not converted into html format
【Posted】: 2018-07-16 10:21:51
【Question】:

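We want to display a doc file in a dialog box in the browser, which is why I convert it to an HTML file. The doc file itself converts to HTML successfully. However, if the doc file contains graphics or shapes, they are not converted into any HTML tag such as img, and they do not appear in the file shown in the UI.

So how can we convert a doc file that contains graphics or shapes to HTML?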

InputStream input = new FileInputStream (baseDir + fileName);
HWPFDocument wordDocument = new HWPFDocument (input);
wordToHtmlConverter.processDocument (wordDocument);
wordToHtmlConverter.setPicturesManager (picmang=new PicturesManager() {
        public String savePicture (byte[] content, PictureType pictureType, String suggestedName, float widthInches, float heightInches) {
            return suggestedName;
        }

    });
org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();

    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    DOMSource domSource = new DOMSource (htmlDocument);
    StreamResult streamResult = new StreamResult (outStream);

    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty (OutputKeys.ENCODING, "UTF-8");
    serializer.setOutputProperty (OutputKeys.INDENT, "yes");
    serializer.setOutputProperty (OutputKeys.METHOD, "html");
    serializer.transform (domSource, streamResult);
    outStream.close();

    String content = new String (outStream.toByteArray() );
    FileOutputStream fos = null;
    String destinationHTMLFile = baseDir + fileName.replace(".docx", "").replace(".doc", "")+".html";
    BufferedWriter bw = null;

    File file = new File(destinationHTMLFile);
    fos = new FileOutputStream(file);
    bw = new BufferedWriter(new OutputStreamWriter(fos, "UTF-8"));
    bw.write(content);

So please help me display the doc file in the browser.

【Comments】:

    Tags: html apache-poi doc


    【Solution 1】:

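    AbstractWordConverter.setPicturesManager must be called before AbstractWordConverter.processDocument. The converter only asks the PicturesManager for picture names while it processes the document, so a manager set afterwards is never used. The PicturesManager.savePicture method of the PicturesManager interface must also be implemented so that it actually saves the picture; returning suggestedName alone writes no image file.

    The following example takes WordDocument.doc from my home directory, converts it to HTML including pictures, and puts the resulting files (the HTML file and the image files) into a newly created directory html. Note that the pictures contained in WordDocument.doc must be *.gif, *.png or *.jpg, because the method used, taken from Writing/Saving an Image, supports only these types.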

    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.apache.poi.hwpf.converter.PicturesManager;
    
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.usermodel.PictureType;
    import org.apache.poi.util.XMLHelper;
    import org.w3c.dom.Document;
    
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    
    import java.io.StringWriter;
    import java.io.FileInputStream;
    import java.io.ByteArrayInputStream;
    import java.io.File;
    
    import java.awt.image.BufferedImage;
    import javax.imageio.ImageIO;
    
    public class TestWordToHtmlConverter {
    
     private static void convertDocToHTML(String docFilePathAndName, String htmlPath, String htmlFileName) throws Exception {
    
      new File(htmlPath).mkdir();
    
      HWPFDocument hwpfDocument = new HWPFDocument(new FileInputStream(docFilePathAndName));
    
      Document newDocument = XMLHelper.getDocumentBuilderFactory().newDocumentBuilder().newDocument();
      WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(newDocument);
    
      wordToHtmlConverter.setPicturesManager(
       new PicturesManager() {
        public String savePicture(byte[] content, PictureType pictureType, String suggestedName, float widthInches, float heightInches) {
         /*
         System.out.println(content);
         System.out.println(pictureType);
         System.out.println(suggestedName);
         System.out.println(widthInches);
         System.out.println(heightInches);
         */
         try {
          BufferedImage image = ImageIO.read(new ByteArrayInputStream(content));
          ImageIO.write(image, pictureType.getExtension(), new File(htmlPath, suggestedName));
         } catch (Exception e) {
          e.printStackTrace();
         }
         return suggestedName;
        }
       }
      );
    
      wordToHtmlConverter.processDocument(hwpfDocument);
    
      Transformer transformer = TransformerFactory.newInstance().newTransformer();
      transformer.setOutputProperty(OutputKeys.INDENT, "yes");
      transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
      transformer.setOutputProperty(OutputKeys.METHOD, "html");
      transformer.transform(new DOMSource(wordToHtmlConverter.getDocument()),
                            new StreamResult(new File(htmlPath, htmlFileName)));
    
     }
    
     public static void main(String[] args) throws Exception {
    
      convertDocToHTML("/home/axel/Dokumente/WordDocument.doc", "/home/axel/Dokumente/html", "WordDocument.html");
    
     }
    
    }
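
The format restriction noted above comes from re-encoding each picture with ImageIO: `ImageIO.read` returns null when no registered reader understands the bytes, and `ImageIO.write` then fails. Since `savePicture` already receives the picture as a `byte[]`, one possible alternative is to write those bytes to disk unchanged. The following is only a minimal sketch under that assumption; `RawPictureSaver` and `saveRaw` are hypothetical names, not part of Apache POI, and the helper uses only the standard library so that it can be tried on its own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RawPictureSaver {

    /**
     * Writes the picture bytes unchanged into targetDir and returns the
     * file name to use as the img src. Hypothetical helper, not a POI API.
     */
    public static String saveRaw(Path targetDir, String suggestedName, byte[] content)
            throws IOException {
        // Create the output directory (and any missing parents) if needed.
        Files.createDirectories(targetDir);
        // Keep only the last path segment so the name cannot point outside targetDir.
        String safeName = Paths.get(suggestedName).getFileName().toString();
        // No decoding or re-encoding: the bytes are stored exactly as received.
        Files.write(targetDir.resolve(safeName), content);
        return safeName;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("html");
        byte[] fakePicture = {(byte) 0x89, 'P', 'N', 'G'}; // placeholder bytes, not a real image
        String name = saveRaw(dir, "picture1.png", fakePicture);
        System.out.println(name + " -> " + Files.size(dir.resolve(name)) + " bytes");
    }
}
```

Inside `savePicture`, the try block could then call something like `RawPictureSaver.saveRaw(Paths.get(htmlPath), suggestedName, content)` and return its result. Writing the bytes as-is does not make every format displayable: pictures stored as EMF or WMF would be saved, but most browsers cannot render them in an img tag.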
    

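    【Comments】:

    • Thanks, but the HTML file generated from the doc file is not formatted properly; some of the content is messed up. I want the output to be the same as the doc file: all content and formatting should match the doc.
    • @shriyash Lakhe: It works for me. I have tested it with several different *.doc files. Can you provide a sample *.doc file for which it does not work?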