【Question Title】: Converting RTF to HTML with formatting in Java
【Posted】: 2014-04-18 16:30:13
【Question】:

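I can use JEditorPane to parse RTF text and convert it to HTML, but the HTML output is missing some formatting: in this example, the strikethrough. As you can see in the output, the underlined text is correctly wrapped in <u></u>, but nothing wraps the strikethrough text. Any ideas?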

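import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.swing.JEditorPane;
import javax.swing.text.StyledEditorKit;

public class RtfToHtmlTest
{
    public void testRtfToHtml()
    {
        JEditorPane pane = new JEditorPane();
        pane.setContentType("text/rtf");

        // For "text/rtf" the pane supplies an RTFEditorKit (a StyledEditorKit subclass)
        StyledEditorKit kitRtf = (StyledEditorKit) pane.getEditorKitForContentType("text/rtf");

        try
        {
            // Parse the RTF into the pane's document (the final "}" closes the \rtf1 group)
            kitRtf.read(
                new StringReader(
                    "{\\rtf1\\ansi \\deflang1033\\deff0{\\fonttbl {\\f0\\froman \\fcharset0 \\fprq2 Times New Roman;}}{\\colortbl;\\red0\\green0\\blue0;} {\\stylesheet{\\fs20 \\snext0 Normal;}} {\\plain \\fs26 \\strike\\fs26 This is supposed to be strike-through.}{\\plain \\fs26 \\fs26  } {\\plain \\fs26 \\ul\\fs26 Underline text here} {\\plain \\fs26 \\fs26 .{\\u698\\'20}}}"),
                pane.getDocument(), 0);

            // For "text/html" the pane supplies an HTMLEditorKit
            StyledEditorKit kitHtml =
                (StyledEditorKit) pane.getEditorKitForContentType("text/html");

            // Write the same document back out as HTML
            Writer writer = new StringWriter();
            kitHtml.write(writer, pane.getDocument(), 0, pane.getDocument().getLength());
            System.out.println(writer.toString());
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    public static void main(String[] args)
    {
        new RtfToHtmlTest().testRtfToHtml();
    }
}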

Output:

<html>
  <head>
    <style>
      <!--
        p.Normal {
          RightIndent:0.0;
          FirstLineIndent:0.0;
          LeftIndent:0.0;
        }
      -->
    </style>
  </head>
  <body>
    <p class=default>
              <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
This is supposed to be strike-through.
      </span>
      <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">

      </span>
       <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
<u>Underline text here</u>
      </span>
       <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
.?
      </span>

    </p>
  </body>
</html>
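A hedged note on where the strikethrough is probably lost. The output above, with p class=default and inline span styles, is the format produced by javax.swing.text.html.MinimalHTMLWriter. HTMLEditorKit.write falls back to that writer when the document is a plain StyledDocument rather than an HTMLDocument, which is what the RTF kit fills. As far as I can tell, it only emits <b>, <i> and <u> for character attributes. One workaround is to walk the document's character runs yourself and map each attribute to a tag, including StyleConstants.isStrikeThrough. This is a minimal sketch, not a full converter. The class name StyledDocToHtml, the method toHtml and the use of <s> for strikethrough are my own choices. The sketch ignores fonts, sizes and colors. It also assumes the JDK's RTF reader records \strike as StyleConstants.StrikeThrough, which is worth checking on your JDK.

```java
import java.io.StringReader;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Element;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

public class StyledDocToHtml {

    // Escapes the characters that are special in HTML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Walks every paragraph and its character runs, wrapping each run in tags
    // for the font-style attributes set on it (fonts, sizes and colors are ignored).
    public static String toHtml(StyledDocument doc) throws BadLocationException {
        StringBuilder html = new StringBuilder("<html><body>\n");
        Element root = doc.getDefaultRootElement();
        for (int p = 0; p < root.getElementCount(); p++) {
            Element para = root.getElement(p);
            html.append("<p>");
            for (int r = 0; r < para.getElementCount(); r++) {
                Element run = para.getElement(r);
                // Clamp to the document length to skip the implicit trailing newline
                int start = run.getStartOffset();
                int end = Math.min(run.getEndOffset(), doc.getLength());
                if (end <= start) continue;
                String text = escape(doc.getText(start, end - start).replace("\n", ""));
                if (text.isEmpty()) continue;

                AttributeSet a = run.getAttributes();
                StringBuilder open = new StringBuilder();
                StringBuilder close = new StringBuilder();
                if (StyleConstants.isBold(a))          { open.append("<b>"); close.insert(0, "</b>"); }
                if (StyleConstants.isItalic(a))        { open.append("<i>"); close.insert(0, "</i>"); }
                if (StyleConstants.isUnderline(a))     { open.append("<u>"); close.insert(0, "</u>"); }
                if (StyleConstants.isStrikeThrough(a)) { open.append("<s>"); close.insert(0, "</s>"); }
                html.append(open).append(text).append(close);
            }
            html.append("</p>\n");
        }
        return html.append("</body></html>\n").toString();
    }

    public static void main(String[] args) throws Exception {
        // Parse a small RTF sample with the JDK's RTF kit, then print the custom HTML
        DefaultStyledDocument doc = new DefaultStyledDocument();
        new RTFEditorKit().read(
                new StringReader("{\\rtf1\\ansi {\\strike struck} {\\b bold}}"), doc, 0);
        System.out.println(toHtml(doc));
    }
}
```

With the code in the question, you would call StyledDocToHtml.toHtml((StyledDocument) pane.getDocument()) instead of the kitHtml.write call. Tags are closed in reverse order of opening so they always nest correctly.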

【Question discussion】:

    Tags: java html rtf jeditorpane strikethrough


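    【Solution 1】:

    You could try using this converter library to do the conversion with OpenOffice or LibreOffice, as described in this blog post.

    【Discussion】:

      【Solution 2】:

      This is the function I use to convert the RTF from a .msg body to HTML. See my Outlook message parser repository, yamp, on GitHub.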

      public static String rtfToHtml(String rtfText) {
          if (rtfText != null) {
              rtfText = rtfText
                  // Unwrap the original HTML markup kept in {\*\htmltagN ...} and {\*\mhtmltagN ...} groups
                  .replaceAll("\\{\\\\\\*\\\\[m]?htmltag[\\d]*(.*)}", "$1")
                  // Drop RTF-only content between \htmlrtf and \htmlrtf0, then any leftover markers
                  .replaceAll("\\\\htmlrtf[1]?(.*)\\\\htmlrtf0", "")
                  .replaceAll("\\\\htmlrtf[01]?", "")
                  .replaceAll("\\\\htmlbase", "")
                  // Paragraph, tab, line, page and section breaks
                  .replaceAll("\\\\par", "\n")
                  .replaceAll("\\\\tab", "\t")
                  .replaceAll("\\\\line", "\n")
                  .replaceAll("\\\\page", "\n\n")
                  .replaceAll("\\\\sect", "\n\n")
                  // Special characters as hexadecimal HTML character references
                  .replaceAll("\\\\emdash", "&#x2014;")
                  .replaceAll("\\\\endash", "&#x2013;")
                  .replaceAll("\\\\emspace", "&#x2003;")
                  .replaceAll("\\\\enspace", "&#x2002;")
                  .replaceAll("\\\\qmspace", "&#x2005;")
                  .replaceAll("\\\\bullet", "&#x2022;")
                  .replaceAll("\\\\lquote", "&#x2018;")
                  .replaceAll("\\\\rquote", "&#x2019;")
                  .replaceAll("\\\\ldblquote", "&#x201C;")
                  .replaceAll("\\\\rdblquote", "&#x201D;")
                  // Table rows and cells
                  .replaceAll("\\\\row", "\n")
                  .replaceAll("\\\\cell", "|")
                  .replaceAll("\\\\nestcell", "|")
                  // Remove unescaped group braces (including runs like "}}"), then unescape \{ and \}
                  .replaceAll("(?<!\\\\)\\{", "")
                  .replaceAll("(?<!\\\\)}", "")
                  .replaceAll("[\\\\](\\{)", "$1")
                  .replaceAll("[\\\\](})", "$1")
                  // Unicode characters (decimal code point) and hex-escaped bytes
                  .replaceAll("\\\\u([0-9]{2,5})", "&#$1;")
                  .replaceAll("\\\\'([0-9A-Fa-f]{2})", "&#x$1;")
                  // "cid:name@host" image references become "name"
                  .replaceAll("\"cid:(.*)@.*\"", "\"$1\"");

              // The HTML document starts at the first "<html"
              int index = rtfText.indexOf("<html");
              if (index != -1) {
                  return rtfText.substring(index);
              }
          }

          return null;
      }
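Two details in the chain of replaceAll calls are easy to trip over. First, the HTML references for these characters have to be hexadecimal (&#x2014; for the em dash) or use the decimal value (&#8212;). &#2014; would be decimal 2014, which is a different character. Second, a pattern such as "\\\\par" also matches the start of longer control words like \pard, and "\\\\sect" matches the start of \sectd, because nothing checks where the control word ends. This is a minimal sketch of a more careful approach for the special characters alone, using a hypothetical helper class RtfSpecialChars. It matches each control word as a whole (letters, an optional numeric parameter and the optional delimiting space), looks it up in a table, and leaves unknown words untouched. It assumes Java 9 or later (Map.of, and Matcher.appendReplacement with a StringBuilder). It does not distinguish an escaped backslash followed by letters.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RtfSpecialChars {

    // Unicode code points for RTF special-character control words.
    private static final Map<String, Integer> CODE_POINTS = Map.of(
            "emdash", 0x2014, "endash", 0x2013,
            "emspace", 0x2003, "enspace", 0x2002, "qmspace", 0x2005,
            "bullet", 0x2022,
            "lquote", 0x2018, "rquote", 0x2019,
            "ldblquote", 0x201C, "rdblquote", 0x201D);

    // A control word: backslash, lowercase letters, optional numeric parameter,
    // and the optional single space that delimits it (the space belongs to the word).
    private static final Pattern CONTROL_WORD = Pattern.compile("\\\\([a-z]+)(-?\\d+)? ?");

    // Replaces known special-character control words with hexadecimal HTML character
    // references and leaves every other control word (\pard, \plain, ...) untouched.
    public static String replaceSpecialChars(String rtf) {
        Matcher m = CONTROL_WORD.matcher(rtf);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer cp = CODE_POINTS.get(m.group(1));
            String replacement = (cp != null) ? String.format("&#x%04X;", cp) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceSpecialChars("a\\emdash b\\ldblquote x\\rdblquote\\pard y"));
    }
}
```

The same whole-word matching could drive the break replacements (\par, \line, \sect) too, so that a word like \pard is never mistaken for \par.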
      

      【Discussion】:

        【Solution 3】:

        Because of some bugs, I modified your function like this:

        public static String rtfToHtml(String rtfText) {
            StringBuilder sb = new StringBuilder();

            if (rtfText != null) {
                // Work line by line so that no pattern can match across line breaks
                String[] lignes = rtfText.split("[\\r\\n]+");
                for (String ligne : lignes) {
                    String tempLine = ligne
                        // Unwrap {\*\htmltagN ...} groups, stopping at the first closing brace
                        .replaceAll("\\{\\\\\\*\\\\[m]?htmltag[\\d]*([^}]*)\\}", "$1")
                        // Resolve the \htmlrtf / \htmlrtf0 markers, then strip any that remain
                        .replaceAll("\\\\htmlrtf0([^\\\\]*)\\\\htmlrtf", "$1")
                        .replaceAll("\\\\htmlrtf \\{(.*)\\}\\\\htmlrtf0", "$1")
                        .replaceAll("\\\\htmlrtf (.*)\\\\htmlrtf0", "")
                        .replaceAll("\\\\htmlrtf[0]?", "")
                        // Hyperlink fields: drop the field instruction, keep the displayed result
                        .replaceAll("\\\\field\\{\\\\\\*\\\\fldinst\\{[^}]*\\}\\}", "")
                        .replaceAll("\\{\\\\fldrslt\\\\cf1\\\\ul([^}]*)\\}", "$1")
                        .replaceAll("\\\\htmlbase", "")
                        // Paragraph, tab, line, page and section breaks
                        .replaceAll("\\\\par", "\n")
                        .replaceAll("\\\\tab", "\t")
                        .replaceAll("\\\\line", "\n")
                        .replaceAll("\\\\page", "\n\n")
                        .replaceAll("\\\\sect", "\n\n")
                        // Special characters as hexadecimal HTML character references
                        .replaceAll("\\\\emdash", "&#x2014;")
                        .replaceAll("\\\\endash", "&#x2013;")
                        .replaceAll("\\\\emspace", "&#x2003;")
                        .replaceAll("\\\\enspace", "&#x2002;")
                        .replaceAll("\\\\qmspace", "&#x2005;")
                        .replaceAll("\\\\bullet", "&#x2022;")
                        .replaceAll("\\\\lquote", "&#x2018;")
                        .replaceAll("\\\\rquote", "&#x2019;")
                        .replaceAll("\\\\ldblquote", "&#x201C;")
                        .replaceAll("\\\\rdblquote", "&#x201D;")
                        // Table rows and cells
                        .replaceAll("\\\\row", "\n")
                        .replaceAll("\\\\cell", "|")
                        .replaceAll("\\\\nestcell", "|")
                        // Remove unescaped group braces (including runs like "}}"), then unescape \{ and \}
                        .replaceAll("(?<!\\\\)\\{", "")
                        .replaceAll("(?<!\\\\)}", "")
                        .replaceAll("[\\\\](\\{)", "$1")
                        .replaceAll("[\\\\](})", "$1")
                        // Unicode characters (decimal code point) and hex-escaped bytes
                        .replaceAll("\\\\u([0-9]{2,5})", "&#$1;")
                        .replaceAll("\\\\'([0-9A-Fa-f]{2})", "&#x$1;")
                        // "cid:name@host" image references become "name"
                        .replaceAll("\"cid:(.*)@.*\"", "\"$1\"")
                        // Collapse runs of spaces
                        .replaceAll(" {2,}", " ");

                    // Keep only lines that contain something other than whitespace
                    if (!tempLine.replaceAll("\\s+", "").isEmpty()) {
                        sb.append(tempLine).append("\r\n");
                    }
                }

                rtfText = sb.toString();

                // The HTML document starts at the first "<html"
                int index = rtfText.indexOf("<html");
                if (index != -1) {
                    return rtfText.substring(index);
                }
            }

            return null;
        }
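The two Outlook-specific rules that both versions build on can be tried in isolation. As I understand RTF-encapsulated HTML (the \fromhtml1 format Outlook uses for HTML message bodies), the original markup is carried in {\*\htmltagN ...} destinations. Text between \htmlrtf and \htmlrtf0 exists only for RTF readers and can be dropped. This is a minimal sketch with a hypothetical class HtmlTagUnwrap. It removes the \htmlrtf runs with a lazy (.*?) match rather than the greedy (.*) in the functions above, so two runs on one line are not merged into one. It deliberately ignores \mhtmltag groups, fields, special characters, and any HTML that itself contains a closing brace.

```java
import java.util.regex.Pattern;

public class HtmlTagUnwrap {

    // RTF-only content: from \htmlrtf (not \htmlrtf0) up to the next \htmlrtf0,
    // matched lazily so that several runs on one line are removed separately.
    private static final Pattern HTMLRTF =
            Pattern.compile("\\\\htmlrtf\\b.*?\\\\htmlrtf0 ?", Pattern.DOTALL);

    // Original HTML markup kept in a {\*\htmltagN ...} destination; the content
    // stops at the first closing brace, like the [^}]* pattern in the function above.
    private static final Pattern HTMLTAG =
            Pattern.compile("\\{\\\\\\*\\\\htmltag\\d* ?([^}]*)\\}");

    public static String unwrap(String rtf) {
        String withoutRtfOnly = HTMLRTF.matcher(rtf).replaceAll("");
        return HTMLTAG.matcher(withoutRtfOnly).replaceAll("$1");
    }

    public static void main(String[] args) {
        String rtf = "{\\*\\htmltag64 <p>}\\htmlrtf \\par\\htmlrtf0 Hello{\\*\\htmltag72 </p>}";
        System.out.println(unwrap(rtf));
    }
}
```

The \b after htmlrtf keeps the removal pattern from starting at a \htmlrtf0 marker, because there is no word boundary between "f" and "0".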
        

        【Discussion】:
