【Question title】: Converting emoji to HTML Decimal Code or Unicode Hexadecimal Code in Java
【Posted】: 2017-06-28 00:42:59
【Question】:

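I am trying to use Java to convert a text file that contains emoji into a file in which each emoji is replaced by its HTML decimal code or its hexadecimal code. Example: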

I/p: <div id="thread" style="white-space: pre-wrap;"><div>????????????????????⚽️????

Expected o/p: <div id="thread" style="white-space: pre-wrap;"><div>😀😀😃🍎🍏⚽️🏀

The '????' in the output above should be replaced with the corresponding HTML entity code, e.g. '&#128512;'.

The HTML entity codes and hexadecimal codes are listed here: http://character-code.com/emoticons-html-codes.php
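
For reference, the decimal and hexadecimal codes in such tables are two spellings of the same Unicode code point. A minimal sketch of that relationship, assuming plain numeric character references are wanted (the class and method names are illustrative and not part of the original post):

```java
class EmojiCodePointDemo {
    // Formats one code point as a decimal HTML entity, e.g. 128512 -> "&#128512;".
    static String decimalEntity(int codePoint) {
        return "&#" + codePoint + ";";
    }

    // Formats one code point as a hexadecimal HTML entity, e.g. 0x1F600 -> "&#x1F600;".
    static String hexEntity(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
    }

    public static void main(String[] args) {
        // U+1F600 (grinning face), built from its code point so this source stays ASCII-only.
        String grinning = new String(Character.toChars(0x1F600));
        int cp = grinning.codePointAt(0);

        System.out.println(grinning.length());  // 2: one emoji is a surrogate pair of two chars
        System.out.println(decimalEntity(cp));  // &#128512;
        System.out.println(hexEntity(cp));      // &#x1F600;
    }
}
```

Because an emoji outside the Basic Multilingual Plane occupies two Java chars, the code point has to be read with codePointAt rather than charAt.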

The sample code I tried is as follows:

try {
    File file = new File("/inFile.txt");
    str = FileUtils.readFileToString(file, "ISO-8859-1");
    System.out.println(new String(str.getBytes(), "UTF-8"));
    String results = StringEscapeUtils.escapeHtml4(str);
    System.out.println(results);
} catch (IOException e) {
    e.printStackTrace();
}

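【Comments】:

  • So you have code that does something, you don't show us the code, and then you ask why the code doesn't work? Really?!
  • Added the sample code I tried.
  • Are you sure the file is encoded in ISO-8859-1? That seems... unlikely.
  • I'm not sure. We receive XML files encoded as "UTF-8", and the emoji characters are part of a CDATA section. I just want to decode them and replace each emoji with its corresponding HTML entity code.
  • So your question seems to center on StringEscapeUtils.escapeHtml4(), and your complaint is that it does not map emoji correctly. 1) I assume that comes from Apache Commons Lang? 2) Which version of the library are you using? 3) Why do you want to do this instead of simply writing the HTML as UTF-8?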
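
The encoding concern raised in the comments can be illustrated with a small sketch. It assumes the file really holds UTF-8 bytes, as the asker suspects; the class and method names are hypothetical. Decoding those bytes as ISO-8859-1 turns each emoji into four unrelated Latin-1 characters before any escaping step sees the text:

```java
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Decodes UTF-8 bytes with the wrong charset, which is the effect of
    // reading a UTF-8 file while declaring it to be ISO-8859-1.
    static String misdecode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverses the wrong decode. This works only because ISO-8859-1 maps every
    // byte value to exactly one char, and only if the charset is named explicitly.
    static String repair(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600)); // U+1F600
        byte[] onDisk = emoji.getBytes(StandardCharsets.UTF_8);

        String wrong = misdecode(onDisk);
        System.out.println(onDisk.length);                // 4 bytes in UTF-8
        System.out.println(wrong.length());               // 4 chars, none of them an emoji
        System.out.println(repair(wrong).equals(emoji));  // true
    }
}
```

Note that str.getBytes() without an argument uses the platform default charset, so the round trip in the question's code recovers the emoji only on a platform whose default happens to be ISO-8859-1. Reading the file as UTF-8 in the first place avoids the problem.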

Tags: java html html-entities emoji html-encode


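【Solution 1】:
I found a workaround: parse the XML and serialize it again with a Transformer whose output encoding is set to ISO-8859-1. Characters that cannot be represented in that encoding, such as emoji, should then be written as numeric character references.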
import java.io.File;
import java.io.OutputStreamWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// The enclosing class was not shown in the original post; this name is a placeholder.
public class HtmlDecimalCodeGenerator {

    public static void htmlDecimalCodeGenerator() {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setValidating(false);
        File inputFile = new File("/inputFile.xml");

        try {
            DocumentBuilder builder = domFactory.newDocumentBuilder();
            Document doc = builder.parse(inputFile);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();

            // Omit the <?xml ...?> declaration that would otherwise start the output.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            // Serialize the result tree as XML rather than as HTML or plain text.
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");

            // Do not let the Transformer add whitespace to the output.
            transformer.setOutputProperty(OutputKeys.INDENT, "no");

            // Key step: with a Latin-1 output encoding, characters outside
            // ISO-8859-1 (such as emoji) are emitted as numeric character references.
            transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

            // To write to a file instead of the console, wrap
            // new FileOutputStream("/outputFile.xml") in the OutputStreamWriter.
            transformer.transform(new DOMSource(doc),
                    new StreamResult(new OutputStreamWriter(System.out, "UTF-8")));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
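
As an alternative that does not depend on XML serializer behavior, the replacement can also be done directly on the string. The following is a minimal sketch, not the answerer's method: it assumes the text has already been decoded correctly (for example as UTF-8) and that every non-ASCII code point should become a decimal entity. The class and method names are hypothetical.

```java
class EmojiEntities {
    // Replaces every code point above U+007F with a decimal numeric character
    // reference and copies ASCII through unchanged, so existing markup such as
    // <div> is left as it is.
    static String toDecimalEntities(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);       // reads a full surrogate pair if present
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp);       // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600 built from its code point so this source stays ASCII-only.
        String grinning = new String(Character.toChars(0x1F600));
        System.out.println(toDecimalEntities("<div>" + grinning + "</div>"));
        // <div>&#128512;</div>
    }
}
```

Sequences made of several code points are converted piece by piece. For instance, the soccer ball followed by a variation selector (U+26BD U+FE0F) becomes &#9917;&#65039;, which browsers render as the same emoji.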

【Discussion】:
