如何从 Java 漂亮地打印 XML？答案

【问题标题】：How to pretty print XML from Java?如何从 Java 漂亮地打印 XML？
【发布时间】：2010-09-13 10:28:15
【问题描述】：

我有一个包含 XML 的 Java 字符串，没有换行符或缩进。我想把它变成一个格式很好的 XML 的字符串。我该怎么做？

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

注意：我的输入是一个字符串。我的输出是一个字符串。

（基本）模拟结果：

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>

【问题讨论】：

检查这个问题：stackoverflow.com/questions/1264849/…
只是好奇，您是将此输出发送到 XML 文件还是缩进真正重要的其他文件？前段时间，我非常担心格式化我的 XML 以使其正确显示...但是在花了很多时间之后，我意识到我必须将输出发送到 Web 浏览器以及任何相对现代的 Web 浏览器实际上将以漂亮的树形结构显示 XML，所以我可以忘记这个问题并继续前进。我提到这一点是为了以防您（或其他有相同问题的用户）可能忽略了相同的细节。
@Abel，保存到文本文件，插入 HTML 文本区域，并转储到控制台以进行调试。
“搁置太宽泛” - 很难比目前的问题更准确！

标签： java xml pretty-print

【解决方案1】：

Underscore-java 有静态方法U.formatXml(string)。 Live example

import com.github.underscore.lodash.U;

public class MyClass {
    public static void main(String args[]) {
        String xml = "<tag><nested>hello</nested></tag>";

        System.out.println(U.formatXml("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>" + xml + "</root>"));
    }
}

输出：

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <tag>
      <nested>hello</nested>
   </tag>
</root>

【讨论】：

这太棒了！

【解决方案2】：

试试这个：

 try
                    {
                        TransformerFactory transFactory = TransformerFactory.newInstance();
                        Transformer transformer = null;
                        transformer = transFactory.newTransformer();
                        StringWriter buffer = new StringWriter();
                        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                        transformer.transform(new DOMSource(element),
                                  new StreamResult(buffer)); 
                        String str = buffer.toString();
                        System.out.println("XML INSIDE IS #########################################"+str);
                        return element;
                    }
                    catch (TransformerConfigurationException e)
                    {
                        e.printStackTrace();
                    }
                    catch (TransformerException e)
                    {
                        e.printStackTrace();
                    }

【讨论】：

看不出与一些已经发布的答案的区别。

【解决方案3】：

在提出自己的解决方案之前，我应该先查找此页面！无论如何，我的使用 Java 递归来解析 xml 页面。此代码完全独立，不依赖第三方库。还有..它使用递归！

// you call this method passing in the xml text
public static void prettyPrint(String text){
    prettyPrint(text, 0);
}

// "index" corresponds to the number of levels of nesting and/or the number of tabs to print before printing the tag
public static void prettyPrint(String xmlText, int index){
    boolean foundTagStart = false;
    StringBuilder tagChars = new StringBuilder();
    String startTag = "";
    String endTag = "";
    String[] chars = xmlText.split("");
    // find the next start tag
    for(String ch : chars){
        if(ch.equalsIgnoreCase("<")){
            tagChars.append(ch);
            foundTagStart = true;
        } else if(ch.equalsIgnoreCase(">") && foundTagStart){
            startTag = tagChars.append(ch).toString();
            String tempTag = startTag;
            endTag = (tempTag.contains("\"") ? (tempTag.split(" ")[0] + ">") : tempTag).replace("<", "</"); // <startTag attr1=1 attr2=2> => </startTag>
            break;
        } else if(foundTagStart){
            tagChars.append(ch);
        }
    }
    // once start and end tag are calculated, print start tag, then content, then end tag
    if(foundTagStart){
        int startIndex = xmlText.indexOf(startTag);
        int endIndex = xmlText.indexOf(endTag);
        // handle if matching tags NOT found
        if((startIndex < 0) || (endIndex < 0)){
            if(startIndex < 0) {
                // no start tag found
                return;
            } else {
                // start tag found, no end tag found (handles single tags aka "<mytag/>" or "<?xml ...>")
                printTabs(index);
                System.out.println(startTag);
                // move on to the next tag
                // NOTE: "index" (not index+1) because next tag is on same level as this one
                prettyPrint(xmlText.substring(startIndex+startTag.length(), xmlText.length()), index);
                return;
            }
        // handle when matching tags found
        } else {
            String content = xmlText.substring(startIndex+startTag.length(), endIndex);
            boolean isTagContainsTags = content.contains("<"); // content contains tags
            printTabs(index);
            if(isTagContainsTags){ // ie: <tag1><tag2>stuff</tag2></tag1>
                System.out.println(startTag);
                prettyPrint(content, index+1); // "index+1" because "content" is nested
                printTabs(index);
            } else {
                System.out.print(startTag); // ie: <tag1>stuff</tag1> or <tag1></tag1>
                System.out.print(content);
            }
            System.out.println(endTag);
            int nextIndex = endIndex + endTag.length();
            if(xmlText.length() > nextIndex){ // if there are more tags on this level, continue
                prettyPrint(xmlText.substring(nextIndex, xmlText.length()), index);
            }
        }
    } else {
        System.out.print(xmlText);
    }
}

private static void printTabs(int counter){
    while(counter-- > 0){ 
        System.out.print("\t");
    }
}

【讨论】：

Underscore-java, U.formatXml(xml) 也不依赖第三方库。

【解决方案4】：

我试图实现类似的目标，但没有任何外部依赖。应用程序已经在使用 DOM 来格式化只是为了记录 XML！

这是我的示例 sn-p

public void formatXML(final String unformattedXML) {
    final int length = unformattedXML.length();
    final int indentSpace = 3;
    final StringBuilder newString = new StringBuilder(length + length / 10);
    final char space = ' ';
    int i = 0;
    int indentCount = 0;
    char currentChar = unformattedXML.charAt(i++);
    char previousChar = currentChar;
    boolean nodeStarted = true;
    newString.append(currentChar);
    for (; i < length - 1;) {
        currentChar = unformattedXML.charAt(i++);
        if(((int) currentChar < 33) && !nodeStarted) {
            continue;
        }
        switch (currentChar) {
        case '<':
            if ('>' == previousChar && '/' != unformattedXML.charAt(i - 1) && '/' != unformattedXML.charAt(i) && '!' != unformattedXML.charAt(i)) {
                indentCount++;
            }
            newString.append(System.lineSeparator());
            for (int j = indentCount * indentSpace; j > 0; j--) {
                newString.append(space);
            }
            newString.append(currentChar);
            nodeStarted = true;
            break;
        case '>':
            newString.append(currentChar);
            nodeStarted = false;
            break;
        case '/':
            if ('<' == previousChar || '>' == unformattedXML.charAt(i)) {
                indentCount--;
            }
            newString.append(currentChar);
            break;
        default:
            newString.append(currentChar);
        }
        previousChar = currentChar;
    }
    newString.append(unformattedXML.charAt(length - 1));
    System.out.println(newString.toString());
}

【讨论】：

删除文本中的空格。示例：\n some example lol\n after transform:someexamplelol
是的，它还有其他缺陷，例如处理 cmets、DTD（如果有的话）等。但是，对此进行更正我能够得到一个可以接受的（除了像这样的复杂元素一些复杂的 text again 然后什么都没有）逻辑工作。现在手头没有代码，有空再写吧
请说明此解决方案如何改进现有答案，否则只会增加噪音。