How to parse a DocumentFragment with the Java standard DOM API (answers)

【Question title】: How to parse a DocumentFragment with the Java standard DOM API
【Posted】: 2011-10-24 23:24:56
【Question】:

This is how I parse a well-formed XML document in Java:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();

// text contains the XML content
Document doc = builder.parse(new InputSource(new StringReader(text)));

An example of such text:

<a>
  <b/>
</a>

How do I parse a DocumentFragment, for example this one (two top-level elements, so not a well-formed document)?

<a>
  <b/>
</a>
<a>
  <b/>
</a>

Note: if possible, I'd like to use org.w3c.dom and no other libraries or technologies.
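For context, the question's code cannot be reused as-is because the JDK's default DocumentBuilder treats a second top-level element as a fatal well-formedness error. The sketch below (class and method names `RootCheck` and `parsesAsDocument` are made up for illustration) parses both example texts and reports which one is accepted. It installs a SAX `DefaultHandler` as the error handler, whose `fatalError` simply rethrows, so the parser does not also print a "[Fatal Error]" line to stderr.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class RootCheck {
    // Returns true if the JDK's default DOM parser accepts `text` as a complete document.
    static boolean parsesAsDocument(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler.fatalError rethrows the exception instead of printing it.
        builder.setErrorHandler(new DefaultHandler());
        try {
            Document doc = builder.parse(new InputSource(new StringReader(text)));
            return doc.getDocumentElement() != null;
        } catch (SAXParseException e) {
            // A second top-level element after the root is a fatal well-formedness error.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String single = "<a>\n  <b/>\n</a>";
        String fragment = single + "\n" + single;
        System.out.println(parsesAsDocument(single));   // true
        System.out.println(parsesAsDocument(fragment)); // false
    }
}
```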

【Discussion】:

Tags: java dom xml-parsing documentfragment well-formed

【Solution 1】:

I just thought of a silly solution: I can wrap the fragment in a dummy element, like this:

<dummy><a>
  <b/>
</a>
<a>
  <b/>
</a></dummy>

and then programmatically strip the dummy element out again, like this:

String wrapped = "<dummy>" + text + "</dummy>";
Document parsed = builder.parse(new InputSource(new StringReader(wrapped)));
DocumentFragment fragment = parsed.createDocumentFragment();

// Here, the document element is the <dummy/> element.
NodeList children = parsed.getDocumentElement().getChildNodes();

// Move dummy's children over to the document fragment
while (children.getLength() > 0) {
    fragment.appendChild(children.item(0));
}

But that's a bit lame, so let's see whether there are other solutions.
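A self-contained version of the wrapper trick above, with the imports it needs. The names `DummyWrap` and `parseFragment` are made up for this sketch, and it assumes the text contains no XML declaration or DOCTYPE, since neither may appear inside an element. One thing worth knowing: the fragment belongs to the throwaway document created by the parse, so appending it to a different document fails with `WRONG_DOCUMENT_ERR` unless it is imported first, as `main` shows.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DummyWrap {
    // Parses `text` (zero or more sibling nodes) by wrapping it in a <dummy> root element.
    static DocumentFragment parseFragment(String text) throws Exception {
        String wrapped = "<dummy>" + text + "</dummy>";
        Document parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));
        DocumentFragment fragment = parsed.createDocumentFragment();
        NodeList children = parsed.getDocumentElement().getChildNodes();
        // The NodeList is live: each appendChild moves item(0) out of <dummy>.
        while (children.getLength() > 0) {
            fragment.appendChild(children.item(0));
        }
        return fragment;
    }

    public static void main(String[] args) throws Exception {
        DocumentFragment fragment = parseFragment("<a><b/></a><a><b/></a>");
        Document target = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Node root = target.appendChild(target.createElement("root"));
        try {
            // The fragment's nodes still belong to the throwaway document, so this fails.
            root.appendChild(fragment);
        } catch (DOMException e) {
            System.out.println(e.code == DOMException.WRONG_DOCUMENT_ERR); // true
        }
        // importNode copies the fragment's children into `target`; appending the
        // imported fragment then moves those children under <root>.
        root.appendChild(target.importNode(fragment, true));
        System.out.println(root.getChildNodes().getLength()); // 2
    }
}
```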

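【Comments】:

Exactly what I was going to suggest; you beat me to it.
XML parsers on other platforms support DocumentFragment, so you don't need to add a hack.
@Phlip: Which "other platforms" are those? How would they have helped me when I asked/answered this question?
Gnome's libxml2 (used by Python and Ruby) allows fragments. But I admit I'm not trying to help you; I'm trying to help the community...
@Phlip: This is a Java-specific question about the "Java standard DOM API", so I'm not convinced that helps the community...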

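【Solution 2】:

I'd suggest not using the DOM API. It's slow and ugly.

Use streaming StAX instead. It's built into JDK 1.6+. You get one element at a time, and it won't choke if you're missing a root element.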

http://en.wikipedia.org/wiki/StAX

http://download.oracle.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html

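【Comments】:

Thanks. I have no choice but to use DOM, because I'm working on a large legacy system. In general, IMO, it's neither slow nor ugly... unless you can prove "slow" to me with a benchmark?
I think "slow" is relative. DOM is fine for smaller documents. For large ones it consumes too much memory, and that is what slows it down.
@ccleve A minimal example using StAX (Java 1.7, with Xerces as the implementation) shows that it does choke if the XML is not well-formed (missing root element): <herpTag/><derpTag/> results in an XMLStreamException stating "The markup in the document following the root element must be well-formed." My goal was to assemble a DocumentFragment object using StAX. Do you have an example of using StAX that way? It would be nice to create DocumentFragments without having to implement a parser or wrap things in a dummy tag.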
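The last comment asks how StAX could be used to assemble a DocumentFragment. One possible sketch, not taken from any answer here, pulls events from an `XMLStreamReader` and rebuilds the nodes by hand inside a fragment owned by a caller-supplied document. It still relies on the dummy-wrapper trick, since, as the comment notes, the JDK's StAX reader also rejects a second top-level element. `StaxFragment` and `build` are hypothetical names; the sketch copies elements, attributes, namespace declarations, text, comments and processing instructions, and ignores any other event type.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class StaxFragment {
    // Rebuilds `text` (possibly several top-level siblings) inside a fragment owned by `doc`.
    static DocumentFragment build(Document doc, String text) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader("<dummy>" + text + "</dummy>"));
        DocumentFragment fragment = doc.createDocumentFragment();
        Deque<Node> open = new ArrayDeque<>(); // open parents, fragment at the bottom
        open.push(fragment);
        try {
            r.nextTag(); // now on <dummy>, which is not copied
            while (r.hasNext()) {
                int event = r.next();
                Node parent = open.peek();
                switch (event) {
                    case XMLStreamConstants.START_ELEMENT: {
                        Element el = doc.createElementNS(
                                emptyToNull(r.getNamespaceURI()), qName(r.getPrefix(), r.getLocalName()));
                        // Namespace declarations are not reported as attributes, so copy them explicitly.
                        for (int i = 0; i < r.getNamespaceCount(); i++) {
                            String p = r.getNamespacePrefix(i);
                            el.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI,
                                    (p == null || p.isEmpty()) ? "xmlns" : "xmlns:" + p, r.getNamespaceURI(i));
                        }
                        for (int i = 0; i < r.getAttributeCount(); i++) {
                            el.setAttributeNS(emptyToNull(r.getAttributeNamespace(i)),
                                    qName(r.getAttributePrefix(i), r.getAttributeLocalName(i)),
                                    r.getAttributeValue(i));
                        }
                        parent.appendChild(el);
                        open.push(el);
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT:
                        if (parent == fragment) {
                            return fragment; // this is </dummy>: its whole content has been copied
                        }
                        open.pop();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                    case XMLStreamConstants.SPACE:
                        parent.appendChild(doc.createTextNode(r.getText()));
                        break;
                    case XMLStreamConstants.COMMENT:
                        parent.appendChild(doc.createComment(r.getText()));
                        break;
                    case XMLStreamConstants.PROCESSING_INSTRUCTION: {
                        String data = r.getPIData();
                        parent.appendChild(doc.createProcessingInstruction(r.getPITarget(), data == null ? "" : data));
                        break;
                    }
                    default:
                        break; // other event types are ignored in this sketch
                }
            }
            return fragment;
        } finally {
            r.close(); // XMLStreamReader is not AutoCloseable
        }
    }

    private static String emptyToNull(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }

    private static String qName(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        DocumentFragment f = build(doc, "<a>\n  <b/>\n</a>\n<a>\n  <b/>\n</a>");
        System.out.println(f.getChildNodes().getLength());
    }
}
```

Compared with the pure-DOM versions, this never builds a second Document, but it is far more code for the same result, which mostly reinforces the point that the dummy wrapper is the pragmatic choice.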

【Solution 3】:

Expanding further on the answers already given:

public static DocumentFragment stringToFragment(Document document, String source) throws Exception
{
    source = "<dummy>" + source + "</dummy>";
    Node node = stringToDom(source).getDocumentElement();
    node = document.importNode(node, true);
    DocumentFragment fragment = document.createDocumentFragment();
    NodeList children = node.getChildNodes();
    while (children.getLength() > 0)
    {
        fragment.appendChild(children.item(0));
    }
    return fragment;
}

【Comments】:

All you need now is a stringToDom().
I think the answer at stackoverflow.com/a/1509229/16673 shows how to implement that.
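Solution 3 calls a `stringToDom()` helper it never defines. A minimal stand-in, which is our own guess at its intent rather than the code from the linked answer, is the same `DocumentBuilder` parse used in the question. With that in place the method compiles, and because the parsed `<dummy>` element is imported first, the returned fragment is owned by the caller's document and can be appended there directly. `FragmentUtil` is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FragmentUtil {
    // Hypothetical helper: parses a complete, single-rooted XML string into a new Document.
    static Document stringToDom(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Solution 3's method, with comments added.
    public static DocumentFragment stringToFragment(Document document, String source) throws Exception {
        source = "<dummy>" + source + "</dummy>";
        Node node = stringToDom(source).getDocumentElement();
        // Copy <dummy> and its subtree into the target document so the nodes can live there.
        node = document.importNode(node, true);
        DocumentFragment fragment = document.createDocumentFragment();
        NodeList children = node.getChildNodes();
        // Live NodeList: each appendChild moves item(0) out of the imported <dummy>.
        while (children.getLength() > 0) {
            fragment.appendChild(children.item(0));
        }
        return fragment;
    }

    public static void main(String[] args) throws Exception {
        Document target = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Node root = target.appendChild(target.createElement("root"));
        root.appendChild(stringToFragment(target, "<a><b/></a><a><b/></a>"));
        System.out.println(root.getChildNodes().getLength()); // 2
    }
}
```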