Groovy: parsing XML with HTML tags inside

【Question title】: Groovy: parsing XML with HTML tags inside
【Posted】: 2014-08-08 19:25:36
【Question description】:

My question is about parsing XML in which the string values contain HTML tags:

def xmlString = '''
<resource>
   <string name="my_test">No problem here!</string>
   <string name="my_text">
<b> <big>My bold and big title</big></b>
   Rest of the text
  </string>
</resource>
'''

(This is an Android resource file.)

When I use XmlSlurper, the HTML tags are removed. This code:

def resources = new XmlSlurper().parseText(xmlString)
resources.string.each { string ->
    println "string name = " + string.@name + ", string value = " + string.text()
}

produces

string name = my_test, string value = No problem here!
string name = my_text, string value = My bold and big title
   Rest of the text

I could use CDATA to keep the HTML tags from being parsed, but then the HTML tags would not be processed when the string my_text is used.
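For reference, the CDATA variant mentioned above can be sketched as follows. This is only an illustration: the variable names cdataXml and value are made up here, and the import assumes Groovy 3 or later (on Groovy 2.x, XmlSlurper lives in groovy.util and needs no import). With CDATA the markup reaches text() as literal characters rather than as child elements.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; on Groovy 2.x remove this line (groovy.util.XmlSlurper is imported by default)

// Hypothetical variant of the resource file with the styled string wrapped in CDATA
def cdataXml = '''
<resource>
    <string name="my_text"><![CDATA[<b><big>My bold and big title</big></b> Rest of the text]]></string>
</resource>
'''

def resources = new XmlSlurper().parseText(cdataXml)

// The parser treats the CDATA section as plain character data,
// so text() returns the tags as literal text instead of dropping them
def value = resources.string.find { it.@name == 'my_text' }.text()

println "string value = " + value
```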

I also tried using StreamingMarkupBuilder, as described in this SO answer: How to extract HTML Code from a XML File using groovy, but then only the HTML tags and the text between them are shown:

<b><big>My bold and big title</big></b>

and the first string is not shown. Thanks in advance!

【Question discussion】:

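Is this any help? stackoverflow.com/a/25140607/6509
Thx @tim_yates, that works too, but I got stuck on the nodes and on how to get the information I need out of them. There is a solution now, which will hopefully also help others facing the same problem.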

Tags: html xml parsing groovy

【Solution 1】:

def xmlString = '''
<resource>
    <string name="my_test">No problem here!</string>
    <string name="my_text">
        <b><big>My bold and big title</big></b>
        Rest of the text
    </string>
</resource>
'''

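// Collect the inner markup of every <string> element
def result = []
def resources = new XmlSlurper().parseText(xmlString).string

resources.each { resource ->
    // getBody() returns a closure that replays the node's children (tags and text) into the builder
    result << new groovy.xml.StreamingMarkupBuilder().bind { mkp.yield resource.getBody() }
}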
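The answer stops at filling the list, so here is a minimal sketch of how the collected values could be turned into strings and printed alongside their names. The map called rendered and the variable markup are illustrative names, not part of the answer, and the XmlSlurper import assumes Groovy 3 or later.

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlSlurper  // Groovy 3+; on Groovy 2.x remove this line (groovy.util.XmlSlurper is imported by default)

def xmlString = '''
<resource>
    <string name="my_test">No problem here!</string>
    <string name="my_text">
        <b><big>My bold and big title</big></b>
        Rest of the text
    </string>
</resource>
'''

// 'rendered' is a hypothetical helper map: string name -> inner markup as text
def rendered = [:]

new XmlSlurper().parseText(xmlString).string.each { resource ->
    // bind() returns a lazy Writable; toString() forces the markup to be written out
    def markup = new StreamingMarkupBuilder().bind { mkp.yield resource.getBody() }.toString()
    rendered[resource.@name.text()] = markup
}

rendered.each { name, value ->
    println "string name = ${name}, string value = ${value}"
}
```

Calling toString() on each bound result is what materialises the markup; until then the builder has not written anything.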

【Discussion】: