Fetch a particular child node using SAX Parser

【Question title】: Fetch a particular child node using SAX Parser
【Posted】: 2020-03-08 02:02:14
【Question description】:

I have the following XML:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
    <title>Game Analysis</title>
    <item>
        <title>Game</title>
        <description>ABC</description>
        <releaseDate>Sat, 21 Feb 2012 05:18:23 GMT</releaseDate>       
    </item>
    <item>
        <title>CoD</title>
        <description>XYZ</description>
        <releaseDate>Sat, 21 Feb 2011 05:18:23 GMT</releaseDate>            
    </item>
</channel>
</rss>

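I have to parse this XML, get all the child nodes under each "item", and check whether they include a "releaseDate" node. If one is missing, I have to throw an exception.

I also tried using XPath, but it did not work.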

    XPathFactory xPathfactory = XPathFactory.newInstance();
    XPath xpath = xPathfactory.newXPath();
    XPathExpression expr = xpath.compile("//channel/item");

    Object result = expr.evaluate(document, XPathConstants.NODESET);
    NodeList nodes = (NodeList) result;
    for (int i = 0; i < nodes.getLength(); i++) {
        System.out.println(nodes.item(i).getChildNodes());
    }

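【Comments】:

There seems to be some confusion here. SAX parsers do not create a tree of nodes; they deliver a sequence of events to the application. You cannot use XPath directly with SAX. You can use a SAX parser to feed a tree builder for DOM, JDOM2, or XOM, and then use XPath on the resulting tree.
This sounds like a student exercise, and student exercises often require you not only to solve the problem but to solve it with a particular set of technologies. If that is the case, you need to tell us clearly which constraints apply to your solution.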
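
The SAX-to-tree route described in the first comment can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `SaxToDomSketch` and the helper `countItemsWithoutReleaseDate` are hypothetical, the XML is passed in as a string, and the JDK's identity transformer is used as the tree builder (JDOM2 or XOM would need their own builders).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SaxToDomSketch {
    // Hypothetical helper: reads the XML with a SAX reader, lets the identity
    // transformer build a DOM tree from the SAX events, then runs XPath on that tree.
    static int countItemsWithoutReleaseDate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));
        DOMResult tree = new DOMResult();
        // A transformer created without a stylesheet copies its input unchanged.
        TransformerFactory.newInstance().newTransformer().transform(source, tree);

        Node root = tree.getNode();
        NodeList missing = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//channel/item[not(releaseDate)]", root, XPathConstants.NODESET);
        return missing.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss version=\"2.0\"><channel>"
                + "<item><title>Game</title>"
                + "<releaseDate>Sat, 21 Feb 2012 05:18:23 GMT</releaseDate></item>"
                + "<item><title>CoD</title></item>"
                + "</channel></rss>";
        System.out.println(countItemsWithoutReleaseDate(xml));
    }
}
```

The whole document still ends up in memory here, so this keeps the convenience of XPath but not the low memory footprint of a pure SAX handler.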

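Tags: java xml xml-parsing sax saxparser

【Solution 1】:

Try this code. Don't forget to include the SAX parser library in your project and to remove the rss line from the XML document (I hope that is acceptable).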

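import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxParserTest {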
    public static void main(String... argv) {
        SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = saxParserFactory.newSAXParser();
            MyHandler handler = new MyHandler();
            saxParser.parse(new File("your path to XML-file here"), handler);
            List<Item> items = handler.getChannel().getItems();
            // your check of item release dates here
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

class MyHandler extends DefaultHandler {
    private StringBuilder data = new StringBuilder();

    private Channel channel;

    private String itemTitle;
    private String itemDescription;
    private String itemReleaseDate;

    private boolean isItem;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if (!qName.equals("rss")) {
            if (qName.equalsIgnoreCase("channel")) {
                channel = new Channel();
            } else if (qName.equalsIgnoreCase("item")) {
                isItem = true;
            }
            data.setLength(0);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equalsIgnoreCase("title")) {
            if (!isItem) {
                channel.setTitle(data.toString());
            } else {
                itemTitle = data.toString();
            }
        } else if (qName.equalsIgnoreCase("item")) {
            channel.addItem(new Item(itemTitle, itemDescription, itemReleaseDate));
            itemTitle = null;
            itemDescription = null;
            itemReleaseDate = null;
            isItem = false;
        } else if (qName.equalsIgnoreCase("description")) {
            itemDescription = data.toString();
        } else if (qName.equalsIgnoreCase("releaseDate")) {
            itemReleaseDate = data.toString();
        }
    }

    @Override
    public void characters(char ch[], int start, int length) throws SAXException {
        // Append the character range directly; no temporary String is needed.
        data.append(ch, start, length);
    }

    public Channel getChannel() {
        return channel;
    }
}

class Channel {
    private String title;
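    // Start with an empty list so getItems() is not null for a channel without items.
    private List<Item> items = new ArrayList<Item>();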

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public List<Item> getItems() {
        return items;
    }

    public void setItems(List<Item> items) {
        this.items = items;
    }

    public void addItem(Item item) {
        if (items == null) {
            items = new ArrayList<Item>();
        }
        items.add(item);
    }
}

class Item {
    private String title;
    private String description;
    private String releaseDate;

    public Item(String title, String description, String releaseDate) {
        this.title = title;
        this.description = description;
        this.releaseDate = releaseDate;
    }
    public String getReleaseDate() {
        return releaseDate;
    }
}
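
The placeholder comment in `main` ("your check of item release dates here") could be filled in along the following lines. This is a sketch, not part of the original answer: `ReleaseDateCheck` and `requireReleaseDates` are hypothetical names, the nested `Item` is a cut-down stand-in for the answer's class, and `getTitle()` is an added accessor used only for the error message. It relies on the handler above leaving `itemReleaseDate` as null when an item has no releaseDate element, and it also treats an empty date as missing, which is an assumption about the requirement.

```java
import java.util.Arrays;
import java.util.List;

class ReleaseDateCheck {
    // Minimal stand-in for the answer's Item class (assumption: title and date only).
    static class Item {
        private final String title;
        private final String releaseDate;

        Item(String title, String releaseDate) {
            this.title = title;
            this.releaseDate = releaseDate;
        }

        String getTitle() {
            return title;
        }

        String getReleaseDate() {
            return releaseDate;
        }
    }

    // Throws if any item has no release date (null or blank text).
    static void requireReleaseDates(List<Item> items) {
        for (Item item : items) {
            String date = item.getReleaseDate();
            if (date == null || date.trim().isEmpty()) {
                throw new IllegalStateException(
                        "Found <item> without <releaseDate>: " + item.getTitle());
            }
        }
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("Game", "Sat, 21 Feb 2012 05:18:23 GMT"),
                new Item("CoD", null));
        requireReleaseDates(items); // throws for the second item
    }
}
```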

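【Comments】:

【Solution 2】:

XPath should work fine and can even give a shorter solution. The expression //channel/item[not(releaseDate)] returns all item nodes that have no releaseDate child node. So this code should give you the answer: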

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);

    Document document = dbf
            .newDocumentBuilder()
            .parse(...);

    XPath xpath = XPathFactory
            .newInstance()
            .newXPath();

    NodeList list = (NodeList) xpath.evaluate("//channel/item[not(releaseDate)]", document, XPathConstants.NODESET);
    if (list.getLength() != 0) {
        throw new Exception("Found <item> without <releaseDate>");
    }

【Comments】:

Yes, but this doesn't use SAX.
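
For a check that stays entirely within SAX, one option is to track in the handler whether a releaseDate element was seen inside the current item and fail when the item closes. The sketch below is an assumption-laden illustration, not code from either answer: `ReleaseDateValidator` and `validate` are hypothetical names, the XML is supplied as a string, and a releaseDate at any depth inside an item counts, not only a direct child.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ReleaseDateValidator extends DefaultHandler {
    private boolean insideItem;
    private boolean sawReleaseDate;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("item")) {
            // A new item starts: forget what was seen in the previous one.
            insideItem = true;
            sawReleaseDate = false;
        } else if (insideItem && qName.equals("releaseDate")) {
            sawReleaseDate = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("item")) {
            if (!sawReleaseDate) {
                // Throwing here aborts the parse at the first offending item.
                throw new SAXException("Found <item> without <releaseDate>");
            }
            insideItem = false;
        }
    }

    // Hypothetical entry point: parses the XML text and throws if an item lacks a date.
    static void validate(String xml) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new ReleaseDateValidator());
    }

    public static void main(String[] args) throws Exception {
        validate("<rss><channel><item><title>Game</title>"
                + "<releaseDate>Sat, 21 Feb 2012 05:18:23 GMT</releaseDate></item>"
                + "</channel></rss>");
        System.out.println("All items have a releaseDate");
    }
}
```

Because nothing is stored beyond two flags, this variant does not need the Channel and Item classes and stops reading as soon as the first invalid item ends.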