How to get all attributes from each element separately? Answers

[Question title]: How to get all attributes from each element separately?
[Posted]: 2013-10-04 04:50:19
[Question description]:

Here is a basic XML document:

<h1>My Heading</h1>

<p align = "center"> My paragraph
<img src="smiley.gif" alt="Smiley face" height="42" width="42"></img>
<img src="sad.gif" alt="Sad face" height="45" width="45"></img>
<img src="funny.gif" alt="Funny face" height="48" width="48"></img>
</p>
<p>My para</p>

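What I want to do is find each element and all of its attributes, and save the attribute name + attribute value for each element separately. Here is my code so far: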

private Map<String, String> tag = new HashMap<String, String>();

public Map<String, String> findElement() {
    try {
        FileReader fRead = new FileReader(sourcePage);
        BufferedReader bRead = new BufferedReader(fRead);

        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(new FileInputStream(new File(sourcePage)));
        XPathFactory xFactory = XPathFactory.newInstance();
        XPath xPath = xFactory.newXPath();
        NodeList nl = (NodeList) xPath.evaluate("//img/@*", doc, XPathConstants.NODESET);
        for (int i = 0; i < nl.getLength(); i++) {
            Attr attr = (Attr) nl.item(i);
            String name = attr.getName();
            String value = attr.getValue();
            tag.put(name, value); // a name already in the map (e.g. "src") gets overwritten
        }
        bRead.close();
        fRead.close();
    }
    catch (Exception e) {
        e.printStackTrace();
        System.err.println("An error has occurred.");
    }
    return tag;
}

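The problem appears when I look for the attributes of img, because the attribute names are the same for every img. A HashMap doesn't fit this case, since it overwrites the value stored under an existing key. Maybe I'm using the wrong expression to find all attributes. Is there another way to get the attribute names and values of the n-th img element?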
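A minimal sketch that reproduces the problem, assuming the sample document is wrapped in a hypothetical <html> root element (an XML parser rejects a document with several top-level elements) and parsed from an in-memory string instead of sourcePage. The class name OverwriteDemo is illustrative. //img/@* matches all 12 attributes of the three <img> elements, but HashMap.put replaces the value already stored under a key, so only 4 entries survive:

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OverwriteDemo {
    // The question's document, wrapped in a hypothetical <html> root element.
    static final String XML = "<html><h1>My Heading</h1>"
        + "<p align=\"center\"> My paragraph"
        + "<img src=\"smiley.gif\" alt=\"Smiley face\" height=\"42\" width=\"42\"></img>"
        + "<img src=\"sad.gif\" alt=\"Sad face\" height=\"45\" width=\"45\"></img>"
        + "<img src=\"funny.gif\" alt=\"Funny face\" height=\"48\" width=\"48\"></img>"
        + "</p><p>My para</p></html>";

    // Parses XML and returns the nodes selected by the XPath expression.
    static NodeList select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Same loop as in the question: one flat map for all attributes.
    static Map<String, String> collect(NodeList nl) {
        Map<String, String> tag = new HashMap<>();
        for (int i = 0; i < nl.getLength(); i++) {
            Attr attr = (Attr) nl.item(i);
            // put() overwrites any value already stored under this name.
            tag.put(attr.getName(), attr.getValue());
        }
        return tag;
    }

    public static void main(String[] args) {
        NodeList nl = select("//img/@*");
        Map<String, String> tag = collect(nl);
        System.out.println(nl.getLength() + " attributes matched, " + tag.size() + " kept: " + tag);
    }
}
```

Only the values of the last <img> remain, which is the same result Solution 1 reports for the unmodified expression.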

[Question discussion]:

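The solution depends on what you want to achieve. If you only need the attributes of the first <img> element, you can use this XPath expression: /descendant::img[1]/@*. If you want all the elements with all their attributes, you need some kind of Multimap and have to scan the elements one by one; you could return a List<Map<String, String>> from the method. There are also other possible variations in between. What is your expected output?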

Tags: java html xml xpath

[Solution 1]:

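First, let's level the playing field a bit. I cleaned up your code slightly to get a compiling starting point: I removed the unnecessary code and fixed it according to my best guess of what it was supposed to do. I also generalized it a little so that it accepts a tagName parameter. It is still the same code and makes the same mistake, but now it compiles (it uses Java 7 features such as try-with-resources, multi-catch and the diamond operator for convenience; switch it back to Java 6 if you need to). For good measure, I also split the try-catch into several blocks: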

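// Imports for the methods in this answer (assumed to live in one class with a String field sourcePage):
import java.io.*;
import java.util.*;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;

public Map<String, String> getElementAttributesByTagName(String tagName) {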
    Document document;
    try (InputStream input = new FileInputStream(sourcePage)) {
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        document = docBuilder.parse(input);
    } catch (IOException | ParserConfigurationException | SAXException e) {
        throw new RuntimeException(e);
    }

    NodeList attributeList;
    try {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Equivalent to the question's //img/@*, so the overwriting problem remains.
        attributeList = (NodeList)xPath.evaluate("//" + tagName + "/@*", document, XPathConstants.NODESET);
    } catch (XPathExpressionException e) {
        throw new RuntimeException(e);
    }

    Map<String, String> tagInfo = new HashMap<>();
    for (int i = 0; i < attributeList.getLength(); i++) {
        Attr attribute = (Attr)attributeList.item(i);
        tagInfo.put(attribute.getName(), attribute.getValue());
    }
    return tagInfo;
}

When run against the sample document above, it returns:

{height=48, alt=Funny face, width=48, src=funny.gif}

The solution depends on your expected output. You either want to

get the attributes of only one of the <img> elements (e.g. the first one), or
get a list of all <img> elements with their attributes.

For the first solution, just change the XPath expression to

/descendant::img[1]/@*

or

"/descendant::" + tagName + "[1]/@*"

to use the tagName parameter. Note that this is not the same as //img[1]/@*, even though it returns the same element in this particular case.

With that change, the method returns:

{height=42, alt=Smiley face, width=42, src=smiley.gif}

which are the correct attributes of the first <img> element.
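To see why the exact expression matters, here is a small sketch on a hypothetical document (not from the question) in which one <img> is nested inside a <span> and a second <p> holds its own <img>. In XPath 1.0, // is short for /descendant-or-self::node()/, so a [1] predicate after it is applied separately for every context node. The class FirstImgDemo and its srcs helper are illustrative names; the sketch prints the src of every element each expression selects:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstImgDemo {
    // Hypothetical document: a nested <img> plus a second <p> with its own <img>.
    static final String XML = "<html>"
        + "<p><span><img src=\"a.gif\"/></span><img src=\"b.gif\"/></p>"
        + "<p><img src=\"c.gif\"/></p>"
        + "</html>";

    // Returns the src attribute of every element the expression selects.
    static List<String> srcs(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("src"));
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Every <img> that is the first <img> child of its parent.
        System.out.println(srcs("//img[1]"));
        // For every node, its first <img> descendant; the results are merged.
        System.out.println(srcs("//descendant::img[1]"));
        // The first <img> in the whole document.
        System.out.println(srcs("/descendant::img[1]"));
    }
}
```

On this document the three expressions select a.gif, b.gif and c.gif; then a.gif and c.gif; then only a.gif. So //descendant::img[1] can still match several elements once images sit under different ancestors, while /descendant::img[1] (equivalently (//img)[1]) selects only the first <img> in document order.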

Note that you don't even need an XPath expression for this kind of job. Here is a non-XPath version:

public Map<String, String> getElementAttributesByTagNameNoXPath(String tagName) {
    Document document;
    try (InputStream input = new FileInputStream(sourcePage)) {
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        document = docBuilder.parse(input);
    } catch (IOException | ParserConfigurationException | SAXException e) {
        throw new RuntimeException(e);
    }

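    Node node = document.getElementsByTagName(tagName).item(0);
    if (node == null) {
        // No element with that tag name: item(0) returns null for an empty list.
        return new HashMap<>();
    }
    NamedNodeMap attributeMap = node.getAttributes();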

    Map<String, String> tagInfo = new HashMap<>();
    for (int i = 0; i < attributeMap.getLength(); i++) {
        Node attribute = attributeMap.item(i);
        tagInfo.put(attribute.getNodeName(), attribute.getNodeValue());
    }
    return tagInfo;
}
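Since the question asks for the n-th <img>, the single-element idea generalizes by parenthesizing the path before indexing it. A sketch, assuming a hypothetical helper getNthElementAttributes that receives an already parsed Document and a 1-based index (XPath positions start at 1), and that tagName is a plain, trusted element name, because it is concatenated into the expression. The sample string holds the question's images inside a hypothetical <html> root:

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NthElementDemo {
    // The question's three <img> elements inside a hypothetical <html> root.
    static final String XML = "<html><p align=\"center\"> My paragraph"
        + "<img src=\"smiley.gif\" alt=\"Smiley face\" height=\"42\" width=\"42\"/>"
        + "<img src=\"sad.gif\" alt=\"Sad face\" height=\"45\" width=\"45\"/>"
        + "<img src=\"funny.gif\" alt=\"Funny face\" height=\"48\" width=\"48\"/>"
        + "</p></html>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical helper: attributes of the n-th (1-based) element named tagName
    // in document order, or an empty map if there is no such element.
    static Map<String, String> getNthElementAttributes(Document doc, String tagName, int n) {
        String expression = "(//" + tagName + ")[" + n + "]/@*";
        NodeList attributes;
        try {
            attributes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
        Map<String, String> tagInfo = new HashMap<>();
        for (int i = 0; i < attributes.getLength(); i++) {
            Attr attribute = (Attr) attributes.item(i);
            tagInfo.put(attribute.getName(), attribute.getValue());
        }
        return tagInfo;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(getNthElementAttributes(doc, "img", 2));
    }
}
```

Without XPath, document.getElementsByTagName(tagName).item(n - 1) gives the same element, with item returning null when the index is out of range.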

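The second solution needs a slightly bigger change. We want to return the attributes of all <img> elements in the document. Multiple elements mean we'll use a List holding several Map<String, String> instances, where each Map represents one <img> element.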

A full XPath version, in case you really do need some complex XPath expression:

public List<Map<String, String>> getElementsAttributesByTagName(String tagName) {
    Document document;
    try (InputStream input = new FileInputStream(sourcePage)) {
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        document = docBuilder.parse(input);
    } catch (IOException | ParserConfigurationException | SAXException e) {
        throw new RuntimeException(e);
    }

    NodeList nodeList;
    try {
        XPath xPath = XPathFactory.newInstance().newXPath();
        nodeList = (NodeList)xPath.evaluate("//" + tagName, document, XPathConstants.NODESET);
    } catch (XPathExpressionException e) {
        throw new RuntimeException(e);
    }

    List<Map<String, String>> tagInfoList = new ArrayList<>();
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node node = nodeList.item(i);
        NamedNodeMap attributeMap = node.getAttributes();

        Map<String, String> tagInfo = new HashMap<>();
        for (int j = 0; j < attributeMap.getLength(); j++) {
            Node attribute = attributeMap.item(j);
            tagInfo.put(attribute.getNodeName(), attribute.getNodeValue());
        }
        tagInfoList.add(tagInfo);
    }
    return tagInfoList;
}

To get rid of the XPath part, you can simply replace it with a one-liner:

NodeList nodeList = document.getElementsByTagName(tagName);

When run against the test case above with the "img" argument, both versions return (formatted for clarity):

[ {height=42, alt=Smiley face, width=42, src=smiley.gif},
  {height=45, alt=Sad face,    width=45, src=sad.gif   },
  {height=48, alt=Funny face,  width=48, src=funny.gif } ]

which is the correct list of all <img> elements.
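For reference, a self-contained sketch of the XPath-free variant of getElementsAttributesByTagName. It assumes the method receives an already parsed Document instead of reading sourcePage, and it parses an in-memory copy of the question's <img> elements wrapped in a hypothetical <html> root; the class name is illustrative. getElementsByTagName returns the elements in document order, so the list order matches the output above:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementsAttributesDemo {
    // The question's three <img> elements inside a hypothetical <html> root.
    static final String XML = "<html><p align=\"center\"> My paragraph"
        + "<img src=\"smiley.gif\" alt=\"Smiley face\" height=\"42\" width=\"42\"/>"
        + "<img src=\"sad.gif\" alt=\"Sad face\" height=\"45\" width=\"45\"/>"
        + "<img src=\"funny.gif\" alt=\"Funny face\" height=\"48\" width=\"48\"/>"
        + "</p></html>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // One Map per matching element, in document order.
    static List<Map<String, String>> getElementsAttributesByTagName(Document document, String tagName) {
        NodeList nodeList = document.getElementsByTagName(tagName);
        List<Map<String, String>> tagInfoList = new ArrayList<>();
        for (int i = 0; i < nodeList.getLength(); i++) {
            NamedNodeMap attributeMap = nodeList.item(i).getAttributes();
            Map<String, String> tagInfo = new HashMap<>();
            for (int j = 0; j < attributeMap.getLength(); j++) {
                Node attribute = attributeMap.item(j);
                tagInfo.put(attribute.getNodeName(), attribute.getNodeValue());
            }
            tagInfoList.add(tagInfo);
        }
        return tagInfoList;
    }

    public static void main(String[] args) {
        for (Map<String, String> tagInfo : getElementsAttributesByTagName(parse(XML), "img")) {
            System.out.println(tagInfo);
        }
    }
}
```

Passing the Document in, rather than opening the file inside each method, also avoids re-parsing the same file for every query.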

[Discussion]:

Thanks a lot, my expected output is exactly the second one.

[Solution 2]:

Try using

 Map <String, ArrayList<String>> tag = new HashMap <String, ArrayList<String>> ();
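A sketch of how such a map could be filled: keyed by attribute name, with one value appended per <img> in document order. It declares the value type as List<String> and uses Map.computeIfAbsent, so it assumes Java 8 or later; the class name GroupByNameDemo and the sample string (the question's images in a hypothetical <html> root) are illustrative:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GroupByNameDemo {
    // The question's three <img> elements inside a hypothetical <html> root.
    static final String XML = "<html><p align=\"center\"> My paragraph"
        + "<img src=\"smiley.gif\" alt=\"Smiley face\" height=\"42\" width=\"42\"/>"
        + "<img src=\"sad.gif\" alt=\"Sad face\" height=\"45\" width=\"45\"/>"
        + "<img src=\"funny.gif\" alt=\"Funny face\" height=\"48\" width=\"48\"/>"
        + "</p></html>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Attribute name -> values from every matching element, in document order.
    static Map<String, List<String>> groupByAttributeName(Document doc, String tagName) {
        Map<String, List<String>> tag = new HashMap<>();
        NodeList elements = doc.getElementsByTagName(tagName);
        for (int i = 0; i < elements.getLength(); i++) {
            NamedNodeMap attributes = elements.item(i).getAttributes();
            for (int j = 0; j < attributes.getLength(); j++) {
                Node attribute = attributes.item(j);
                // Create the list the first time a name is seen, then append to it.
                tag.computeIfAbsent(attribute.getNodeName(), k -> new ArrayList<>())
                   .add(attribute.getNodeValue());
            }
        }
        return tag;
    }

    public static void main(String[] args) {
        System.out.println(groupByAttributeName(parse(XML), "img").get("src"));
    }
}
```

The i-th entry of each list lines up with the i-th <img> only as long as every element carries every attribute; if one <img> lacks alt, for example, the lists drift apart, which is one reason the List<Map<String, String>> shape is easier to work with.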

[Discussion]:

[Solution 3]:

You can use a Map as the key of another Map:

Map<Map<Integer, String>, String> // Integer = some index (0, 1, etc.); inner Map's value = attribute name (e.g. src); outer Map's value = attribute value (e.g. smiley.gif)
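With int replaced by Integer (primitive types cannot be generic type arguments), this compiles and works, because java.util.Map implementations define equals and hashCode by content, so a freshly built key finds the stored entry. A sketch using Collections.singletonMap(index, attributeName) as the key, assuming 0-based element indexes and the question's images in a hypothetical <html> root; the class and method names are illustrative:

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CompositeKeyDemo {
    // The question's three <img> elements inside a hypothetical <html> root.
    static final String XML = "<html><p align=\"center\"> My paragraph"
        + "<img src=\"smiley.gif\" alt=\"Smiley face\" height=\"42\" width=\"42\"/>"
        + "<img src=\"sad.gif\" alt=\"Sad face\" height=\"45\" width=\"45\"/>"
        + "<img src=\"funny.gif\" alt=\"Funny face\" height=\"48\" width=\"48\"/>"
        + "</p></html>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Key = {element index -> attribute name}, value = attribute value.
    static Map<Map<Integer, String>, String> indexAttributes(Document doc, String tagName) {
        Map<Map<Integer, String>, String> attrs = new HashMap<>();
        NodeList elements = doc.getElementsByTagName(tagName);
        for (int i = 0; i < elements.getLength(); i++) {
            NamedNodeMap attributes = elements.item(i).getAttributes();
            for (int j = 0; j < attributes.getLength(); j++) {
                Node attribute = attributes.item(j);
                attrs.put(Collections.singletonMap(i, attribute.getNodeName()), attribute.getNodeValue());
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        Map<Map<Integer, String>, String> attrs = indexAttributes(parse(XML), "img");
        // A new, equal key is enough to look up the second image's src.
        System.out.println(attrs.get(Collections.singletonMap(1, "src")));
    }
}
```

A small value class holding the index and the name would express the same key more directly; the nested Map simply reuses a JDK type that already has value-based equality.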

or

you can invert it and take that into account when using it, for example:

Map<String, String> // key = attribute value (e.g. smiley.gif), value = attribute name (e.g. src)
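Before relying on the inverted map, note that attribute values are not unique: in the sample, height and width are both 42 on the first <img>. A sketch, again assuming the question's images inside a hypothetical <html> root (the class name is illustrative), that builds the value-to-name map and shows how many of the 12 attributes survive:

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InvertedMapDemo {
    // The question's three <img> elements inside a hypothetical <html> root.
    static final String XML = "<html><p align=\"center\"> My paragraph"
        + "<img src=\"smiley.gif\" alt=\"Smiley face\" height=\"42\" width=\"42\"/>"
        + "<img src=\"sad.gif\" alt=\"Sad face\" height=\"45\" width=\"45\"/>"
        + "<img src=\"funny.gif\" alt=\"Funny face\" height=\"48\" width=\"48\"/>"
        + "</p></html>";

    // Attribute value -> attribute name for every matching element.
    static Map<String, String> invert(String tagName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        Map<String, String> byValue = new HashMap<>();
        NodeList elements = doc.getElementsByTagName(tagName);
        for (int i = 0; i < elements.getLength(); i++) {
            NamedNodeMap attributes = elements.item(i).getAttributes();
            for (int j = 0; j < attributes.getLength(); j++) {
                Node attribute = attributes.item(j);
                // A repeated value (e.g. "42") overwrites the earlier name.
                byValue.put(attribute.getNodeValue(), attribute.getNodeName());
            }
        }
        return byValue;
    }

    public static void main(String[] args) {
        Map<String, String> byValue = invert("img");
        System.out.println(byValue.size() + " entries: " + byValue);
    }
}
```

Only 9 entries remain, and whether height or width is kept for 42 depends on the order in which the parser reports attributes.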

[Discussion]: