【Question title】:JSOUP get div content from divs with the same name
【Posted】:2016-03-18 17:38:56
【Question】:

I want to parse a web page that contains two divs with the same class.

Here is the part of the page I am trying to parse:

<div class="bid-row rgray bmatch" id="m590574">
<div class="mtime">12:00</div>
<div class="mteams w240" data-original-title="" title="">
    <div class="team">Rayo Vallecano</div>
    <div class="team">Malaga CF</div>
</div>
<div class="modds w160">
    <div class="clear">
        <div class="blank"></div>
        <input class="bet" id="q43909084" type="button" value="2.35">
        <input class="bet" id="q43909085" type="button" value="3.30">
        <input class="bet" id="q43909086" type="button" value="3.15">
    </div>
</div>
<div class="minfo">
    <div class="stats" data-brid="7610448_1"></div>
    <div data-tvinfo="Sky Sports 4, Sport1 HU, LiG TV 3, Canal+ Liga, NTV Plus Futbol 2, TK Futbol 1 UA, Digi Sport 2 RO, CANAL9 DK, Sport Klub 1 SRB, SKY Sport Plus IT HD, Eleven HD" class="fleft tv"></div>
    <div class="mlive"></div>
    <div class="slider" data-mode="1" data-tid="36" data-cid="32">+50<span class="glyphicon glyphicon-chevron-right"></span></div>
</div>
</div>

I am using JSOUP to parse it, and this is what my code looks like right now:

    Elements hrefElements = doc.select("div.bmatch");
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

    // root elements
    org.w3c.dom.Document doc1 = docBuilder.newDocument();
    org.w3c.dom.Element rootElement = doc1.createElement("company");

    doc1.appendChild(rootElement);

    String[] mtime = new String[hrefElements.size()];
    String[] team = new String[hrefElements.size()];
    String[] tvinfo = new String[hrefElements.size()];

    for (int i = 0; i < hrefElements.size(); i++)
    {
        mtime[i] = hrefElements.get(i).getElementsByClass("mtime").text();
        team[i] = hrefElements.get(i).getElementsByClass("team").text();
        tvinfo[i] = hrefElements.get(i).getElementsByTag("div").attr("data-tvinfo");
    }
    for (int j = 0; j < hrefElements.size(); j++)
    {
        // staff elements
        org.w3c.dom.Element staff = doc1.createElement("Event");
        rootElement.appendChild(staff);

        // set attribute to staff element
        Attr attr = doc1.createAttribute("id");
        attr.setValue("1");
        staff.setAttributeNode(attr);
        org.w3c.dom.Element firstname = doc1.createElement("Time");
        firstname.appendChild(doc1.createTextNode(mtime[j]));
        staff.appendChild(firstname);

        // lastname elements
        org.w3c.dom.Element lastname = doc1.createElement("Teams");
        lastname.appendChild(doc1.createTextNode(team[j]));
        staff.appendChild(lastname);

        // nickname elements
        org.w3c.dom.Element nickname = doc1.createElement("TV");
        nickname.appendChild(doc1.createTextNode(tvinfo[j]));
        staff.appendChild(nickname);

        System.out.println("Time: " + mtime[j]);
        System.out.println("Event: " + team[j]);
        System.out.println("TvInfo: " + tvinfo[j]);
    }
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    DOMSource source = new DOMSource(doc1);
    String nameGame = jTextField3.getText();
    StreamResult result = new StreamResult(new File("test.xml"));
    //StreamResult result = new StreamResult(System.out);
    transformer.transform(source, result);
    // Output to console for testing
    // StreamResult result = new StreamResult(System.out);

    transformer.transform(source, result);

    System.out.println("File saved!");

}

However, the output I get for that part of the HTML is:

    <Event id="1">
        <Time>Today12:00</Time>
        <Teams>Rayo Vallecano Malaga CF</Teams>
        <TV>Sky Sports 4, Sport1 HU, LiG TV 3, Canal+ Liga, NTV Plus Futbol 2, TK Futbol 1 UA, Digi Sport 2 RO, CANAL9 DK, Sport Klub 1 SRB, SKY Sport Plus IT HD, Eleven HD</TV>
    </Event>

The final XML I am trying to produce should look like this:

    <Event id="1">
        <Time>Today12:00</Time>
        <Team1>Rayo Vallecano</Team1>
        <Team2>Malaga CF</Team2>
        <TV>Sky Sports 4, Sport1 HU, LiG TV 3, Canal+ Liga, NTV Plus Futbol 2, TK Futbol 1 UA, Digi Sport 2 RO, CANAL9 DK, Sport Klub 1 SRB, SKY Sport Plus IT HD, Eleven HD</TV>
    </Event>

【Comments】:

    Tags: java xml jsoup


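    【Solution 1】:

    You used hrefElements.get(i).getElementsByClass("team").text(); to get the team names. Called on a set of elements, text() returns the combined text of all matching elements, so here you get "Rayo Vallecano Malaga CF" for the two teams "Rayo Vallecano" and "Malaga CF". To get each name separately, select the team divs and read the text of each one individually.

    Try this.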

            Elements hrefElements = doc.select("div.bmatch");
            DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
    
            // root elements
            org.w3c.dom.Document doc1 = docBuilder.newDocument();
            org.w3c.dom.Element rootElement = doc1.createElement("company");
            doc1.appendChild(rootElement);
    
    
            for( int i = 0; i < hrefElements.size(); i++ ) 
            {
                // staff elements
                org.w3c.dom.Element staff = doc1.createElement("Event");
                rootElement.appendChild(staff);
    
                // set attribute to staff element
                Attr attr = doc1.createAttribute("id");
                attr.setValue("" + (i + 1));
                staff.setAttributeNode(attr);
    
                Element timeSection = hrefElements.get(i).select("div.mtime").first(); // one time section
                Element teamsSection = hrefElements.get(i).select("div.mteams").first(); // one team section
                Element infoSection = hrefElements.get(i).select("div.minfo").first(); // one info section
    
                String time = timeSection.text();
                Elements teams = teamsSection.select("div.team"); // many teams within team section
                String tvInfo = infoSection.select("div.tv").first().attr("data-tvinfo");
    
                // time element
                org.w3c.dom.Element timeElement = doc1.createElement("Time");
                timeElement.appendChild(doc1.createTextNode(time));
                staff.appendChild(timeElement);
                System.out.println(timeElement.getTextContent());
    
                // teams
                for(int j = 0; j < teams.size(); j++) {
                    org.w3c.dom.Element teamElement = doc1.createElement("Team" + (j + 1));
                    teamElement.appendChild(doc1.createTextNode(teams.get(j).text()));
                    staff.appendChild(teamElement);
                    System.out.println(teamElement.getTextContent());
                }
    
                // tv info
                org.w3c.dom.Element nickname = doc1.createElement("TV");
                nickname.appendChild(doc1.createTextNode(tvInfo));
                staff.appendChild(nickname);
                System.out.println(nickname.getTextContent());
            }
    
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            DOMSource source = new DOMSource(doc1);
    
            StreamResult result = new StreamResult(new File("test.xml"));
            transformer.transform(source, result);
    
            System.out.println("File saved!");
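
The XML-building half of the code above can be tried without jsoup or a live page. The following is only a minimal sketch under stated assumptions: the class `EventXmlSketch`, the nested `Match` holder, and the `toXml` method are hypothetical names introduced for illustration, the team names are assumed to have already been extracted one per element, and the result is serialized to a string (with the XML declaration omitted) instead of `test.xml` so it is easy to inspect.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper; names are illustrative and not part of the original answer.
class EventXmlSketch {

    // Plain holder for values that were already extracted from the page.
    static class Match {
        final String time;
        final List<String> teams;
        final String tvInfo;

        Match(String time, List<String> teams, String tvInfo) {
            this.time = time;
            this.teams = teams;
            this.tvInfo = tvInfo;
        }
    }

    // Builds <company><Event id="n">...</Event>...</company> and returns it as a string.
    static String toXml(List<Match> matches) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("company");
            doc.appendChild(root);

            for (int i = 0; i < matches.size(); i++) {
                Match m = matches.get(i);
                Element event = doc.createElement("Event");
                event.setAttribute("id", String.valueOf(i + 1)); // running id, as in the answer
                root.appendChild(event);

                appendTextChild(doc, event, "Time", m.time);
                // One numbered element per team: Team1, Team2, ...
                for (int j = 0; j < m.teams.size(); j++) {
                    appendTextChild(doc, event, "Team" + (j + 1), m.teams.get(j));
                }
                appendTextChild(doc, event, "TV", m.tvInfo);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the XML document", e);
        }
    }

    // Creates <name>text</name> and appends it to parent.
    private static void appendTextChild(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.appendChild(doc.createTextNode(text));
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        Match match = new Match("12:00", Arrays.asList("Rayo Vallecano", "Malaga CF"), "Sky Sports 4");
        System.out.println(toXml(Arrays.asList(match)));
    }
}
```

Keeping the teams as a list rather than a single joined string is what allows each one to become its own `Team1`, `Team2` element; a match with more or fewer teams would simply produce more or fewer numbered elements.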
    

    【Discussion】:

    • This is exactly what I needed, sir.