【Question title】: SAX parsing: Encountered mixed content within text element
【Posted】: 2013-01-18 17:19:37
【Question description】:

I'm trying to parse an XML file (representing a TV guide) that looks like this...

<?xml version="1.0" encoding="utf-8"?>
<channels>
  <channel>
    <name>BBC ONE</name>
    <oid>10029</oid>
      ...
    <programmes>
      <programme>
        <description>Blah blah blah</description>
        <end_time>2013-02-04 01:40:00</end_time>
        <episode>9</episode>
        <genres>Entertainment</genres>
        <oid>10583734</oid>
        <season>8</season>
        <start_time>2013-02-04 00:15:00</start_time>
        <title>The Celebrity Apprentice USA</title>
      </programme>
      <programme>
        ..
      </programme>
    </programmes>
  </channel>
  <channel>
    ...
  </channel>
</channels>

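I'm using two parsers, one for channels and one for programmes, which apparently means I need to retrieve the entire <programmes>...</programmes> block in order to pass it to the "programme" parser.

I tried the following in the "channel" parser...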

public List<XMLTVChannel> parse() {
    RootElement rootElement = new RootElement("channels");
    final List<XMLTVChannel> channelsList = new ArrayList<XMLTVChannel>();
    Element channelElement = rootElement.getChild("channel");

    ...

    // Set the EndTextElementListeners for the <channel> child elements
    channelElement.getChild(CHANNEL_OID).setEndTextElementListener(new EndTextElementListener() {
        public void end(String body) {
            currentChannel.setOid(body);
        }
    });

    ...

    // HERE'S THE PROBLEM
    channelElement.getChild("programmes").setEndTextElementListener(new EndTextElementListener() {
        public void end(String body) {
            // NEED TO INVOKE XMLTVProgrammeParser HERE
        }
    });
    try {
        Xml.parse(getInputStream(), Xml.Encoding.UTF_8, rootElement.getContentHandler());
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
    return channelsList;
}

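OK, so I've Googled this and I know exactly what the problem is: the String body parameter passed to the end(...) method should contain only text, whereas here it is a mixture of elements and their text.

I've read a number of similar Stack Overflow questions and articles suggesting that I need to define my own ContentHandler, but I haven't found anything that does exactly what I'm trying to do. Is a custom ContentHandler my only option, or is there another way?

【Discussion】:

    Tags: java xml-parsing sax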


    【Solution 1】:

    Do you mean you want this output:

     BBC ONE
    10029
    ------------------------
    The Celebrity Apprentice USA
    2013-02-04 00:15:00 - 2013-02-04 01:40:00
    Entertainment
    Season : 8 / Episode : 9
    Description:
    Blah blah blah
    10583734
    **********************
    The Celebrity Apprentice USA
    2013-02-04 01:45:00 - 2013-02-04 02:25:00
    Entertainment
    Season : 8 / Episode : 10
    Description:
    Blah blah blah
    10583735
    **********************
    //////////////////////////
    BBC TWO
    10030
    ------------------------
    American Dad
    2013-02-04 00:30:00 - 2013-02-04 01:25:00
    Cartoon
    Season : 14 / Episode : 1
    Description:
    Blah blah blah
    10583734
    **********************
    American Dad
    2013-02-04 01:30:00 - 2013-02-04 02:15:00
    Cartoon
    Season : 14 / Episode : 2
    Description:
    Blah blah blah
    10583735
    **********************
    //////////////////////////
    

    I have modified your XML file:

        <?xml version="1.0" encoding="utf-8"?>
    <channels>
      <channel>
        <name>BBC ONE</name>
        <oid>10029</oid>
        <programmes>
          <programme>
            <description>Blah blah blah</description>
            <end_time>2013-02-04 01:40:00</end_time>
            <episode>9</episode>
            <genres>Entertainment</genres>
            <oid>10583734</oid>
            <season>8</season>
            <start_time>2013-02-04 00:15:00</start_time>
            <title>The Celebrity Apprentice USA</title>
          </programme>
           <programme>
            <description>Blah blah blah</description>
            <end_time>2013-02-04 02:25:00</end_time>
            <episode>10</episode>
            <genres>Entertainment</genres>
            <oid>10583735</oid>
            <season>8</season>
            <start_time>2013-02-04 01:45:00</start_time>
            <title>The Celebrity Apprentice USA</title>
          </programme>
        </programmes>
      </channel>
      <channel>
          <name>BBC TWO</name>
          <oid>10030</oid>
          <programmes>
          <programme>
            <description>Blah blah blah</description>
            <end_time>2013-02-04 01:25:00</end_time>
            <episode>1</episode>
            <genres>Cartoon</genres>
            <oid>10583734</oid>
            <season>14</season>
            <start_time>2013-02-04 00:30:00</start_time>
            <title>American Dad</title>
          </programme>
           <programme>
            <description>Blah blah blah</description>
            <end_time>2013-02-04 02:15:00</end_time>
            <episode>2</episode>
            <genres>Cartoon</genres>
            <oid>10583735</oid>
            <season>14</season>
            <start_time>2013-02-04 01:30:00</start_time>
            <title>American Dad</title>
          </programme>
        </programmes>
      </channel>
    </channels>
    

    Java classes:

    Channel

    public class Channel {
    
            private String name;
            private String oid;
            private ArrayList<Programme> alProgrammes;
    
            public Channel(){}
    
            public String getName() {
                return name;
            }
    
            public void setName(String name) {
                this.name = name;
            }
    
            public String getOid() {
                return oid;
            }
    
            public void setOid(String oid) {
                this.oid = oid;
            }
    
            public ArrayList<Programme> getAlProgrammes() {
                return alProgrammes;
            }
    
            public void setAlProgrammes(ArrayList<Programme> alProgrammes) {
                this.alProgrammes = alProgrammes;
            }
    
    
        }
    

    Programme

     public class Programme {
    
        private String description;
        private String end_time;
        private String episode;
        private String genres;
        private String oid;
        private String season;
        private String start_time;
        private String title;
    
    
    
        public Programme() {
        }
    
        //Getters / Setters
        public String getDescription() {
            return description;
        }
        public void setDescription(String description) {
            this.description = description;
        }
        public String getEnd_time() {
            return end_time;
        }
        public void setEnd_time(String end_time) {
            this.end_time = end_time;
        }
        public String getEpisode() {
            return episode;
        }
        public void setEpisode(String episode) {
            this.episode = episode;
        }
        public String getGenres() {
            return genres;
        }
        public void setGenres(String genres) {
            this.genres = genres;
        }
        public String getOid() {
            return oid;
        }
        public void setOid(String oid) {
            this.oid = oid;
        }
        public String getSeason() {
            return season;
        }
        public void setSeason(String season) {
            this.season = season;
        }
        public String getStart_time() {
            return start_time;
        }
        public void setStart_time(String start_time) {
            this.start_time = start_time;
        }
        public String getTitle() {
            return title;
        }
        public void setTitle(String title) {
            this.title = title;
        }
    
    }
    

    XMLManager

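    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;
    
    public final class XMLManager {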
    
        public static ArrayList<Channel> getAlChannels(){
    
              DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
              DocumentBuilder db = null;
              Document doc = null;
              ArrayList<Channel> alChannels = new ArrayList<>();
    
              try {
    
                db = dbf.newDocumentBuilder();
                doc = db.parse(new File("D:\\Loic_Workspace\\Test2\\res\\test.xml"));
                NodeList ndListChannels = doc.getElementsByTagName("channel");
    
                Integer channelsCount = ndListChannels.getLength();
                NodeList ndListChannel = null;
                Integer ndListChannelLength = null;
                Channel channel = null;
                NodeList ndListProgrammes = null;
                for(int i=0;i<channelsCount;i++){
    
                    ndListChannel = ndListChannels.item(i).getChildNodes();
                    ndListChannelLength = ndListChannel.getLength();
                    channel = new Channel();
                    for(int j=0;j<ndListChannelLength;j++){
    
                        Node currentNode = ndListChannel.item(j);
                        String currentNodeName = currentNode.getNodeName();
                        String value = currentNode.getTextContent();
    
                        if(currentNodeName.equals("name")){
                            channel.setName(value);
                        }
                        if(currentNodeName.equals("oid")){
                            channel.setOid(value);
                        }
                        if(currentNodeName.equals("programmes")){
                            ndListProgrammes = currentNode.getChildNodes();
                            ArrayList<Programme> alProgrammes = new ArrayList<>();
                            for(int k=0;k<ndListProgrammes.getLength();k++){
    
                                Node ndProgrammes = ndListProgrammes.item(k);
                                if(ndProgrammes.hasChildNodes()){
    
                                    NodeList ndListProgramme = ndProgrammes.getChildNodes();
                                    Integer ndListProgrammeLength = ndListProgramme.getLength();
                                    Programme programme = new Programme();
                                    for(int l=0;l<ndListProgrammeLength;l++){
    
                                        Node  ndProgramme = ndListProgramme.item(l);
                                        String nodeProgrameName = ndProgramme.getNodeName();
                                        String nodeProgrameValue = ndProgramme.getTextContent();
                                        if(nodeProgrameName.equals("description")){
                                            programme.setDescription(nodeProgrameValue);
                                        }
                                        if(nodeProgrameName.equals("end_time")){
    
                                            programme.setEnd_time(nodeProgrameValue);
                                        }
                                        if(nodeProgrameName.equals("episode")){
                                            programme.setEpisode(nodeProgrameValue);
                                        }
                                        if(nodeProgrameName.equals("genres")){
                                            programme.setGenres(nodeProgrameValue);
                                        }
                                        if(nodeProgrameName.equals("oid")){
                                            programme.setOid(nodeProgrameValue);
                                        }
                                        if(nodeProgrameName.equals("season")){
                                            programme.setSeason(nodeProgrameValue);
                                        }
                                        if(nodeProgrameName.equals("start_time")){
                                            programme.setStart_time(nodeProgrameValue);
                                        }
                                        if(nodeProgrameName.equals("title")){
                                            programme.setTitle(nodeProgrameValue);
                                        }
    
                                    }
    
                                    alProgrammes.add(programme);
    
                                }
    
                            }
    
                            channel.setAlProgrammes(alProgrammes);
    
                        }
    
                    }
    
                    alChannels.add(channel);
    
                }
    
    
    
              } catch (ParserConfigurationException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            } catch (SAXException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
    
              return alChannels;
    
        }
    
    
    
    }
    

    Main

    public class MyMain {
    
        /**
         * @param args
         */
        public static void main(String[] args) {
    
    
            ArrayList<Channel> alChannels = XMLManager.getAlChannels();
            for(Channel c:alChannels){
                System.out.println(c.getName());
                System.out.println(c.getOid());
                System.out.println("------------------------");
                for(Programme p:c.getAlProgrammes()){
                    System.out.println(p.getTitle());
                    System.out.println(p.getStart_time()+" - "+p.getEnd_time());
                    System.out.println(p.getGenres());
                    System.out.println("Season : "+p.getSeason()+" / Episode : "+p.getEpisode());
                    System.out.println("Description:\n"+p.getDescription());
                    System.out.println(p.getOid());
                    System.out.println("**********************");
                }
    
                System.out.println("//////////////////////////");
    
            }
    
        }
    
    }
    

    Update

    Here is an example of how I would do it with SAX.

    Important: I kept my Programme and Channel classes.

    ChannelsHandler

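    import java.util.ArrayList;
    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;
    
    public class ChannelsHandler extends DefaultHandler{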
    
        private ArrayList<Channel> tvGuide;
        private Channel channel;
        private ArrayList<Programme> alProgrammes;
        private Programme programme;
        private String reading;
    
        public ChannelsHandler(){
            super();
        }
    
        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) throws SAXException {
    
            // Discard text left over from the previous element (including
            // whitespace between tags) before reading the new one.
            reading = "";
    
            if(qName.equals("channels")){
                tvGuide = new ArrayList<>();
            }else if(qName.equals("channel")){
                channel = new Channel();
            }
            else if(qName.equals("programmes")){
                alProgrammes = new ArrayList<>();
            }
            else if(qName.equals("programme")){
                programme = new Programme();
            }
    
        }
    
        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            // SAX may deliver one text node in several chunks, so append
            // rather than overwrite.
            reading += new String(ch, start, length);
        }
    
        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
    
            if(qName.equals("channel")){
                tvGuide.add(channel);
                channel = null;
            }
            else if(qName.equals("name")){
                channel.setName(reading);
            }
            else if(qName.equals("programmes")){
                channel.setAlProgrammes(alProgrammes);
                alProgrammes = new ArrayList<>();
            }
            else if(qName.equals("programme")){
                alProgrammes.add(programme);
                programme = null;
            }
            else if(qName.equals("description")){
                programme.setDescription(reading);
            }
            else if(qName.equals("end_time")){
                programme.setEnd_time(reading);
            }
            else if(qName.equals("episode")){
                programme.setEpisode(reading);
            }
            else if(qName.equals("genres")){
                programme.setGenres(reading);
            }
            else if(qName.equals("season")){
                programme.setSeason(reading);
            }
            else if(qName.equals("start_time")){
                programme.setStart_time(reading);
            }
            else if(qName.equals("title")){
                programme.setTitle(reading);
            }
    
        }
    
        public ArrayList<Channel> getTVGuide(){
            return tvGuide;
        }
    
    
    
    }
    

    My new main

    public static void main(String[] args) {
    
            SAXParserFactory factory = SAXParserFactory.newInstance();
            try {
                SAXParser parser = factory.newSAXParser();
                File file = new File("D:\\Loic_Workspace\\TestSAX\\res\\test.xml");
                ChannelsHandler handler = new ChannelsHandler();
                parser.parse(file,handler);
                List<Channel> tvGuide = handler.getTVGuide();
                for(Channel c:tvGuide){
                    System.out.println(c.getName());
                    System.out.println("------------------------");
                    for(Programme p:c.getAlProgrammes()){
                        System.out.println(p.getTitle());
                        System.out.println(p.getStart_time()+" - "+p.getEnd_time());
                        System.out.println(p.getGenres());
                        System.out.println("Season : "+p.getSeason()+" / Episode : "+p.getEpisode());
                        System.out.println("Description:\n"+p.getDescription());
                        System.out.println("**********************");
                    }
    
                    System.out.println("//////////////////////////");
    
                }
            } catch (ParserConfigurationException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            } catch (SAXException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
    
        }
    

    Output in my console:

    BBC ONE
    ------------------------
    The Celebrity Apprentice USA
    2013-02-04 00:15:00 - 2013-02-04 01:40:00
    Entertainment
    Season : 8 / Episode : 9
    Description:
    Blah blah blah
    **********************
    The Celebrity Apprentice USA
    2013-02-04 01:45:00 - 2013-02-04 02:25:00
    Entertainment
    Season : 8 / Episode : 10
    Description:
    Blah blah blah
    **********************
    //////////////////////////
    BBC TWO
    ------------------------
    American Dad
    2013-02-04 00:30:00 - 2013-02-04 01:25:00
    Cartoon
    Season : 14 / Episode : 1
    Description:
    Blah blah blah
    **********************
    American Dad
    2013-02-04 01:30:00 - 2013-02-04 02:15:00
    Cartoon
    Season : 14 / Episode : 2
    Description:
    Blah blah blah
    **********************
    //////////////////////////
    

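    This is my first time using SAX. You may be able to find a more efficient approach, but it works :-) In my update I did not handle the oid tag, which appears in both programmes and channels.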
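
    The duplicated oid tag left open above could be told apart by remembering whether the parser is currently inside a programme element. The following is only a minimal sketch of that idea, not part of the original answer: OidAwareHandler, its oids list and the parse helper are hypothetical names, and the oids are collected as labelled strings so the example stays self-contained. In ChannelsHandler the same check would decide between channel.setOid(...) and programme.setOid(...).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not from the original answer): records channel and
// programme oids separately by tracking the enclosing element.
class OidAwareHandler extends DefaultHandler {

    // Each entry is "channel:<oid>" or "programme:<oid>", in document order.
    final List<String> oids = new ArrayList<>();

    private final StringBuilder text = new StringBuilder();
    private boolean inProgramme;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        text.setLength(0); // start collecting text for the new element
        if (qName.equals("programme")) {
            inProgramme = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so append.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("programme")) {
            inProgramme = false;
        } else if (qName.equals("oid")) {
            String owner = inProgramme ? "programme:" : "channel:";
            oids.add(owner + text.toString().trim());
        }
    }

    // Convenience helper for the sketch: parses an XML string in memory.
    static List<String> parse(String xml) {
        OidAwareHandler handler = new OidAwareHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.oids;
    }
}
```

    A boolean flag is enough here because programme elements are never nested; a deeper document would probably need a stack of element names instead.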

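    【Comments】:

    • Thanks, but your solution is DOM-based and I need a SAX solution. I'm actually developing an Android app, but since my question isn't Android-specific I didn't tag it as such. The biggest problem with mobile apps is limited memory, and some of my users have a large number of TV channels, so for memory reasons the XML data has to be parsed with SAX.
    • However good the solution is (and I'm sure it's fine), +1 for the sympathy and the thorough answer.
    • @loikkk: Although I've already found my own solution using SAX, I'll accept your answer. Thanks for replying.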
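
    On the memory concern raised in the comments: a SAX handler does not have to keep the whole guide in a list. One possible variant, sketched here under assumptions and not taken from the thread, hands each channel to a callback as soon as its closing tag is seen and then drops the reference. StreamingChannelsHandler and ChannelSummary are hypothetical names; ChannelSummary is a cut-down stand-in for the Channel class, and java.util.function.Consumer is used only for brevity (any single-method callback interface would do).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical streaming variant: emits one channel at a time to a callback
// instead of accumulating every channel in memory.
class StreamingChannelsHandler extends DefaultHandler {

    // Minimal stand-in for the answer's Channel class.
    static final class ChannelSummary {
        String name;
        final List<String> programmeTitles = new ArrayList<>();
    }

    private final Consumer<ChannelSummary> sink;
    private final StringBuilder text = new StringBuilder();
    private ChannelSummary current;

    StreamingChannelsHandler(Consumer<ChannelSummary> sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        text.setLength(0);
        if (qName.equals("channel")) {
            current = new ChannelSummary();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("name")) {
            current.name = text.toString();
        } else if (qName.equals("title")) {
            current.programmeTitles.add(text.toString());
        } else if (qName.equals("channel")) {
            sink.accept(current); // hand the finished channel to the caller
            current = null;       // and let it be garbage collected
        }
    }

    // Convenience helper for the sketch: parses an XML string in memory.
    static void parse(String xml, Consumer<ChannelSummary> sink) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new StreamingChannelsHandler(sink));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

    The caller could, for example, write each channel to a database inside the callback, so that only one channel's programmes are held in memory at any time.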