【Question title】: Splitting a big XML file into smaller ones based on tag name
【Posted】: 2013-08-31 16:17:51
【Question】:

I have a requirement: I am given an XML file and a tag name as input, and I have to split the XML file by the given tag name using java.xml. Please advise.

Input: XML file

  <note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
   </note>

  <book>
  <author>Gambardella, Matthew</author>
  <title>XML Developer's Guide</title>
  <genre>Computer</genre>
  <price>44.95</price>
  <publish_date>2000-10-01</publish_date>
  <description>An in-depth look at creating applications 
  with XML.</description>
  </book>
 <book>
  <author>Ralls, Kim</author>
  <title>Midnight Rain</title>
  <genre>Fantasy</genre>
  <price>5.95</price>
  <publish_date>2000-12-16</publish_date>
  <description>A former architect battles corporate zombies, 
  an evil sorceress, and her own childhood to become queen 
  of the world.</description>
  </book>

Tag name: book

Output:

<book>
  <author>Gambardella, Matthew</author>
  <title>XML Developer's Guide</title>
  <genre>Computer</genre>
  <price>44.95</price>
  <publish_date>2000-10-01</publish_date>
  <description>An in-depth look at creating applications 
  with XML.</description>
  </book>
 <book>
  <author>Ralls, Kim</author>
  <title>Midnight Rain</title>
  <genre>Fantasy</genre>
  <price>5.95</price>
  <publish_date>2000-12-16</publish_date>
  <description>A former architect battles corporate zombies, 
  an evil sorceress, and her own childhood to become queen 
  of the world.</description>
 </book>

【Comments】:

  • Is this for a college course?

Tags: java xml-parsing sax


【Solution 1】:

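I think the general algorithm is as follows:

  • Read the file into a buffer
  • Find the first instance of your opening tag
  • Keep reading lines until you reach the last closing tag
  • Output those lines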
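
The steps above could be sketched roughly as follows, using only the JDK. This is a minimal sketch, not the answerer's code: the class name `TagSplitter` and the method `extract` are hypothetical, and it assumes the tag carries no attributes and is never nested inside itself. It also differs slightly from the list above in that it collects each element separately instead of taking one span from the first opening tag to the last closing tag, so each block could later be written to its own file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the line-scanning idea.
class TagSplitter {

    /**
     * Returns one string per block, where a block runs from a line containing
     * <tag> through the next line containing </tag>.
     * Assumptions: no attributes on the tag, no nesting of the tag in itself.
     * A block that is never closed is dropped.
     */
    static List<String> extract(List<String> lines, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        List<String> blocks = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a block

        for (String line : lines) {
            if (current == null && line.contains(open)) {
                current = new StringBuilder(); // start of a new block
            }
            if (current != null) {
                current.append(line).append('\n');
                if (line.contains(close)) {
                    blocks.add(current.toString()); // block is complete
                    current = null;
                }
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Shortened version of the input from the question.
        List<String> lines = Arrays.asList(
            "<note>",
            "  <to>Tove</to>",
            "</note>",
            "<book>",
            "  <title>XML Developer's Guide</title>",
            "</book>",
            "<book>",
            "  <title>Midnight Rain</title>",
            "</book>");
        for (String block : extract(lines, "book")) {
            System.out.print(block);
        }
    }
}
```

For a real file, the lines could come from `Files.readAllLines`, which loads the whole file into memory; for a very large file, reading line by line through a `BufferedReader` would keep memory use low. Plain string matching like this is fragile compared with a real XML parser, so it is only reasonable when the input layout is as regular as in the question.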

【Discussion】:

    【Solution 2】:

    Jsoup can do this easily.

    Jsoup

    Here is a complete working example:

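    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    
    import org.apache.commons.io.FileUtils;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;
    
    public class Test {
    
        public static void main(String[] args) throws IOException {
            // Locate test.txt on the classpath and read it into a string.
            String path = Test.class.getResource("/test.txt").getPath();
            String string = FileUtils.readFileToString(new File(path), StandardCharsets.UTF_8);
    
            // Jsoup.parse uses the HTML parser, which tolerates the missing
            // root element in test.txt.
            Document doc = Jsoup.parse(string);
            // Select every <book> element and print them.
            Elements elementsByTag = doc.getElementsByTag("book");
            System.out.println(elementsByTag);
        }
    
    }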
    

    test.txt

     <note>
        <to>Tove</to>
        <from>Jani</from>
        <heading>Reminder</heading>
        <body>Don't forget me this weekend!</body>
       </note>
    
      <book>
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
      </book>
     <book>
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
      </book>
    

    Output

    <book> 
     <author>
      Gambardella, Matthew
     </author> 
     <title>XML Developer's Guide</title> 
     <genre>
      Computer
     </genre> 
     <price>
      44.95
     </price> 
     <publish_date>
      2000-10-01
     </publish_date> 
     <description>
      An in-depth look at creating applications with XML.
     </description> 
    </book>
    <book> 
     <author>
      Ralls, Kim
     </author> 
     <title>Midnight Rain</title> 
     <genre>
      Fantasy
     </genre> 
     <price>
      5.95
     </price> 
     <publish_date>
      2000-12-16
     </publish_date> 
     <description>
      A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.
     </description> 
    </book>
    

    【Discussion】:
