【Question title】: Parsing third-party XML
【Posted】: 2010-04-26 12:45:32
【Question】:

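What route would you take to parse a large XML file (2 MB to 20 MB, possibly more) that has no schema? I cannot infer one with XSD.exe because the file structure is odd; see the snippet below.

Options:

1) XML deserialization (but, as stated, I have no schema and the XSD tool complains about the file contents)
2) LINQ to XML
3) Loading into an XmlDocument
4) Parsing by hand with XmlReader and friends

Here is a snippet of the XML file: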

<?xml version="1.0" encoding="utf-8"?>
<xmlData date="29.04.2010 12:09:13">
 <Table>
  <ident>079186</ident>
  <stock>0</stock>
  <pricewotax>33.94000000</pricewotax>
  <discountpercent>0.00000000</discountpercent>
 </Table>
 <Table>
  <ident>079190</ident>
  <stock>1</stock>
  <pricewotax>10.50000000</pricewotax>
  <discountpercent>0.00000000</discountpercent>
  <pricebyquantity>
   <Table>
    <quantity>5</quantity>
    <pricewotax>10.00000000</pricewotax>
    <discountpercent>0.00000000</discountpercent>
   </Table>
   <Table>
    <quantity>8</quantity>
    <pricewotax>9.00000000</pricewotax>
    <discountpercent>0.00000000</discountpercent>
   </Table>
  </pricebyquantity>
 </Table>
</xmlData>

【Comments】:

Tags: .net xml xml-serialization


【Solution 1】:

Here is the XSD:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="xmlData">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" name="Table">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ident" type="xs:int" />
              <xs:element name="stock" type="xs:int" />
              <xs:element name="pricewotax" type="xs:double" />
              <xs:element name="discountpercent" type="xs:double" />
              <xs:element minOccurs="0" name="pricebyquantity">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element maxOccurs="unbounded" name="Table">
                      <xs:complexType>
                        <xs:sequence>
                          <xs:element name="quantity" type="xs:int" />
                          <xs:element name="pricewotax" type="xs:double" />
                          <xs:element name="discountpercent" type="xs:double" />
                        </xs:sequence>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="date" type="xs:string" use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>
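
The answerer notes in the comments that this schema was produced with the XmlSchemaInference class rather than XSD.exe. A minimal sketch of that approach follows, assuming the sample XML is available as a string. The names `SchemaInferenceSketch` and `InferSchema` are made up for illustration. The inferred types depend on the values in the sample, so the result is worth reviewing by hand before relying on it.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaInferenceSketch
{
    // Infers an XSD from a sample XML document and returns it as text.
    public static string InferSchema(string xml)
    {
        XmlSchemaInference inference = new XmlSchemaInference();
        XmlSchemaSet schemas;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            schemas = inference.InferSchema(reader);
        }

        // A schema set can hold several schemas; write out each one.
        StringWriter output = new StringWriter();
        foreach (XmlSchema schema in schemas.Schemas())
        {
            schema.Write(output);
        }
        return output.ToString();
    }
}
```

For a file on disk, `XmlReader.Create(path)` could replace the `StringReader` overload.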

Here are the serializable classes:

//------------------------------------------------------------------------------
// <auto-generated>
//     This code was generated by a tool.
//     Runtime Version:2.0.50727.3603
//
//     Changes to this file may cause incorrect behavior and will be lost if
//     the code is regenerated.
// </auto-generated>
//------------------------------------------------------------------------------

// 
// This source code was auto-generated by xsd, Version=2.0.50727.1432.
// 
namespace StockInfo {
    using System.Xml.Serialization;


    /// <remarks/>
    [System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.1432")]
    [System.SerializableAttribute()]
    [System.Diagnostics.DebuggerStepThroughAttribute()]
    [System.ComponentModel.DesignerCategoryAttribute("code")]
    [System.Xml.Serialization.XmlTypeAttribute(AnonymousType=true)]
    [System.Xml.Serialization.XmlRootAttribute(Namespace="", IsNullable=false)]
    public partial class xmlData {

        private xmlDataTable[] tableField;

        private string dateField;

        /// <remarks/>
        [System.Xml.Serialization.XmlElementAttribute("Table")]
        public xmlDataTable[] Table {
            get {
                return this.tableField;
            }
            set {
                this.tableField = value;
            }
        }

        /// <remarks/>
        [System.Xml.Serialization.XmlAttributeAttribute()]
        public string date {
            get {
                return this.dateField;
            }
            set {
                this.dateField = value;
            }
        }
    }

    /// <remarks/>
    [System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.1432")]
    [System.SerializableAttribute()]
    [System.Diagnostics.DebuggerStepThroughAttribute()]
    [System.ComponentModel.DesignerCategoryAttribute("code")]
    [System.Xml.Serialization.XmlTypeAttribute(AnonymousType=true)]
    public partial class xmlDataTable {

        private int identField;

        private int stockField;

        private double pricewotaxField;

        private double discountpercentField;

        private xmlDataTableTable[] pricebyquantityField;

        /// <remarks/>
        public int ident {
            get {
                return this.identField;
            }
            set {
                this.identField = value;
            }
        }

        /// <remarks/>
        public int stock {
            get {
                return this.stockField;
            }
            set {
                this.stockField = value;
            }
        }

        /// <remarks/>
        public double pricewotax {
            get {
                return this.pricewotaxField;
            }
            set {
                this.pricewotaxField = value;
            }
        }

        /// <remarks/>
        public double discountpercent {
            get {
                return this.discountpercentField;
            }
            set {
                this.discountpercentField = value;
            }
        }

        /// <remarks/>
        [System.Xml.Serialization.XmlArrayItemAttribute("Table", IsNullable=false)]
        public xmlDataTableTable[] pricebyquantity {
            get {
                return this.pricebyquantityField;
            }
            set {
                this.pricebyquantityField = value;
            }
        }
    }

    /// <remarks/>
    [System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.1432")]
    [System.SerializableAttribute()]
    [System.Diagnostics.DebuggerStepThroughAttribute()]
    [System.ComponentModel.DesignerCategoryAttribute("code")]
    [System.Xml.Serialization.XmlTypeAttribute(AnonymousType=true)]
    public partial class xmlDataTableTable {

        private int quantityField;

        private double pricewotaxField;

        private double discountpercentField;

        /// <remarks/>
        public int quantity {
            get {
                return this.quantityField;
            }
            set {
                this.quantityField = value;
            }
        }

        /// <remarks/>
        public double pricewotax {
            get {
                return this.pricewotaxField;
            }
            set {
                this.pricewotaxField = value;
            }
        }

        /// <remarks/>
        public double discountpercent {
            get {
                return this.discountpercentField;
            }
            set {
                this.discountpercentField = value;
            }
        }
    }
}
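
Classes like these are consumed through XmlSerializer. The sketch below shows the call, using trimmed stand-in classes so that it compiles on its own; the names `StockFile`, `StockItem`, `QuantityPrice` and `DeserializeSketch` are hypothetical. One thing to watch: the schema above types `ident` as xs:int, so a value such as 079186 would come back as 79186. The stand-in uses string for it and decimal for the prices, which is an assumption about what the data means.

```csharp
using System.IO;
using System.Xml.Serialization;

// Trimmed stand-ins for the generated classes (hypothetical names).
[XmlRoot("xmlData")]
public class StockFile
{
    [XmlAttribute("date")]
    public string Date;

    [XmlElement("Table")]
    public StockItem[] Items;
}

public class StockItem
{
    [XmlElement("ident")]
    public string Ident;                     // string keeps the leading zero

    [XmlElement("stock")]
    public int Stock;

    [XmlElement("pricewotax")]
    public decimal PriceWithoutTax;

    [XmlElement("discountpercent")]
    public decimal DiscountPercent;

    // <pricebyquantity> wraps a list of nested <Table> elements.
    [XmlArray("pricebyquantity")]
    [XmlArrayItem("Table")]
    public QuantityPrice[] PriceByQuantity;
}

public class QuantityPrice
{
    [XmlElement("quantity")]
    public int Quantity;

    [XmlElement("pricewotax")]
    public decimal PriceWithoutTax;

    [XmlElement("discountpercent")]
    public decimal DiscountPercent;
}

public static class DeserializeSketch
{
    // Reads the whole document into an object graph in one call.
    public static StockFile Load(TextReader input)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(StockFile));
        return (StockFile)serializer.Deserialize(input);
    }
}
```

With the generated code, the same call would use `typeof(xmlData)` instead of the stand-in type.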

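One caveat: deserialization may not be the most efficient way to parse a 20 MB file. XmlReader is probably the fastest approach, but it means doing the work by hand.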
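
As a middle ground, the sketch below streams the file with XmlReader but hands each top-level `<Table>` record to `XNode.ReadFrom`, so only one record is held in memory at a time. This is a hybrid, not the fully manual parsing the caveat describes, and it assumes .NET 3.5 or later for System.Xml.Linq. `StreamingSketch` and `ReadTables` are illustrative names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingSketch
{
    // Yields each top-level <Table> element, nested price breaks included.
    public static IEnumerable<XElement> ReadTables(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();   // positions on the root element
            reader.Read();            // steps to the root's first child node
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.Depth == 1
                    && reader.Name == "Table")
                {
                    // ReadFrom consumes the whole element and leaves the
                    // reader on the node after it, so no extra Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Skipping the `Read()` call after `ReadFrom` matters: an unconditional `Read()` in the loop would step over every second record whenever the file has no whitespace between elements.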

【Comments】:

  • By the way, I generated the XSD with the XmlSchemaInference class.
  • Thanks, though I decided to parse this with LINQ to XML so that I do not depend on serialization.
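
The asker's actual LINQ to XML code is not shown in the thread. As a rough sketch of what that route could look like for the sample file, with `StockRow`, `LinqToXmlSketch` and `Parse` as made-up names and string/decimal chosen as assumed field types:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public sealed class StockRow
{
    public string Ident;
    public int Stock;
    public decimal PriceWithoutTax;
    // One entry per nested <Table>: quantity threshold and its price.
    public List<KeyValuePair<int, decimal>> QuantityPrices;
}

public static class LinqToXmlSketch
{
    public static List<StockRow> Parse(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Elements("Table") on the root sees only top-level records;
        // the nested ones are reached through <pricebyquantity>.
        return doc.Root.Elements("Table").Select(t => new StockRow
        {
            Ident = (string)t.Element("ident"),
            Stock = (int)t.Element("stock"),
            PriceWithoutTax = (decimal)t.Element("pricewotax"),
            QuantityPrices = t.Elements("pricebyquantity").Elements("Table")
                .Select(q => new KeyValuePair<int, decimal>(
                    (int)q.Element("quantity"),
                    (decimal)q.Element("pricewotax")))
                .ToList()
        }).ToList();
    }
}
```

`XDocument.Load(path)` would read from a file instead. The explicit casts on `XElement` throw if a required element is missing, which may or may not be the desired behavior for third-party data.
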
【Solution 2】:

I would load it into an XmlDocument and then process it with XPath as needed. LINQ may well be the best option here, but I am not familiar enough with it to say.
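
A small sketch of the XmlDocument plus XPath route, assuming the element names from the sample in the question. `XPathSketch` and `IdentsWithQuantityPricing` are illustrative names, and the query itself is only an example of the kind of selection XPath makes easy.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XPathSketch
{
    // Returns the ident of every top-level record that has quantity pricing.
    public static List<string> IdentsWithQuantityPricing(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);   // doc.Load(path) would read from a file instead

        List<string> idents = new List<string>();
        // The predicate keeps only <Table> nodes with a <pricebyquantity> child.
        foreach (XmlNode table in doc.SelectNodes("/xmlData/Table[pricebyquantity]"))
        {
            idents.Add(table.SelectSingleNode("ident").InnerText);
        }
        return idents;
    }
}
```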

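【Comments】:

  • I read somewhere that loading into an XmlDocument can lead to high memory consumption, but I am not sure.
  • Yes, it has to load the whole file into memory. But at 2-20 MB that should not be a major problem in this case.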