【Question title】: Follow loading progress of huge XML files
【Posted】: 2009-12-31 09:35:37
【Question description】:

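I am trying to follow the loading progress of large XML files (I am not the provider of these files) in .NET (C#, framework 3.5 SP1): from 1 MB to 300 MB, over a network file share.

I use an XmlReader for loading, rather than calling the XmlDocument.Load method directly, to speed up the loading process.

By the way, I could not find anything on the internet or in the documentation about how to follow this loading progress: there does not seem to be any delegate or event for it. Is there a way to do this? Having the same kind of feature for saving XML would be nice too.

Thanks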

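【Question comments】:

  • Where are you loading these files: into a DOM, a database, or somewhere else? Are you reading and processing them node by node, or loading them into memory?
  • I think I forgot to give some information: I load these XML files for an API that parses them (mainly with XPath). I have its source code, but I don't want to edit its logic/parsing. This API accepts, as its XML parameter, either the path to an XML file (which it then reads with an XmlReader) or a Stream. I don't care about the parsing speed, only about the loading into memory.
  • Another, more recent processing model, called VTD-XML, is available for C#.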

Tags: c# .net xml


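【Solution 1】:

Assuming you are reading from a stream, here is a (non-perfect) example of how to do it. Basically, ProgressStreamWrapper wraps the file stream and raises an event whenever the position changes.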

using System;
using System.IO;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Reading big file...");

        FileStream fileStream = File.OpenRead("c:\\temp\\bigfile.xml");
        // Disposing the wrapper also disposes the underlying file stream.
        using (ProgressStreamWrapper progressStreamWrapper = new ProgressStreamWrapper(fileStream))
        {
            progressStreamWrapper.PositionChanged += (o, ea) =>
                Console.WriteLine(((double)progressStreamWrapper.Position / progressStreamWrapper.Length * 100).ToString("F1") + "% complete");

            using (XmlReader xmlReader = XmlReader.Create(progressStreamWrapper))
            {
                while (xmlReader.Read())
                {
                    // read the xml document
                }
            }
        }

        Console.WriteLine("DONE");
        Console.ReadLine();
    }
}


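public class ProgressStreamWrapper : Stream
{
    public ProgressStreamWrapper(Stream innerStream)
    {
        if (innerStream == null) throw new ArgumentNullException("innerStream");
        InnerStream = innerStream;
    }

    public Stream InnerStream { get; private set; }

    // Stream.Close() and Stream.Dispose() both end up calling Dispose(true),
    // so overriding Dispose(bool) is enough to release the inner stream.
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            InnerStream.Dispose();
        }
        base.Dispose(disposing);
    }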

    public override void Flush()
    {
        InnerStream.Flush();
    }

    public override IAsyncResult BeginRead(byte[] buffer, int offset, int count, AsyncCallback callback, object state)
    {
        return InnerStream.BeginRead(buffer, offset, count, callback, state);
    }

    public override int EndRead(IAsyncResult asyncResult)
    {
        int endRead = InnerStream.EndRead(asyncResult);
        OnPositionChanged();
        return endRead;
    }

    public override IAsyncResult BeginWrite(byte[] buffer, int offset, int count, AsyncCallback callback, object state)
    {
        return InnerStream.BeginWrite(buffer, offset, count, callback, state);
    }

    public override void EndWrite(IAsyncResult asyncResult)
    {
        InnerStream.EndWrite(asyncResult);
        OnPositionChanged();
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long seek = InnerStream.Seek(offset, origin);
        OnPositionChanged();
        return seek;
    }

    public override void SetLength(long value)
    {
        InnerStream.SetLength(value);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = InnerStream.Read(buffer, offset, count);
        OnPositionChanged();
        return read;
    }

    public override int ReadByte()
    {
        int readByte = InnerStream.ReadByte();
        OnPositionChanged();
        return readByte;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        InnerStream.Write(buffer, offset, count);
        OnPositionChanged();
    }

    public override void WriteByte(byte value)
    {
        InnerStream.WriteByte(value);
        OnPositionChanged();
    }

    public override bool CanRead
    {
        get { return InnerStream.CanRead; }
    }

    public override bool CanSeek
    {
        get { return InnerStream.CanSeek; }
    }

    public override bool CanTimeout
    {
        get { return InnerStream.CanTimeout; }
    }

    public override bool CanWrite
    {
        get { return InnerStream.CanWrite; }
    }

    public override long Length
    {
        get { return InnerStream.Length; }
    }

    public override long Position
    {
        get { return InnerStream.Position; }
        set
        {
            InnerStream.Position = value;
            OnPositionChanged();
        }
    }

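    public event EventHandler PositionChanged;

    protected virtual void OnPositionChanged()
    {
        // Copy to a local so that a handler removed on another thread
        // between the null check and the call cannot cause a NullReferenceException.
        EventHandler handler = PositionChanged;
        if (handler != null)
        {
            handler(this, EventArgs.Empty);
        }
    }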

    public override int ReadTimeout
    {
        get { return InnerStream.ReadTimeout; }
        set { InnerStream.ReadTimeout = value; }
    }

    public override int WriteTimeout
    {
        get { return InnerStream.WriteTimeout; }
        set { InnerStream.WriteTimeout = value; }
    }
}
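When you own the `while (xmlReader.Read())` loop yourself, as in `Main` above, a wrapper is not strictly required. A minimal sketch, assuming a seekable stream such as a `FileStream` (so that `Length` and `Position` are available), is to poll the underlying stream's `Position` inside the loop. This does not help when an external API drives the parsing, as in the asker's case; `PollingProgressDemo` and `ReadWithProgress` are hypothetical names, not part of either answer.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class PollingProgressDemo
{
    // Hypothetical helper: reads an XML stream with XmlReader and calls onPercent
    // whenever the whole percentage of bytes consumed from the stream changes.
    // Requires a seekable stream and only works when you own the Read() loop.
    public static void ReadWithProgress(Stream stream, Action<int> onPercent)
    {
        long length = stream.Length;
        int lastPercent = -1;
        using (XmlReader reader = XmlReader.Create(stream))
        {
            while (reader.Read())
            {
                // Position is how far XmlReader's internal buffer has read,
                // so it can run slightly ahead of the node being parsed.
                int percent = length == 0 ? 100 : (int)(stream.Position * 100 / length);
                if (percent != lastPercent)
                {
                    lastPercent = percent;
                    onPercent(percent);
                }
            }
        }
    }

    public static void Main()
    {
        // Small in-memory document standing in for the real file.
        var sb = new StringBuilder("<root>");
        for (int i = 0; i < 50000; i++) sb.Append("<item>").Append(i).Append("</item>");
        sb.Append("</root>");
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(sb.ToString())))
        {
            ReadWithProgress(stream, p => Console.WriteLine(p + "% read"));
        }
    }
}
```

Because XmlReader reads ahead into its own buffer, the reported value can lead the parser by up to one buffer; for a progress display that difference is usually negligible.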

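【Comments】:

  • Yes, I can pass a Stream as the API parameter, so I will consider this kind of solution (the PositionChanged event). I'll post an update today.
  • I think we had the same idea, but your code got here first, so definitely +1 ;-p

【Solution 2】: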

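There isn't much you can do with the built-in loaders; however, you could write an intercepting stream: load your document from that stream and expose the Position through an event, i.e. raise an event (every so often) in the Read method.

Here is an example that supports updates during both reading and writing: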

using System;
using System.IO;
using System.Xml;
class ChattyStream : Stream
{
    private Stream baseStream;
    public ChattyStream(Stream baseStream)
    {
        if (baseStream == null) throw new ArgumentNullException("baseStream");
        this.baseStream = baseStream;
        updateInterval = 1000;
    }
    public event EventHandler ProgressChanged;
    protected virtual void OnProgressChanged()
    {
        var handler = ProgressChanged;
        if (handler != null) handler(this, EventArgs.Empty);
    }
    private void CheckDisposed()
    {
        if (baseStream == null) throw new ObjectDisposedException(GetType().Name);
    }
    protected Stream BaseStream
    {
        get { CheckDisposed(); return baseStream; }
    }
    int pos, updateInterval;
    public int UpdateInterval
    {
        get { return updateInterval; }
        set
        {
            if (value <= 0) throw new ArgumentOutOfRangeException("value");
            updateInterval = value;
        }
    }

    protected void Increment(int value)
    {
        if (value > 0)
        {
            pos += value;
            if (pos >= updateInterval)
            {
                OnProgressChanged();
                pos = pos % updateInterval;
            }
        }
    }
    public override int Read(byte[] buffer, int offset, int count)
    {
        int result = BaseStream.Read(buffer, offset, count);
        Increment(result);
        return result;
    }
    public override void Write(byte[] buffer, int offset, int count)
    {
        BaseStream.Write(buffer, offset, count);
        Increment(count);
    }
    public override void SetLength(long value)
    {
        BaseStream.SetLength(value);
    }
    public override void Flush()
    {
        BaseStream.Flush();
    }
    public override long Position
    {
        get { return BaseStream.Position; }
        set { BaseStream.Position = value; }
    }
    public override long Seek(long offset, SeekOrigin origin)
    {
        return BaseStream.Seek(offset, origin);
    }
    public override long Length
    {
        get { return BaseStream.Length; }
    }
    public override bool CanWrite
    {
        get { return BaseStream.CanWrite; }
    }
    public override bool CanRead
    {
        get { return BaseStream.CanRead; }
    }
    public override bool CanSeek
    {
        get { return BaseStream.CanSeek; }
    }
    protected override void Dispose(bool disposing)
    {
        if (disposing && baseStream != null)
        {
            baseStream.Dispose();
        }
        baseStream = null;
        base.Dispose(disposing);
    }
    public override void Close()
    {
        if (baseStream != null) baseStream.Close();
        base.Close();
    }
    public override int ReadByte()
    {
        int val = BaseStream.ReadByte();
        if (val >= 0) Increment(1);
        return val;
    }
    public override void WriteByte(byte value)
    {
        BaseStream.WriteByte(value);
        Increment(1);
    }

}
static class Program
{
    static void Main()
    {
        /* invent some big data */
        const string path = "bigfile";
        if (File.Exists(path)) File.Delete(path);
        using (var chatty = new ChattyStream(File.Create(path)))
        {
            chatty.ProgressChanged += delegate
            {
                Console.WriteLine("Writing: " + chatty.Position);
            };
            using (var writer = XmlWriter.Create(chatty))
            {
                writer.WriteStartDocument();
                writer.WriteStartElement("xml");
                for (int i = 0; i < 50000; i++)
                {
                    writer.WriteElementString("add", i.ToString());
                }
                writer.WriteEndElement();
                writer.WriteEndDocument();
            }
            chatty.Close();
        }


        /* read it */

        using (var chatty = new ChattyStream(File.OpenRead(path)))
        {
            chatty.ProgressChanged += delegate
            {
                Console.WriteLine("Reading: " + chatty.Position);
            };

            // now read "chatty" with **any** API; XmlReader, XmlDocument, XDocument, etc
            XmlDocument doc = new XmlDocument();
            doc.Load(chatty);
        }
    }
}
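`ChattyStream` above throttles by a fixed byte count (`UpdateInterval`). If what you want is a percentage, one option, sketched here under the assumption that the total size is known up front (for example from `new FileInfo(path).Length`), is to throttle by whole percent instead, so even a 300 MB file produces at most 101 notifications. The `PercentProgressStream` class, its `PercentChanged` event and the demo are hypothetical and read-only; they are not part of either answer.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical read-only wrapper: raises PercentChanged only when the whole
// percentage of bytes read so far changes.
public sealed class PercentProgressStream : Stream
{
    private readonly Stream inner;
    private readonly long totalLength;
    private long bytesRead;
    private int lastPercent = -1;

    public event Action<int> PercentChanged;

    // totalLength is passed in explicitly (e.g. new FileInfo(path).Length),
    // so the wrapper does not rely on the inner stream being seekable.
    public PercentProgressStream(Stream inner, long totalLength)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        this.inner = inner;
        this.totalLength = totalLength;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = inner.Read(buffer, offset, count);
        bytesRead += read;
        ReportIfChanged();
        return read;
    }

    private void ReportIfChanged()
    {
        if (totalLength <= 0) return;
        int percent = (int)Math.Min(100L, bytesRead * 100 / totalLength);
        if (percent == lastPercent) return;
        lastPercent = percent;
        Action<int> handler = PercentChanged;
        if (handler != null) handler(percent);
    }

    public override bool CanRead { get { return inner.CanRead; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { return totalLength; } }
    public override long Position
    {
        get { return bytesRead; }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }

    protected override void Dispose(bool disposing)
    {
        if (disposing) inner.Dispose();
        base.Dispose(disposing);
    }
}

public static class PercentDemo
{
    public static void Main()
    {
        // Small in-memory document standing in for the real file.
        var sb = new StringBuilder("<root>");
        for (int i = 0; i < 20000; i++) sb.Append("<item>").Append(i).Append("</item>");
        sb.Append("</root>");
        byte[] data = Encoding.UTF8.GetBytes(sb.ToString());

        using (var progress = new PercentProgressStream(new MemoryStream(data), data.Length))
        {
            progress.PercentChanged += p => Console.WriteLine(p + "% loaded");
            using (XmlReader reader = XmlReader.Create(progress))
            {
                while (reader.Read()) { }
            }
        }
    }
}
```

Since the wrapper is just a `Stream`, it can be handed to any API that accepts one (XmlReader, XmlDocument.Load, or the asker's parsing API) without changing that API.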

【Comments】:

【Solution 3】:

How about using DataSet.ReadXml()?

Or:

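// Assumes "file" holds the path of the XML document and Product is a class
// with ID (int), Name (string) and Price (decimal) members (neither is shown here).

// Create the document.
XmlDocument doc = new XmlDocument();
doc.Load(file);

// Loop through all the element nodes and create the list of Product objects.
List<Product> products = new List<Product>();

foreach (XmlNode node in doc.DocumentElement.ChildNodes)
{
    // Skip comments and other non-element nodes; casting them to
    // XmlElement in the foreach would throw InvalidCastException.
    XmlElement element = node as XmlElement;
    if (element == null) continue;

    Product newProduct = new Product();
    newProduct.ID = Int32.Parse(element.GetAttribute("ID"));
    newProduct.Name = element.GetAttribute("Name");

    // If there were more than one child node, you would probably use
    // another foreach loop here and move through the
    // element.ChildNodes collection.
    newProduct.Price = Decimal.Parse(element.ChildNodes[0].InnerText);

    products.Add(newProduct);
}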

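【Comments】:

  • Basically, I am trying to focus on the loading mechanism rather than the parsing mechanism: the parsing is done by an external API. In your example, 'doc.Load(file);' loads the whole XML file at that step, and execution only moves on once the file has been fully loaded into memory.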