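【Question title】: Read large text file until a certain string
【Posted】: 2014-05-13 07:19:49
【Question】:

I have a large text file whose records are delimited by a string (not by a single character), like this:

first data[STRING-SEPERATOR]second data[STRING-SEPERATOR] ...

I don't want to load the whole file into memory because of its size (~250 MB). If I read the whole file with System.IO.File.ReadAllText, I get an OutOfMemoryException.

So I want to read the file up to the first occurrence of [STRING-SEPERATOR], then carry on with the next string. It would be like "taking" first data out of the file, processing it, and then continuing with second data, which is now the first data in the file.

System.IO.StreamReader.ReadLine() doesn't help me, because the file's content is a single line.

Do you know how to read a file up to a certain string in .NET?

Any advice is appreciated. Thanks.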

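【Question comments】:

  • Is [STRING-SEPERATOR] a single character or a string of characters?
  • It is a string of characters.
  • How long is [STRING-SEPERATOR], and how much text can there be between consecutive [STRING-SEPERATOR]s?
  • [STRING-SEPERATOR] is a GUID. There can be about 100 characters between separators.
  • It is always the same GUID.

Tags: .net string file large-files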


【Solution 1】:

This should help you.

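// Requires: using System; using System.Collections.Generic; using System.IO;
//           using System.Text; using System.Windows.Forms; (for MessageBox)
private IEnumerable<string> ReadCharsByChunks(int chunkSize, string filePath)
{
    // StreamReader decodes the characters itself, so a multi-byte character is
    // never split across two chunks (decoding raw byte chunks could corrupt it).
    using (StreamReader reader = new StreamReader(filePath, Encoding.Default))
    {
        char[] buffer = new char[chunkSize];
        int currentRead;
        while ((currentRead = reader.Read(buffer, 0, chunkSize)) > 0)
        {
            yield return new string(buffer, 0, currentRead);
        }
    }
}

private void SearchWord(string searchWord)
{
    if (string.IsNullOrEmpty(searchWord))
        throw new ArgumentException("searchWord must not be empty.");

    StringBuilder builder = new StringBuilder();
    foreach (var chars in ReadCharsByChunks(2, "sample.txt")) // chunk size can be any positive number
    {
        builder.Append(chars);

        // A large chunk may contain several matches, so keep searching.
        int foundIndex;
        while ((foundIndex = builder.ToString().IndexOf(searchWord, StringComparison.Ordinal)) >= 0)
        {
            // Found
            MessageBox.Show("Found");

            builder.Remove(0, foundIndex + searchWord.Length);
        }

        // No match left: keep only the tail that could still begin a match,
        // so the builder never grows beyond the search word's length.
        if (builder.Length >= searchWord.Length)
        {
            builder.Remove(0, builder.Length - (searchWord.Length - 1));
        }
    }
}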

【Comments】:

【Solution 2】:

StreamReader.Read has overloads that may help you. Try this:

int count = 200; // or whatever number you think is better
char[] buffer = new char[count];
using (System.IO.StreamReader sr = new System.IO.StreamReader("Path here"))
{
    int read;
    // Always fill from offset 0: the offset is a position in the buffer, not in the file.
    while ((read = sr.Read(buffer, 0, count)) > 0)
    {
        /*
        Only buffer[0..read) is valid on this pass.
        Check whether it contains your string separator, or at least the start of it.
        If it ends with part of the separator, check the next block to make sure
        it is a real separator.
        Do your stuff, then continue from the character after the last separator.
        */
    }
}
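
The steps in the comment above could be fleshed out roughly as follows. This is only a sketch, not the answerer's code: the ChunkedSplitter class, the ReadSegments method and the bufferSize parameter are names made up for illustration. It also assumes each piece between separators is small (about 100 characters, per the question), because the pending text is rescanned after every block.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ChunkedSplitter
{
    // Hypothetical helper: yields each piece of text that precedes a separator,
    // plus whatever remains after the last one.
    public static IEnumerable<string> ReadSegments(TextReader reader, string separator, int bufferSize = 4096)
    {
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must not be empty.", nameof(separator));

        var buffer = new char[bufferSize];
        var pending = new StringBuilder();   // text read but not yet returned
        int read;

        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            pending.Append(buffer, 0, read);

            // Search the pending text once per block for complete separators.
            string text = pending.ToString();
            int start = 0;
            int hit;
            while ((hit = text.IndexOf(separator, start, StringComparison.Ordinal)) >= 0)
            {
                yield return text.Substring(start, hit - start);
                start = hit + separator.Length;
            }

            // Keep the unfinished tail; it may end with the first part of a separator.
            pending.Clear();
            pending.Append(text, start, text.Length - start);
        }

        // Anything left after the last separator is the final piece.
        if (pending.Length > 0)
            yield return pending.ToString();
    }
}
```

It might be called as `foreach (var item in ChunkedSplitter.ReadSegments(reader, separator)) { ... }`, where reader is a StreamReader opened on the file inside a using block. Only one block plus one unfinished piece is held in memory at a time.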
    

【Comments】:

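【Solution 3】:

A text file can also be read character by character, as described in this question. To search for a particular string you then need some hand-written logic that recognizes it from the incoming characters; a state machine can do this.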
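
One way such a state machine might look is sketched below. It is an illustration rather than anything from the linked question: the state is the number of separator characters matched so far, and the fallback table is the standard Knuth-Morris-Pratt failure table, swapped in here so that a partial match is not lost when the separator repeats its own prefix. The names SeparatorStateMachine and ReadSegments are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SeparatorStateMachine
{
    // Hypothetical helper: reads one character at a time and tracks how many
    // separator characters are currently matched (the "state").
    public static IEnumerable<string> ReadSegments(TextReader reader, string separator)
    {
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must not be empty.", nameof(separator));

        // fallback[i] = length of the longest proper prefix of separator[0..i]
        // that is also a suffix of it (the Knuth-Morris-Pratt failure table).
        var fallback = new int[separator.Length];
        for (int i = 1, k = 0; i < separator.Length; i++)
        {
            while (k > 0 && separator[i] != separator[k]) k = fallback[k - 1];
            if (separator[i] == separator[k]) k++;
            fallback[i] = k;
        }

        var current = new StringBuilder();
        int state = 0;   // number of separator characters matched so far
        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            current.Append(c);

            // On a mismatch, fall back to the longest shorter match that still fits.
            while (state > 0 && c != separator[state]) state = fallback[state - 1];
            if (c == separator[state]) state++;

            if (state == separator.Length)
            {
                // The builder ends with the separator: strip it and emit the segment.
                current.Length -= separator.Length;
                yield return current.ToString();
                current.Clear();
                state = 0;
            }
        }

        // Text after the last separator, if any.
        if (current.Length > 0)
            yield return current.ToString();
    }
}
```

Each character is examined once and no substring search is repeated, at the cost of one Read() call per character; StreamReader buffers internally, so this does not mean one disk access per character.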

【Comments】:

【Solution 4】:

Thanks for your replies. Here is the function I wrote in VB.NET:

' Reads from the stream's current position up to the next occurrence of UntilText
' and leaves the stream positioned just after it.
' Assumes a single-byte encoding: the position is rewound by a character count.
Public Function ReadUntil(Stream As System.IO.FileStream, UntilText As String) As String
    Dim builder As New System.Text.StringBuilder()
    Dim returnTextBuilder As New System.Text.StringBuilder()
    Dim returnText As String = String.Empty
    ' Read about half a separator at a time, but never less than one byte.
    Dim buffer(Math.Max(UntilText.Length \ 2, 1) - 1) As Byte
    Dim currentRead As Integer = -1

    Do Until currentRead = 0
        Dim collected As String = Nothing
        Dim chars As String = Nothing
        Dim foundIndex As Integer = -1

        currentRead = Stream.Read(buffer, 0, buffer.Length)
        chars = System.Text.Encoding.Default.GetString(buffer, 0, currentRead)

        builder.Append(chars)
        returnTextBuilder.Append(chars)

        collected = builder.ToString()
        foundIndex = collected.IndexOf(UntilText, StringComparison.Ordinal)

        If (foundIndex >= 0) Then
            returnText = returnTextBuilder.ToString()

            Dim indexOfSep As Integer = returnText.IndexOf(UntilText, StringComparison.Ordinal)
            Dim cutLength As Integer = returnText.Length - indexOfSep

            returnText = returnText.Remove(indexOfSep, cutLength)

            ' Rewind past anything that was read beyond the separator.
            If (cutLength > UntilText.Length) Then
                Stream.Position = Stream.Position - (cutLength - UntilText.Length)
            End If

            Return returnText
        ElseIf (collected.IndexOf(UntilText(0)) < 0) Then
            ' No separator can start in this text, so the search window can be dropped.
            builder.Length = 0
        End If
    Loop

    ' End of file: return whatever followed the last separator
    ' (previously this text was lost and String.Empty was returned).
    Return returnTextBuilder.ToString()
End Function
        

C#:

// Requires: using System;
// Reads from the stream's current position up to the next occurrence of UntilText
// and leaves the stream positioned just after it.
// Assumes a single-byte encoding: the position is rewound by a character count.
public static string ReadUntil(System.IO.FileStream Stream, string UntilText)
{
    System.Text.StringBuilder builder = new System.Text.StringBuilder();
    System.Text.StringBuilder returnTextBuilder = new System.Text.StringBuilder();
    string returnText = string.Empty;
    // Read about half a separator at a time, but never less than one byte.
    byte[] buffer = new byte[Math.Max(UntilText.Length / 2, 1)];
    int currentRead = -1;

    while (currentRead != 0)
    {
        currentRead = Stream.Read(buffer, 0, buffer.Length);
        string chars = System.Text.Encoding.Default.GetString(buffer, 0, currentRead);

        builder.Append(chars);
        returnTextBuilder.Append(chars);

        string collected = builder.ToString();
        int foundIndex = collected.IndexOf(UntilText, StringComparison.Ordinal);

        if (foundIndex >= 0)
        {
            returnText = returnTextBuilder.ToString();

            int indexOfSep = returnText.IndexOf(UntilText, StringComparison.Ordinal);
            int cutLength = returnText.Length - indexOfSep;

            returnText = returnText.Remove(indexOfSep, cutLength);

            // Rewind past anything that was read beyond the separator.
            if (cutLength > UntilText.Length)
                Stream.Position = Stream.Position - (cutLength - UntilText.Length);

            return returnText;
        }
        else if (collected.IndexOf(UntilText[0]) < 0)
        {
            // No separator can start in this text, so the search window can be dropped.
            builder.Length = 0;
        }
    }

    // End of file: return whatever followed the last separator
    // (previously this text was lost and string.Empty was returned).
    return returnTextBuilder.ToString();
}
        

【Comments】:

• This looks good, but how do I start from a specific position?
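
On starting from a specific position: since ReadUntil continues from wherever the FileStream currently is, setting Stream.Position (or calling Seek) before the first call should be enough. For reader-based code, a minimal sketch follows. It assumes the offset is a byte offset that falls on a character boundary, which is always true for single-byte encodings. OffsetReader and OpenAt are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class OffsetReader
{
    // Hypothetical helper: opens the file and skips to a byte offset before any text is decoded.
    public static StreamReader OpenAt(string path, long byteOffset, Encoding encoding)
    {
        var stream = new FileStream(path, FileMode.Open, FileAccess.Read);
        try
        {
            // Seek on the raw stream first; a StreamReader buffers ahead, so moving
            // the stream after reading has started would desynchronize the two.
            stream.Seek(byteOffset, SeekOrigin.Begin);

            // false: do not look for a byte order mark in the middle of the file.
            return new StreamReader(stream, encoding, false);
        }
        catch
        {
            stream.Dispose();
            throw;
        }
    }
}
```

Disposing the returned reader also closes the underlying file.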