Fastest way of reading txt files in C# (answers)

【Question title】: Fastest way of reading txt files in C#
【Posted】: 2014-01-05 14:09:36
【Question】:

I'm working on a project and I'm a bit confused. I got some txt files from my teacher (from his site, files: wt40.txt, wt50.txt, wt100.txt).

Every file has a similar structure:

26    24    79    46    32    35    73    74    14    67    86    46    78    40    29    94    64    27    90    55
35    52    36    69    85    95    14    78    37    86    44    28    39    12    30    68    70     9    49    50
 1    10     9    10    10     4     3     2    10     3     7     3     1     3    10     4     7     7     4     7
 5     3     5     4     9     5     2     8    10     4     7     4     9     5     7     7     5    10     1     3

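Every number takes 6 characters, padded with spaces rather than leading zeros.
Every line has 20 numbers.

The file wt40.txt should be read like this: the first two lines go to the first list, the next two lines to the next list, and the third pair of lines to the third list. The following lines should again be placed into these lists in pairs.

In C++ I do it this simple way: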

for(int ins=0; ins<125; ins++) //125 instances in file
{
    for(int i=0; i<N; i++)  file>>tasks[i].p; //N elements at two first lines
    for(int i=0; i<N; i++)  file>>tasks[i].w;
    for(int i=0; i<N; i++)  file>>tasks[i].d;
    tasks[i].putToLists();
}

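But when I write this code in C#, I have to open a StreamReader, read every line, split it with a regex, convert the pieces to int and add them to a list. That is a lot of loops. I can't simply read every 6 characters and add them in three loops, because these text files have messed-up end-of-line characters - sometimes it is just '\n', sometimes more.

Isn't there a simpler way?

【Question comments】:

Look into File.ReadAllLines and String.Split. Also, in a C# question we expect C# code (your attempted solution), not C++.
Also, post an excerpt of your data file as part of the question.
A small detail about the semantics of the question - reading always happens at the "same" (tm) speed - that is out of your control. It is the processing of the data in the lines that slows reading down.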
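
The File.ReadAllLines and String.Split suggestion from the first comment could look roughly like the sketch below. The class and method names (`WhitespaceReader`, `ParseLines`, `ParseFile`) are made up for illustration, and the sketch assumes the numbers are separated only by spaces, tabs, or stray '\r' characters.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class WhitespaceReader
{
    // Hypothetical helper: parses every whitespace-separated integer in the
    // given lines, in file order. Empty lines contribute nothing.
    public static List<int> ParseLines(IEnumerable<string> lines)
    {
        char[] separators = { ' ', '\t', '\r' };
        return lines
            .SelectMany(line => line.Split(separators, StringSplitOptions.RemoveEmptyEntries))
            .Select(token => int.Parse(token))
            .ToList();
    }

    // Reads the whole file into memory, then parses it.
    public static List<int> ParseFile(string path)
    {
        return ParseLines(File.ReadAllLines(path));
    }
}
```

Because the result is one flat list in file order, it can then be consumed N numbers at a time, much like the `file >> x` loops in the C++ snippet.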

Tags: c# regex file-io

【Solution 1】:

Essentially this is a 20 x n table of 6-digit (character) numbers with leading spaces.

26    24    79    46    32    35    73    74    14    67    86    46    78    40    29    94    64    27    90    55
35    52    36    69    85    95    14    78    37    86    44    28    39    12    30    68    70     9    49    50
 1    10     9    10    10     4     3     2    10     3     7     3     1     3    10     4     7     7     4     7
 5     3     5     4     9     5     2     8    10     4     7     4     9     5     7     7     5    10     1     3

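I didn't understand the last sentence:

The file wt40.txt should be read like this: the first two lines go to the first list, the next two lines to the next list, and the third pair of lines to the third list. The following lines should again be placed into these lists in pairs.

Assuming you want to take the first 6 lines and create 3 lists with 2 lines each, you can do something like this:

It eagerly reads everything into memory and then does its work.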

const int maxNumberDigitLength = 6;
const int numbersPerRow = 20;
const int rowLengthInChars = maxNumberDigitLength * numbersPerRow;
const int rowsPerList = 2;
const int listLengthInChars = rowLengthInChars * rowsPerList;
const int totalNumberOfCharsToRead = listLengthInChars * 3; // 3 lists = 6 rows

// Note: the offsets below assume the buffer holds only the 6-character fields;
// any line-ending characters in the file would shift them.
char[] buffer = new char[totalNumberOfCharsToRead];
using (StreamReader reader = new StreamReader("wt40.txt"))
{
    int numberOfCharsRead = reader.Read(buffer, 0, totalNumberOfCharsToRead);
}

// put them in your lists
IEnumerable<char> l1 = buffer.Take(listLengthInChars);
IEnumerable<char> l2 = buffer.Skip(listLengthInChars).Take(listLengthInChars);
IEnumerable<char> l3 = buffer.Skip(listLengthInChars * 2).Take(listLengthInChars);

// Get the list of strings from the list of chars using non LINQ method.
List<string> list1 = new List<string>();
StringBuilder sb = new StringBuilder();
foreach (char c in l1)
{
    sb.Append(c);
    if (sb.Length == maxNumberDigitLength)
    {
        // a full 6-character field has been collected
        list1.Add(sb.ToString());
        sb.Clear();
    }
}

// LINQ method (an alternative to the loop above), used here for the other two lists
Func<IEnumerable<char>, List<string>> splitIntoFields = chars =>
{
    string s = string.Concat(chars);
    return Enumerable
        .Range(0, s.Length / maxNumberDigitLength)
        .Select(n => s.Substring(n * maxNumberDigitLength, maxNumberDigitLength))
        .ToList();
};
List<string> list2 = splitIntoFields(l2);
List<string> list3 = splitIntoFields(l3);

// Parse to ints using LINQ projection; ToList() turns the IEnumerable<int> into a List<int>
List<int> numbers1 = list1.Select(int.Parse).ToList();
List<int> numbers2 = list2.Select(int.Parse).ToList();
List<int> numbers3 = list3.Select(int.Parse).ToList();
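
Since the question mentions inconsistent line endings, a fixed-width read like the one above could first drop the '\r' and '\n' characters before slicing. This is only a minimal sketch: `FixedWidth.ParseFields` is a hypothetical helper, and it assumes every number occupies exactly `width` characters once the line endings are removed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FixedWidth
{
    // Hypothetical helper: removes line-ending characters, then cuts the
    // remaining text into fields of `width` characters and parses each one.
    public static List<int> ParseFields(string text, int width)
    {
        string flat = new string(text.Where(c => c != '\r' && c != '\n').ToArray());
        return Enumerable
            .Range(0, flat.Length / width)
            .Select(n => int.Parse(flat.Substring(n * width, width)))
            .ToList();
    }
}
```

int.Parse accepts leading and trailing white space by default, so the space padding inside each field does not need to be trimmed first.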

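【Comments】:

Why do you call ToList() on the buffer?
And why do you call it so often? :)
@SamLeach, Array implements IEnumerable<T>, so you can use LINQ methods directly on an array. But the result of a LINQ method call is an IEnumerable<T>, so you can't assign it to a List<T>.
You're right. I thought it didn't. Anyway, he wants a list, so I'll leave it.
It's shorthand for x => int.Parse(x). Look up "list projection linq".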
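
The two points from these comments (the method-group shorthand, and Select returning an IEnumerable<T> rather than a List<T>) can be seen side by side in a small sketch. `ProjectionExample.ParseAll` is a made-up name used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ProjectionExample
{
    public static List<int> ParseAll(IEnumerable<string> fields)
    {
        // Method group: shorthand for fields.Select(x => int.Parse(x)).
        IEnumerable<int> lazy = fields.Select(int.Parse);

        // Select returns IEnumerable<int>; ToList() materializes it so the
        // result can be assigned to a List<int>.
        return lazy.ToList();
    }
}
```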

【Solution 2】:

Isn't there a simpler way?

I don't know whether it's simpler, but it needs only one loop and a bit of LINQ:

List<List<int>> lists = new List<List<int>>();
using (StreamReader reader = new StreamReader("wt40.txt"))
{
    string line;
    int count = 0;
    while ((line = reader.ReadLine()) != null)
    {
        List<int> currentList =
            Regex.Split(line, "\\s")
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .Select(int.Parse).ToList();
        if (currentList.Count > 0) // skip empty lines
        {
            if (count % 2 == 0) // append each second list to the previous one
            {
                lists.Add(currentList);
            }
            else
            {
                lists[count / 2].AddRange(currentList);
            }
            count++; // advance only for non-empty lines so the pairs stay aligned
        }
    }
}

In total there are 375 lists, each containing 40 numbers (at least for the wt40.txt input).
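
The 375 lists correspond to the 125 instances from the C++ snippet in the question, three lists per instance. A sketch of regrouping them follows. It assumes the lists arrive in p, w, d order as in that snippet, and the `Instance` and `InstanceGrouping` types are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container mirroring the p, w, d fields of the C++ snippet.
class Instance
{
    public List<int> P;
    public List<int> W;
    public List<int> D;
}

static class InstanceGrouping
{
    // Groups consecutive lists three at a time: lists 0..2 form the first
    // instance, lists 3..5 the second, and so on. Leftover lists that do
    // not fill a complete triple are ignored.
    public static List<Instance> Group(List<List<int>> lists)
    {
        var instances = new List<Instance>();
        for (int i = 0; i + 2 < lists.Count; i += 3)
        {
            instances.Add(new Instance { P = lists[i], W = lists[i + 1], D = lists[i + 2] });
        }
        return instances;
    }
}
```

With 375 input lists, `Group` returns 125 instances.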
