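【Question title】: C#: how to split a text file into multiple files
【Posted】: 2011-09-02 22:20:43
【Question】:

How can I split a single 1000-line text file into several smaller files of, for example, 300 lines each? Keep in mind that the original file may have more or fewer than a thousand lines.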

file1.txt 300 lines -> rest
file2.txt 300 lines -> rest
file3.txt 300 lines -> rest
file4.txt 100 lines 

I tried the following, but it does not work.

int counter = 0;
string line;

string lineoutput = (current_dir + "\\" + DateTime.Now.ToString("HHmmss") + ".txt");

System.IO.StreamReader inputfile;

inputfile = new System.IO.StreamReader(new_path);
while ((line = inputfile.ReadLine()) != null)
{
    System.IO.StreamWriter file = new System.IO.StreamWriter(current_dir + "\\" + DateTime.Now.ToString("HHmmss") + ".txt", true);

    string _replaceBackspace = ReplaceBackspace(read_file.ReadLine().ToLower());

    using (StreamWriter writer = new StreamWriter(lineoutput, true))
    {
        if (counter == 5000)
        {
            counter = 0;
            lineoutput = (current_dir + "\\" + DateTime.Now.ToString("HHmmss") + ".txt");
        }
        writer.WriteLine(line.ToLower());
    }
    counter++;
}

【Comments on the question】:

  • That is what my example does. That is the reason for the line "if (reader.EndOfStream) break;".

Tags: c# file-io streamreader


【Solution 1】:
string baseName = current_dir + "\\" + DateTime.Now.ToString("HHmmss") + ".";
const int MaxLines = 300;

StreamWriter writer = null;
try
{
    using (StreamReader inputfile = new StreamReader(new_path))
    {
        int count = 0;      // lines written to the current output file
        int fileIndex = 0;  // number of output files opened so far
        string line;
        while ((line = inputfile.ReadLine()) != null)
        {
            // Open a new output file for the first line and after every MaxLines lines.
            if (writer == null || count >= MaxLines)
            {
                if (writer != null)
                {
                    writer.Close();
                }

                // Name files by index; naming them after count would reuse the same name.
                ++fileIndex;
                writer = new StreamWriter(baseName + fileIndex.ToString() + ".txt");
                count = 0;
            }

            writer.WriteLine(line.ToLower());
            ++count;
        }
    }
}
finally
{
    if (writer != null)
        writer.Close();
}

【Comments】:

  • I tried this approach on a 1 GB file; splitting it into 3 smaller files took about 15 minutes.
【Solution 2】:

The simplest case:

const int MAX_LINES = 300;
string outFileName = "file{0}.txt";
int outFileNumber = 1;

using (var reader = File.OpenText(infile))
{
    while (!reader.EndOfStream)
    {
        // Each pass of the outer loop produces one output file.
        using (var writer = File.CreateText(string.Format(outFileName, outFileNumber++)))
        {
            for (int idx = 0; idx < MAX_LINES; idx++)
            {
                writer.WriteLine(reader.ReadLine());
                // Stop early when the input runs out partway through a file.
                if (reader.EndOfStream) break;
            }
        }
    }
}

【Comments】:

【Solution 3】:

Loop over File.ReadLines(path) and write each line to a StreamWriter.

Keep a counter, and every time it reaches 300, close the StreamWriter and open a new one.
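
The description above can be sketched roughly as follows. This is only an illustration, not the answerer's own code: the class name LineSplitter, the method Split, and its parameters (outputPrefix, linesPerFile) are made up for the example, and the numbered ".txt" naming is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineSplitter
{
    // Hypothetical helper: splits inputPath into files of at most linesPerFile
    // lines each and returns the paths of the files written, in order.
    public static List<string> Split(string inputPath, string outputPrefix, int linesPerFile)
    {
        if (linesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(linesPerFile));

        var written = new List<string>();
        StreamWriter writer = null;
        int count = 0; // lines written to the current output file
        try
        {
            // File.ReadLines enumerates lazily, so the whole file is never loaded at once.
            foreach (string line in File.ReadLines(inputPath))
            {
                // Open the first file, or roll over once the counter reaches the limit.
                if (writer == null || count == linesPerFile)
                {
                    writer?.Dispose();
                    string path = outputPrefix + (written.Count + 1) + ".txt";
                    writer = new StreamWriter(path);
                    written.Add(path);
                    count = 0;
                }
                writer.WriteLine(line);
                count++;
            }
        }
        finally
        {
            writer?.Dispose();
        }
        return written;
    }
}
```

Called as LineSplitter.Split("input.txt", "file", 300) on a 1000-line input, this would write file1.txt through file4.txt, the last one holding the remaining 100 lines. An empty input produces no output files.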

【Comments】:

【Solution 4】:

In addition to SLaks's answer, you can also do this with the extension methods Skip and Take from System.Linq.

string[] ss = File.ReadAllLines(@"path to the file");

int cycle = 1;
int chunksize = 300;

var chunk = ss.Take(chunksize);
var rem = ss.Skip(chunksize);

while (chunk.Any())
{
    string filename = "file" + cycle.ToString() + ".txt";
    using (StreamWriter sw = new StreamWriter(filename))
    {
        foreach (string line in chunk)
        {
            sw.WriteLine(line);
        }
    }
    chunk = rem.Take(chunksize);
    rem = rem.Skip(chunksize);
    cycle++;
}
      

【Comments】:

  • If the file is very large (in the gigabyte range), this will throw an out-of-memory exception.
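
The memory problem raised in this comment comes from File.ReadAllLines loading the entire file into an array. One possible way around it, keeping the chunk-at-a-time shape of the answer, is to group a lazily read sequence into batches. The sketch below is an assumption-laden illustration rather than the answerer's method: ChunkedSplit, InChunks, SplitFile and outputPrefix are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ChunkedSplit
{
    // Hypothetical helper: groups a lazy sequence into lists of at most size items.
    // Only one chunk is held in memory at a time.
    public static IEnumerable<List<T>> InChunks<T>(IEnumerable<T> source, int size)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException(nameof(size));

        var chunk = new List<T>(size);
        foreach (T item in source)
        {
            chunk.Add(item);
            if (chunk.Count == size)
            {
                yield return chunk;
                chunk = new List<T>(size);
            }
        }
        // Emit the final, shorter chunk if any lines are left over.
        if (chunk.Count > 0)
            yield return chunk;
    }

    // Writes each chunk to outputPrefix + N + ".txt" and returns the number of files written.
    public static int SplitFile(string inputPath, string outputPrefix, int chunkSize)
    {
        int cycle = 0;
        // File.ReadLines streams the input instead of reading it all up front.
        foreach (List<string> chunk in InChunks(File.ReadLines(inputPath), chunkSize))
        {
            cycle++;
            File.WriteAllLines(outputPrefix + cycle + ".txt", chunk);
        }
        return cycle;
    }
}
```

With this shape, memory use is bounded by the chunk size rather than the file size. On .NET 6 and later, the built-in Enumerable.Chunk extension method could stand in for the hand-written InChunks.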
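【Solution 5】:

Building on bigtbl's answer, I added handling for the case of generating a series of CSV files: the first line is kept as the header of every file. MAX_LINES counts the header line in the total, which is why start_idx exists.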

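public static void SplitFil(int rows, string inputFile) {
    // Note: the rows parameter is not used by this method.
    int outFileNumber = 1;
    const int MAX_LINES = 50000;
    string header = "";
    // GetFileSize is not defined in this answer; it is assumed to return the file's line count.
    if (GetFileSize(inputFile) > MAX_LINES) {
        var reader = File.OpenText(inputFile);
        while (!reader.EndOfStream)
        {
            var start_idx = 0;
            var writer = File.CreateText($"filename_{outFileNumber}.csv");
            // Every file after the first starts with a copy of the header,
            // which counts as that file's first line.
            if (outFileNumber > 1) {
                writer.WriteLine(header);
                start_idx = 1;
            }
            for (int idx = start_idx; idx < MAX_LINES; idx++)
            {
                var row = reader.ReadLine();
                // Remember the header from the very first line of the input.
                if (idx == 0 && outFileNumber == 1) header = row;
                writer.WriteLine(row);
                if (reader.EndOfStream) break;
            }
            writer.Close();
            outFileNumber++;
        }
        reader.Close();
    }
}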
      

【Comments】:

【Solution 6】:

A complete program:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace SplitTexTfileIntoMultiplefiles
{
    class Program
    {
        static void Main(string[] args)
        {
            string infile = @"C:\MyProj\file.sql";
            var reader = File.OpenText(infile);
            int outFileNumber = 1;
            Console.WriteLine("Wait...");
            const int MAX_LINES = 20000;
            while (!reader.EndOfStream)
            {
                string outfname = Path.GetDirectoryName(infile) + "\\" + Path.GetFileNameWithoutExtension(infile) + outFileNumber.ToString("D4") + Path.GetExtension(infile);
                Console.WriteLine(outfname);
                var writer = File.CreateText(outfname);
                for (int idx = 0; idx < MAX_LINES; idx++)
                {
                    writer.WriteLine(reader.ReadLine());
                    if (reader.EndOfStream) break;
                }
                writer.Close();
                outFileNumber++;
            }
            reader.Close();
            Console.WriteLine("Done.");
            Console.ReadKey();
        }
    }
}
        

【Comments】:
