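Split multiple whitespaces from a text file into an array: answers

【Question title】: split multiple whitespaces from text file to array
【Posted】: 2020-12-20 08:56:23
【Question description】:

I have a text file and need to parse all 7 elements of each line (including empty ones) into an array for further processing. However, there is no unique delimiter to use other than whitespace, and some of the data values themselves contain spaces. Each "data sample" line looks like the example below, and some of the blocks will have empty entries. How can I achieve this?

My final result should look similar to the following: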

Array[0]:123456789
Array[1]:HLTX
Array[2]:5
Array[3]:BT5Q02
Array[4]:4SV
Array[5]:D8041
Array[6]:LIANG LIN

My code for this currently looks like the following. It omits empty values, which may drop some of the required data.

string[] splitlinecontent = line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
var OrderNum = splitlinecontent[0];
var OrderType = splitlinecontent[1];
int OrderQTY = int.Parse(splitlinecontent[2]);
var OrderSINumInRpt = splitlinecontent[3];
var OrderHoldMod = splitlinecontent[5];
var SalesPerson = splitlinecontent[6];

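【Question comments】:

Welcome to Stack Overflow. Please post the sample data as text rather than as an image, so that it can be copied. Also, I suggest using a CSV reader, which you can find as a NuGet package. Use a professional tool developed by people who have already solved your problem. Don't reinvent the wheel.
Split the line into substrings and trim the trailing whitespace.
This is a perfect example of how not to write a record file.
Since you have the table header, you know where all the cells of the table are: the first cell runs from index 0 up to (but not including) index 10, the second cell starts at index 10, and so on. You can then split each line by index to get the value of each cell. Note that you can trim every cell value, which leaves only the text data of each cell (without the padding spaces); if a cell is empty (only spaces), the result is an empty string.
@AdrianoRepetti, thank you very much! I think this has solved my problem! As a computer science graduate who has worked for more than 10 years without ever touching real programming, this is fun... now I have found my way into this field... great stuff!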
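
The index-based approach suggested in the comments above can be sketched roughly as follows. This is only an illustration, not the commenter's code: the names `FixedWidthSplitter` and `SplitFixed` are hypothetical, and the layout of 7 columns of 10 characters each is an assumption taken from the second answer's reading of the screenshot.

```csharp
using System;

public static class FixedWidthSplitter
{
    // Hypothetical helper: cuts a line into `count` cells of `width` characters each.
    // Positions past the end of a short line are clamped, so missing cells come back as "".
    public static string[] SplitFixed(string line, int width, int count)
    {
        var cells = new string[count];
        for (int i = 0; i < count; i++)
        {
            int start = i * width;
            if (start >= line.Length)
            {
                // The line ended before this column began.
                cells[i] = string.Empty;
                continue;
            }

            // Take at most `width` characters, then drop the padding spaces.
            int length = Math.Min(width, line.Length - start);
            cells[i] = line.Substring(start, length).Trim();
        }
        return cells;
    }
}
```

Calling `FixedWidthSplitter.SplitFixed(line, 10, 7)` on a line whose fifth column is blank should return seven entries, with an empty string at index 4 and a value such as `LIANG LIN` kept whole at index 6, which is what `Split` with `RemoveEmptyEntries` cannot do.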

Tags: c# arrays split

【Solution 1】:

I think the best practice for files like this is to use TextFieldParser from Microsoft.VisualBasic.FileIO:

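// Requires: using Microsoft.VisualBasic.FileIO;
using (var parser = new TextFieldParser(fileName))
{
    parser.TextFieldType = FieldType.FixedWidth;
    // Adjust the widths to the actual file; a last width of -1 marks a variable-width final field.
    parser.SetFieldWidths(3, 7, 10, 13, 8, 6, 1, 7, -1);

    while (!parser.EndOfData)
    {
        var fields = parser.ReadFields();
        // ... process the fields of this line here ...
    }
}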

But I suppose it would not be hard to code this yourself either.
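
To try the parser on the layout from the question, here is a hedged sketch. It assumes seven columns in which the first six are exactly 10 characters wide (the width the second answer reads off the screenshot, not something the asker confirmed) and treats the last column as variable width by passing -1. `FixedWidthDemo` and `ParseLines` are hypothetical names, and the input is a string rather than a file so that the sketch can run on its own.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class FixedWidthDemo
{
    // Hypothetical helper: parses fixed-width text into one string[] per line.
    public static List<string[]> ParseLines(string text)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            // Assumed layout: six 10-character columns, then a variable-width last column (-1).
            parser.SetFieldWidths(10, 10, 10, 10, 10, 10, -1);

            while (!parser.EndOfData)
            {
                // TrimWhiteSpace defaults to true, so blank columns come back as "".
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

One thing to watch for: as far as I know, a line shorter than the fixed part of the layout (60 characters under the assumption above) makes `ReadFields` throw a `MalformedLineException`, so files whose trailing spaces have been stripped may need padding first.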

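【Comments】:

This is absolutely the way to go. Don't let the "VisualBasic" in the name put you off.
Worth a try. It's just that the width or length of the data in that file is not under my control. Today the first column might have a MAX length of 10, and tomorrow it could be +1, depending on the running range of the order numbers. Anyway, all suggestions are welcome :D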

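【Solution 2】:

Judging by the screenshot of your sample data, your columns have a fixed width of ten characters. You can simply read the sample data line by line and then split each line at this fixed width.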

// Requires: using System.Collections.Generic; using System.IO; using System.Text;
public static List<List<string>> GetRecords(string path, bool hasColHeader, int colLength, int colCount)
{
    // The result is stored as one list of column values per data line
    List<List<string>> result = new List<List<string>>();

    // Read the sample file
    string[] records = File.ReadAllLines(path, Encoding.UTF8);

    // Go through each line of the sample file
    for (int n = 0; n < records.Length; n++)
    {
        // Here you could do something with the headers. For simplicity I skip them and continue with the next line.
        if (n == 0 && hasColHeader)
        {
            continue;
        }

        // If the line is shorter than colLength * colCount, pad it with spaces so that every column
        // can be read with Substring (PadRight leaves longer lines unchanged).
        // Not the best way to do this, but it is not the main point of this question.
        string record = records[n].PadRight(colLength * colCount);

        // Create a new list for this line
        List<string> row = new List<string>();

        // Go through each column (colCount specifies the number of columns)
        for (int i = 0; i < colCount; i++)
        {
            // Add the trimmed column value to the list for this line
            row.Add(record.Substring(i * colLength, colLength).Trim());
        }

        result.Add(row);
    }

    return result;
}

You can use this code like this:

static void Main(string[] args)
{
     List<List<String>> list = GetRecords(@"C:\temp\DataSample.txt",true, 10, 7);
}

The data in the list then looks like this:

List[0]:List[0]:123456789
List[0]:List[1]:HLTX
List[0]:List[2]:5
List[0]:List[3]:BT5Q02
List[0]:List[4]:4SV
List[0]:List[5]:D8041
List[0]:List[6]:LIANG LIN
List[1]:List[0]:3835443
List[1]:List[1]:HLTX
List[1]:List[2]:1
...

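There are two things here that you can improve yourself.

Calculate the column sizes from the characters between the headers. A column always runs from the start of its header to the start of the next column's header; the number of characters between those two points is the size of the column.
Find a better way to get the last column! :D I don't think what I did is good. There are better ways to do this.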
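
The first suggestion above, deriving the column sizes from the header line, might look like the following sketch. It assumes that header names contain no spaces, so that every run of non-space characters marks the start of a column; the names `HeaderLayout`, `ColumnStarts`, and `Slice` are hypothetical. The last column is read to the end of the line, which is one possible answer to the second point.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderLayout
{
    // Returns the index at which each header word begins.
    // Assumption: header names contain no spaces.
    public static List<int> ColumnStarts(string header)
    {
        var starts = new List<int>();
        for (int i = 0; i < header.Length; i++)
        {
            bool isWordStart = !char.IsWhiteSpace(header[i])
                               && (i == 0 || char.IsWhiteSpace(header[i - 1]));
            if (isWordStart)
            {
                starts.Add(i);
            }
        }
        return starts;
    }

    // Cuts a data line at the given starts: each column ends where the next one begins,
    // and the last column runs to the end of the line.
    public static string[] Slice(string line, IReadOnlyList<int> starts)
    {
        var cells = new string[starts.Count];
        for (int c = 0; c < starts.Count; c++)
        {
            // Clamp both ends so that short lines yield "" instead of throwing.
            int start = Math.Min(starts[c], line.Length);
            int end = c + 1 < starts.Count
                ? Math.Min(starts[c + 1], line.Length)
                : line.Length;
            cells[c] = line.Substring(start, end - start).Trim();
        }
        return cells;
    }
}
```

This only helps when the file actually has a header line; without one, the column starts would have to come from somewhere else, for example a fixed list of offsets.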

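【Comments】:

Your suggestion might work for me, except that the actual dataset does not contain column headers. The file is the output of another application, and my application will just be a feeder that takes in the information from this file and reprocesses it for use by other downstream applications.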