【Question Title】: CSV file with different number of records each line (CSV Reader)
【Posted】: 2017-01-21 23:59:51
【Question】:

I'm using this library: CSV Reader, but the problem is that the .csv file is malformed.

Example:

,UDEQPT,,PROMIS,,,,,,,,,,,,,,,,,,,,,,,,,10:20:15,27-Dec-2015,
,UDEQPT,,DELAY,,,,,,,,,am24134_1_drift,am24134.1_drift,229,19,,,3176.00,164.78,,,,,,5,  1.00,1,06:16:16,15-Jun-2016,,,,,,,
,UDEQPT,,DELAY,,,,,,,,,am24134_1_drift,am24134.1_drift,345,25,,,131.68,216.71,,,,,,6,  1.00,1,06:28:23,15-Jun-2016,,,,,,,
,UDEQPT,,DELAY,,,,,,,,,am24134_1_drift,am24134.1_drift,346,25,,,170.18,210.93,,,,,,7,  1.00,1,06:31:18,15-Jun-2016,,,,,,,
,UDEQPT,,DELAY,,,,,,,,,am24134_1_drift,am24134.1_drift,376,27,,,295.83,212.99,,,,,,8,  1.00,1,06:38:47,15-Jun-2016,,,,,,,
,UDEQPT,,ENDLOT,,,,def,def,def,def,,am24134_1_drift,am24134.1_drift,385,27,,,1214.13,213.82,  3.48,  3.11,  1.64, 25.96,1,8,  1.00,1,06:59:46,15-Jun-2016,,4395.91,1465945186,,def,0,1,385,  3.48,357,385, 92.9,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

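The number of columns is 54, so any line with fewer fields than that fixed column count raises an error. In the example above, the first line only goes up to index 30. How do I handle this properly?

Here is my code: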

            using (var path = File.OpenRead(e.FullPath))
            {
                using (var csv = new CachedCsvReader(new StreamReader(path), false))
                {
                    csv.Columns = new List<Column>
                    {
                        new Column { Name = "Delay_Code", Type = typeof(string) },
                        new Column { Name = "PROMIS_Code", Type = typeof(string) },
                        new Column { Name = "Tester_Mode", Type = typeof(string) },
                        new Column { Name = "Event_Name", Type = typeof(string) },
                        new Column { Name = "Test_Program", Type = typeof(string) },
                        new Column { Name = "Temperature", Type = typeof(int?) },
                        new Column { Name = "Lot_Size", Type = typeof(int?) },
                        new Column { Name = "Part_Name", Type = typeof(string) },
                        new Column { Name = "Procedure_Name", Type = typeof(string) },
                        new Column { Name = "Handler_Id", Type = typeof(string) },
                        new Column { Name = "Perf_Board", Type = typeof(string) },
                        new Column { Name = "Sys_Part_Type", Type = typeof(string) },
                        new Column { Name = "Lot_Id", Type = typeof(string) },
                        new Column { Name = "Stage", Type = typeof(string) },
                        new Column { Name = "Parts_Tested", Type = typeof(int?) },
                        new Column { Name = "Parts_Failed", Type = typeof(int?) },
                        new Column { Name = "Reprobes", Type = typeof(int?) },
                        new Column { Name = "Successful_Reprobes", Type = typeof(int?) },
                        new Column { Name = "Delay_Time", Type = typeof(float?) },
                        new Column { Name = "UPH", Type = typeof(float?) },
                        new Column { Name = "Test_Time_Pass", Type = typeof(float?) },
                        new Column { Name = "Test_Time_Fail", Type = typeof(float?) },
                        new Column { Name = "Avg_Index_Time", Type = typeof(float?) },
                        new Column { Name = "Delays_30Sec_Avg", Type = typeof(float?) },
                        new Column { Name = "Delays_30Sec_Count", Type = typeof(int?) },
                        new Column { Name = "Delays_Count", Type = typeof(int?) },
                        new Column { Name = "Avg_Num_Sites", Type = typeof(float?) },
                        new Column { Name = "Active_Sites", Type = typeof(float?) },
                        new Column { Name = "Hour_Min_Sec", Type = typeof(string) },
                        new Column { Name = "Day_Month_Year", Type = typeof(string) },
                        new Column { Name = "User_Name", Type = typeof(string) },
                        new Column { Name = "Delays_Total_Duration", Type = typeof(float?) },
                        new Column { Name = "Duration_Since_Last_End_Lot", Type = typeof(float?) },
                        new Column { Name = "Start_Lot_Time_Data_Entry", Type = typeof(float?) },
                        new Column { Name = "Employee_Id", Type = typeof(string) },
                        new Column { Name = "Valid_Flag", Type = typeof(int?) },
                        new Column { Name = "Sample_Rate", Type = typeof(int?) },
                        new Column { Name = "Handler_Cycles", Type = typeof(int?) },
                        new Column { Name = "Site_1_Only_Pass_Only_Avg_Test_Time", Type = typeof(float?) },
                        new Column { Name = "Site_1_Only_Pass_Only_Count", Type = typeof(int?) },
                        new Column { Name = "Site_1_Count", Type = typeof(int?) },
                        new Column { Name = "Site_1_Yield", Type = typeof(float?) },
                        new Column { Name = "Site_2_Only_Pass_Only_Avg_Test_Time", Type = typeof(float?) },
                        new Column { Name = "Site_2_Only_Pass_Only_Count", Type = typeof(int?) },
                        new Column { Name = "Site_2_Count", Type = typeof(int?) },
                        new Column { Name = "Site_2_Yield", Type = typeof(float?) },
                        new Column { Name = "Site_3_Only_Pass_Only_Avg_Test_Time", Type = typeof(float?) },
                        new Column { Name = "Site_3_Only_Pass_Only_Count", Type = typeof(int?) },
                        new Column { Name = "Site_3_Count", Type = typeof(int?) },
                        new Column { Name = "Site_3_Yield", Type = typeof(float?) },
                        new Column { Name = "Site_4_Only_Pass_Only_Avg_Test_Time", Type = typeof(float?) },
                        new Column { Name = "Site_4_Only_Pass_Only_Count", Type = typeof(int?) },
                        new Column { Name = "Site_4_Count", Type = typeof(int?) },
                        new Column { Name = "Site_4_Yield", Type = typeof(int?) },
                     };

                    csv.MissingFieldAction = MissingFieldAction.ReplaceByNull;
                    csv.SkipEmptyLines = false;
                    csv.DefaultParseErrorAction = ParseErrorAction.RaiseEvent;
                    csv.ParseError += Csv_ParseError;

                    while (csv.ReadNextRecord())
                    {
                        // Always iterates over all 54 columns, even when the line is shorter.
                        for (int i = 0; i < 54; i++)
                            Console.Write("{0}. {1} |", i, string.IsNullOrEmpty(csv[i]) ? "MISSING" : csv[i]);
                        Console.WriteLine();
                    }
                }
            }

Handling missing fields:

private static void Csv_ParseError(object sender, ParseErrorEventArgs e)
        {
            if (e.Error is MissingFieldCsvException)
            {
                e.Action = ParseErrorAction.AdvanceToNextLine;
            }
        }

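【Comments】:

  • How to handle this is a business-logic decision, so it depends on the case. Some people would ignore the whole row; others might reject the whole file. Perhaps you can tell us how you want to handle it, and we can see what help we can offer.
  • If you want a custom file format like this, you need to read the lines and then parse them yourself.
  • So, what is the actual problem with your current approach?
  • @Alex The current code throws an exception because the first line has no index 31. It iterates up to the column count of 54.
  • @AllanS.Hansen I'm fairly new to parsing, so how should I customize it based on the contents of the csv file? Is there a way to do that with that library?

Tags: c# csv parsing reader csvhelper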


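【Solution 1】:

You should wrap your for loop in if(csv.count==54) so you can tell whether the row is valid before entering the loop. After that, you can report each field error with its own dedicated if, for example for Delay_Code. It all depends on the logic you want.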
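
As a rough illustration of the check described in this answer, the sketch below counts the fields of each raw line before any indexed access. It does not use the CSV Reader library: `RowCheck`, `CountFields` and `HasExpectedFieldCount` are hypothetical names, and the plain `Split(',')` assumes no field contains a quoted comma, which holds for the sample in the question.

```csharp
using System;

public static class RowCheck
{
    // Column count stated in the question.
    public const int ExpectedFieldCount = 54;

    // Number of comma-separated fields in one raw line.
    // Assumption: no quoted fields containing commas.
    public static int CountFields(string line)
    {
        return line.Split(',').Length;
    }

    // True only when the line has exactly the expected number of fields,
    // so a loop over indexes 0..expected-1 is safe.
    public static bool HasExpectedFieldCount(string line, int expected = ExpectedFieldCount)
    {
        return CountFields(line) == expected;
    }
}
```

A caller would test `RowCheck.HasExpectedFieldCount(line)` first and only then loop over indexes 0 to 53. What to do with a line that fails the check (skip it, log it, reject the file) remains a business decision.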

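【Comments】:

  • I wish there were a way to get the field count of the line the reader is currently reading. For example, after finishing the first line, a way to count the next one.
  • I think you can use csv.Count and csv.FieldCount as in this example: social.msdn.microsoft.com/Forums/windows/en-US/…
【Solution 2】: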

In the end, I didn't use any CSV library. I just did this: Variable Column CSV file processing C#, and it works like a charm. I also created a DataTable and then used SqlBulkCopy to write it to the server.
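
A minimal sketch of the library-free approach mentioned here, not the answerer's actual code: each line is split on commas, padded up to a fixed width, and loaded into a `DataTable`. `VariableCsv`, `ToDataTable` and the generic `Col0`, `Col1`, … column names are hypothetical. It assumes no quoted fields and simply ignores any fields beyond the fixed width.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class VariableCsv
{
    // Build a DataTable with a fixed number of string columns from
    // lines that may contain fewer (or more) fields than that.
    public static DataTable ToDataTable(IEnumerable<string> lines, int columnCount)
    {
        var table = new DataTable();
        for (int i = 0; i < columnCount; i++)
            table.Columns.Add("Col" + i, typeof(string));

        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // skip blank lines

            // Assumption: no quoted fields containing commas.
            string[] fields = line.Split(',');
            DataRow row = table.NewRow();
            for (int i = 0; i < columnCount; i++)
            {
                // Missing or empty fields become DBNull; extra fields are ignored.
                string value = i < fields.Length ? fields[i].Trim() : null;
                row[i] = string.IsNullOrEmpty(value) ? (object)DBNull.Value : value;
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```

With the lines coming from `File.ReadLines(path)` and `columnCount` set to 54, the resulting table could then be handed to `SqlBulkCopy.WriteToServer(DataTable)`, as the answer describes.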

【Comments】:
