【Question title】: Reading Excel Open XML is ignoring blank cells
【Posted】: 2011-04-19 18:35:11
【Question】:

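I'm using the accepted solution here to convert an Excel sheet to a DataTable. This works fine if I have "perfect" data, but if there is a blank cell in the middle of my data, it seems to put the wrong data in each column.

I think this is because, in the code below:

row.Descendants<Cell>().Count()

is the number of populated cells (not the number of columns), and:

GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));

seems to find the next populated cell (not necessarily what is at that index), so if the first column is empty and I call ElementAt(0), it returns the value from the second column.

Here is the full parsing code.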

DataRow tempRow = dt.NewRow();

for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
    tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
    if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
    {
        Console.Write(tempRow[i].ToString());
    }
}

【Comments】:

  • Please see this answer in the same thread you mentioned. It fixes the empty cells.

Tags: excel datatable openxml openxml-sdk


【Solution 1】:

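This makes sense, because Excel does not store a value for an empty cell. If you open the file with the Open XML SDK 2.0 Productivity Tool and traverse the XML down to the cell level, you will see that only cells containing data are present in the file.

Your options are to insert blank data in the range of cells you want to traverse, or to work out programmatically that a cell was skipped and adjust your index accordingly.

I made a sample Excel document with strings in cell references A1 and C1. I then opened the document in the Open XML Productivity Tool, and this is the XML that was stored: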

<x:row r="1" spans="1:3" 
   xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:c r="A1" t="s">
    <x:v>0</x:v>
  </x:c>
  <x:c r="C1" t="s">
    <x:v>1</x:v>
  </x:c>
</x:row>

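Here you can see that the data corresponds to the first row, and that only two cells' worth of data are saved for that row. The saved data corresponds to A1 and C1; no cell with an empty value is saved.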
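To see the same thing from code, here is a minimal sketch that parses a row's XML with System.Xml.Linq (not the Open XML SDK) and lists the cell references that are physically stored. The names `RowInspector` and `StoredReferences` are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RowInspector
{
    // SpreadsheetML main namespace, as it appears in the stored XML.
    private static readonly XNamespace X =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Hypothetical helper: returns the "r" attribute of every <c> element
    // that is physically present in the given <row> XML.
    public static List<string> StoredReferences(string rowXml)
    {
        XElement row = XElement.Parse(rowXml);
        return row.Elements(X + "c")
                  .Select(c => (string)c.Attribute("r"))
                  .ToList();
    }
}
```

For the sample row, the list contains only A1 and C1. B1 never shows up, which is why indexing the cells by position shifts the data to the left.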

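To get the functionality you need, you can traverse the cells as you do above, but you need to check which column each cell reference points to and determine whether any cells have been skipped. To do that you need two utility functions: one to get the column name from a cell reference, and one to convert that column name to a zero-based index: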

    private static List<char> Letters = new List<char>() { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ' };

    /// <summary>
    /// Given a cell name, parses the specified cell to get the column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (ie. B2)</param>
    /// <returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        // Create a regular expression to match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    /// <summary>
    /// Given just the column name (no row index), it will return the zero based column index.
    /// Column names are bijective base 26 (A=1 ... Z=26, AA=27), so names of any length are handled.
    /// </summary>
    /// <param name="columnName">Column Name (ie. A or AB)</param>
    /// <returns>Zero based index if the conversion was successful; otherwise null</returns>
    public static int? GetColumnIndexFromName(string columnName)
    {
        if (string.IsNullOrEmpty(columnName))
        {
            return null;
        }

        int oneBasedIndex = 0;
        foreach (char c in columnName.ToUpperInvariant())
        {
            // Letters holds 'A'..'Z' at positions 0..25 (plus a trailing space at 26)
            int letterValue = Letters.IndexOf(c);
            if (letterValue < 0 || letterValue > 25)
            {
                return null;
            }

            // Shift the letters read so far one place to the left, then add this one
            oneBasedIndex = oneBasedIndex * 26 + (letterValue + 1);
        }

        return oneBasedIndex - 1;
    }

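You can then iterate over the cells and compare each cell's column index with columnIndex. If columnIndex is smaller, add blank data to tempRow; otherwise just read the value contained in the cell. (Note: I have not tested the code below, but the general idea should help):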

DataRow tempRow = dt.NewRow();

int columnIndex = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
    // Gets the column index of the cell with data
    int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));

    // Fill every skipped column with blank data
    while (columnIndex < cellColumnIndex)
    {
        tempRow[columnIndex] = string.Empty; // Insert blank data here
        columnIndex++;
    }

    tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);

    if (tempRow[columnIndex].ToString().IndexOf("Latency issues in") > -1)
    {
        Console.Write(tempRow[columnIndex].ToString());
    }
    columnIndex++;
}

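【Comments】:

  • Do you know of a way to detect whether a blank cell exists? That is my problem. I want a solution that reads exactly what is on the sheet (including blanks).
  • @ooo - The only way to detect it is that the cell reference is missing from the list of child descendants.
  • See @amurra's answer here for the definition of the "Letters" list.
  • What if I have columns up to AH?
  • Sometimes cell.CellReference is null.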
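The last two comments (columns beyond Z such as AH, and a missing cell reference) can both be handled by a conversion that accepts any number of letters and reports failure instead of throwing. A minimal sketch; `ColumnIndex.TryGetZeroBased` is a hypothetical helper, not part of any answer here or of the Open XML SDK:

```csharp
public static class ColumnIndex
{
    // Hypothetical helper: converts a cell reference such as "AH12" (or a bare
    // column name such as "AH") to a zero-based column index.
    // Returns false for null/empty input or when there are no leading letters.
    public static bool TryGetZeroBased(string cellReference, out int index)
    {
        index = -1;
        if (string.IsNullOrEmpty(cellReference))
        {
            return false; // e.g. a cell stored without a reference
        }

        int oneBased = 0;
        int letters = 0;
        foreach (char raw in cellReference)
        {
            char c = char.ToUpperInvariant(raw);
            if (c < 'A' || c > 'Z')
            {
                break; // reached the row number
            }

            // Bijective base 26: A=1 ... Z=26, AA=27, AH=34
            oneBased = oneBased * 26 + (c - 'A' + 1);
            letters++;
        }

        if (letters == 0)
        {
            return false;
        }

        index = oneBased - 1;
        return true;
    }
}
```

When the method returns false, the caller has to decide what to do. One option, which is an assumption about how such files are laid out, is to treat the cell as sitting directly after the previous one.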
【Solution 2】:

Here is an implementation of IEnumerable that should do what you want; it compiles and is unit tested.

    ///<summary>returns an empty cell when a blank cell is encountered
    ///</summary>
    public IEnumerator<Cell> GetEnumerator()
    {
        int currentCount = 0;

        // row is a class level variable representing the current
        // DocumentFormat.OpenXml.Spreadsheet.Row
        foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
            row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
        {
            string columnName = GetColumnName(cell.CellReference);

            int currentColumnIndex = ConvertColumnNameToNumber(columnName);

            for ( ; currentCount < currentColumnIndex; currentCount++)
            {
                yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
            }

            yield return cell;
            currentCount++;
        }
    }

Here are the functions it relies on:

    /// <summary>
    /// Given a cell name, parses the specified cell to get the column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (ie. B2)</param>
    /// <returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        // Match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    /// <summary>
    /// Given just the column name (no row index),
    /// it will return the zero based column index.
    /// </summary>
    /// <param name="columnName">Column Name (ie. A or AB)</param>
    /// <returns>Zero based index if the conversion was successful</returns>
    /// <exception cref="ArgumentException">thrown if the given string
    /// contains characters other than uppercase letters</exception>
    public static int ConvertColumnNameToNumber(string columnName)
    {
        Regex alpha = new Regex("^[A-Z]+$");
        if (!alpha.IsMatch(columnName)) throw new ArgumentException();

        char[] colLetters = columnName.ToCharArray();
        Array.Reverse(colLetters);

        int convertedValue = 0;
        for (int i = 0; i < colLetters.Length; i++)
        {
            char letter = colLetters[i];
            int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
            convertedValue += current * (int)Math.Pow(26, i);
        }

        return convertedValue;
    }

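Put it in a class and give it a try.

【Comments】:

  • Can someone show me how to implement this explicitly? Thanks!
  • Some context for the enumerable example would be nice.
【Solution 3】:

This is a slightly modified version of Waylon's answer, which also relies on the other answers. It wraps his method in a class.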

I changed

IEnumerator<Cell> GetEnumerator()

to

IEnumerable<Cell> GetRowCells(Row row)

Here is the class. You don't need to instantiate it; it is just a utility class:

public class SpreedsheetHelper
{
    ///<summary>returns an empty cell when a blank cell is encountered
    ///</summary>
    public static IEnumerable<Cell> GetRowCells(Row row)
    {
        int currentCount = 0;

        foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
            row.Descendants<DocumentFormat.OpenXml.Spreadsheet.Cell>())
        {
            string columnName = GetColumnName(cell.CellReference);

            int currentColumnIndex = ConvertColumnNameToNumber(columnName);

            for (; currentCount < currentColumnIndex; currentCount++)
            {
                yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
            }

            yield return cell;
            currentCount++;
        }
    }

    /// <summary>
    /// Given a cell name, parses the specified cell to get the column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (ie. B2)</param>
    /// <returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        // Match the column name portion of the cell name.
        var regex = new System.Text.RegularExpressions.Regex("[A-Za-z]+");
        var match = regex.Match(cellReference);

        return match.Value;
    }

    /// <summary>
    /// Given just the column name (no row index),
    /// it will return the zero based column index.
    /// </summary>
    /// <param name="columnName">Column Name (ie. A or AB)</param>
    /// <returns>Zero based index if the conversion was successful</returns>
    /// <exception cref="ArgumentException">thrown if the given string
    /// contains characters other than uppercase letters</exception>
    public static int ConvertColumnNameToNumber(string columnName)
    {
        var alpha = new System.Text.RegularExpressions.Regex("^[A-Z]+$");
        if (!alpha.IsMatch(columnName)) throw new ArgumentException();

        char[] colLetters = columnName.ToCharArray();
        Array.Reverse(colLetters);

        int convertedValue = 0;
        for (int i = 0; i < colLetters.Length; i++)
        {
            char letter = colLetters[i];
            int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
            convertedValue += current * (int)Math.Pow(26, i);
        }

        return convertedValue;
    }
}

Now you can get the cells of every row this way:

// skip the part that retrieves the worksheet sheetData
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach(Row row in rows)
{
    IEnumerable<Cell> cells = SpreedsheetHelper.GetRowCells(row);
    foreach (Cell cell in cells)
    {
         // skip part that reads the text according to the cell-type
    }
}

It will contain all cells, even the empty ones.
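One caveat: GetRowCells stops at the last stored cell of a row, so a row whose trailing columns are blank yields fewer cells than a wider row. If a fixed width is needed (for example the number of header columns), the sequence can be padded. A minimal sketch; `PadToWidth` is a hypothetical helper, written generically so that it does not depend on the Open XML SDK:

```csharp
using System;
using System.Collections.Generic;

public static class SequencePadding
{
    // Hypothetical helper: yields all items, then calls makeFiller() until
    // at least `width` items have been produced.
    // Inputs longer than `width` pass through unchanged.
    public static IEnumerable<T> PadToWidth<T>(
        IEnumerable<T> items, int width, Func<T> makeFiller)
    {
        int count = 0;
        foreach (T item in items)
        {
            yield return item;
            count++;
        }

        for (; count < width; count++)
        {
            yield return makeFiller();
        }
    }
}
```

With the class above it might be called as `SequencePadding.PadToWidth(SpreedsheetHelper.GetRowCells(row), columnCount, () => new Cell())`, where `columnCount` is assumed to come from the header row.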

【Comments】:

    【Solution 4】:

    Take a look at my implementation:

      Row[] rows = worksheet.GetFirstChild<SheetData>()
                    .Elements<Row>()
                    .ToArray();
    
      string[] columnNames = rows.First()
                    .Elements<Cell>()
                    .Select(cell => GetCellValue(cell, document))
                    .ToArray();
    
      HeaderLetters = ExcelHeaderHelper.GetHeaderLetters((uint)columnNames.Count());
    
      if (columnNames.Count() != HeaderLetters.Count())
      {
           throw new ArgumentException("HeaderLetters");
      }
    
      IEnumerable<List<string>> cellValues = GetCellValues(rows.Skip(1), columnNames.Count(), document);
    
    //Here you can enumerate through the cell values, based on the cell index the column names can be retrieved.
    

    HeaderLetters is collected using this class:

        private static class ExcelHeaderHelper
        {
            public static string[] GetHeaderLetters(uint max)
            {
                var result = new List<string>();
                int i = 0;
                var columnPrefix = new Queue<string>();
                string prefix = null;
                int prevRoundNo = 0;
                uint maxPrefix = max / 26;
    
                while (i < max)
                {
                    int roundNo = i / 26;
                    if (prevRoundNo < roundNo)
                    {
                        prefix = columnPrefix.Dequeue();
                        prevRoundNo = roundNo;
                    }
                    string item = prefix + ((char)(65 + (i % 26))).ToString(CultureInfo.InvariantCulture);
                    if (i <= maxPrefix)
                    {
                        columnPrefix.Enqueue(item);
                    }
                    result.Add(item);
                    i++;
                }
                return result.ToArray();
            }
        }
    

    The helper methods are:

        private static IEnumerable<List<string>> GetCellValues(IEnumerable<Row> rows, int columnCount, SpreadsheetDocument document)
        {
            var result = new List<List<string>>();
            foreach (var row in rows)
            {
                List<string> cellValues = new List<string>();
                var actualCells = row.Elements<Cell>().ToArray();
    
                int j = 0;
                for (int i = 0; i < columnCount; i++)
                {
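                    // Compare the full column letters (row digits trimmed) so that e.g. "AA5" is not mistaken for column "A"
                    if (actualCells.Count() <= j || actualCells[j].CellReference.ToString().TrimEnd("0123456789".ToCharArray()) != HeaderLetters[i])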
                    {
                        cellValues.Add(null);
                    }
                    else
                    {
                        cellValues.Add(GetCellValue(actualCells[j], document));
                        j++;
                    }
                }
                result.Add(cellValues);
            }
            return result;
        }
    
    
    private static string GetCellValue(Cell cell, SpreadsheetDocument document)
    {
        bool sstIndexedcell = GetCellType(cell);
        return sstIndexedcell
            ? GetSharedStringItemById(document.WorkbookPart, Convert.ToInt32(cell.InnerText))
            : cell.InnerText;
    }
    
    private static bool GetCellType(Cell cell)
    {
        return cell.DataType != null && cell.DataType == CellValues.SharedString;
    }
    
    private static string GetSharedStringItemById(WorkbookPart workbookPart, int id)
    {
        return workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(id).InnerText;
    }
    

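    This solution handles shared string items (SST-indexed cells).

    【Comments】:

      【Solution 5】:

      All good examples. Here is the one I'm using, since I need to keep track of all rows, cells, values, and headers for correlation and analysis.

      The ReadSpreadsheet method opens an xlsx file and walks through each worksheet, row, and column. Because the values are stored in a referenced string table, I also use it explicitly for each worksheet. Other classes are used as well: DSFunction and StaticVariables. The latter holds frequently used parameter values, such as the referenced 'quotdouble' ( quotdouble = "\u0022"; ) and 'crlf' (crlf = "\u000D" + "\u000A"; ).

      The relevant DSFunction method, GetIntColIndexForLetter, is included below. It returns an integer value for the column index corresponding to a letter name (e.g. A, B, AA, ADE, etc.). This is used together with the variable 'ncellcolref' to determine whether any columns have been skipped, and to enter an empty string value for each missing column.

      I also do some cleanup of the values (using the Replace method) before storing them temporarily in a List object.

      Subsequently, I use the hashtable (Dictionary) of column names to extract values across the different worksheets, correlate them, create normalized values, and then create an object used in our product, which is then stored as an XML file. None of that is shown, but it is the reason for this approach.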

          public static class DSFunction {
      
          /// <summary>
          /// Creates an integer value for a column letter name starting at 1 for 'a'
          /// </summary>
          /// <param name="lettstr">Column name as letters</param>
          /// <returns>int value</returns>
          public static int GetIntColIndexForLetter(string lettstr) {
              string txt = "", txt1="";
              int n1, result = 0, nbeg=-1, nitem=0;
              try {
                  nbeg = (int)("a".ToCharArray()[0]) - 1; //1 based
                  txt = lettstr;
                  if (txt != "") txt = txt.ToLower().Trim();
                  while (txt != "") {
                      if (txt.Length > 1) {
                          txt1 = txt.Substring(0, 1);
                          txt = txt.Substring(1);
                      }
                      else {
                          txt1 = txt;
                          txt = "";
                      }
                      if (!DSFunction.IsNumberString(txt1, "real")) {
                          nitem++;
                          n1 = (int)(txt1.ToCharArray()[0]) - nbeg;
                          result = result * 26 + n1; // bijective base 26: a=1 ... z=26, aa=27
                      }
                      else {
                          break;
                      }
                  }
              }
              catch (Exception ex) {
                  txt = ex.Message;
              }
              return result;
          }
      
      
      }
      
      
          public static class Extractor {
      
          public static string ReadSpreadsheet(string fileUri) {
              string msg = "", txt = "", txt1 = "";
              int i, n1, n2, nrow = -1, ncell = -1, ncellcolref = -1;
              Boolean haveheader = true;
              Dictionary<string, int> hashcolnames = new Dictionary<string, int>();
              List<string> colvalues = new List<string>();
              try {
                  if (!File.Exists(fileUri)) { throw new Exception("file does not exist"); }
                  using (SpreadsheetDocument ssdoc = SpreadsheetDocument.Open(fileUri, true)) {
                      var stringTable = ssdoc.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
                      foreach (Sheet sht in ssdoc.WorkbookPart.Workbook.Descendants<Sheet>()) {
                          nrow = 0;
                          foreach (Row ssrow in ((WorksheetPart)(ssdoc.WorkbookPart.GetPartById(sht.Id))).Worksheet.Descendants<Row>()) {
                              ncell = 0;
                              ncellcolref = 0;
                              nrow++;
                              colvalues.Clear();
                              foreach (Cell sscell in ssrow.Elements<Cell>()) {
                                  ncell++;
                                  n1 = DSFunction.GetIntColIndexForLetter(sscell.CellReference);
                                  for (i = 0; i < (n1 - ncellcolref - 1); i++) {
                                      if (nrow == 1 && haveheader) {
                                          txt1 = "-missing" + (ncellcolref + 1 + i).ToString() + "-";
                                          if (!hashcolnames.TryGetValue(txt1, out n2)) {
                                              hashcolnames.Add(txt1, ncell - 1);
                                          }
                                      }
                                      else {
                                          colvalues.Add("");
                                      }
                                  }
                                  ncellcolref = n1;
                                  if (sscell.DataType != null) {
                                      if (sscell.DataType.Value == CellValues.SharedString && stringTable != null) {
                                          txt = stringTable.SharedStringTable.ElementAt(int.Parse(sscell.InnerText)).InnerText;
                                      }
                                      else if (sscell.DataType.Value == CellValues.String) {
                                          txt = sscell.InnerText;
                                      }
                                      else txt = sscell.InnerText.ToString();
                                  }
                                  else txt = sscell.InnerText;
                                  if (txt != "") txt1 = txt.ToLower().Trim(); else txt1 = "";
                                  if (nrow == 1 && haveheader) {
                                      txt1 = txt1.Replace(" ", "");
                                      if (txt1 == "table/viewname") txt1 = "tablename";
                                      else if (txt1 == "schemaownername") txt1 = "schemaowner";
                                      else if (txt1 == "subjectareaname") txt1 = "subjectarea";
                                      else if (txt1.StartsWith("column")) {
                                          txt1 = txt1.Substring("column".Length);
                                      }
                                      if (!hashcolnames.TryGetValue(txt1, out n1)) {
                                          hashcolnames.Add(txt1, ncell - 1);
                                      }
                                  }
                                  else {
                                      txt = txt.Replace(((char)8220).ToString(), "'");  //special "
                                      txt = txt.Replace(((char)8221).ToString(), "'"); //special "
                                      txt = txt.Replace(StaticVariables.quotdouble, "'");
                                      txt = txt.Replace(StaticVariables.crlf, " ");
                                      txt = txt.Replace("  ", " ");
                                      txt = txt.Replace("<", "");
                                      txt = txt.Replace(">", "");
                                      colvalues.Add(txt);
                                  }
                              }
                          }
                      }
                  }
              }
              catch (Exception ex) {
                  msg = "notok:" + ex.Message;
              }
              return msg;
          }
      
      
      
      
      
      }
      

      【Comments】:

        【Solution 6】:

        The letter code is a base 26 encoding, so it should be possible to convert it to an offset.

        // Converts letter code (i.e. AA) to an offset
        public int offset( string code)
        {
            var offset = 0;
            var byte_array = Encoding.ASCII.GetBytes( code ).Reverse().ToArray();
            for( var i = 0; i < byte_array.Length; i++ )
            {
                offset += (byte_array[i] - 65 + 1) * Convert.ToInt32(Math.Pow(26.0, Convert.ToDouble(i)));
            }
            return offset - 1;
        }
        

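        【Comments】:

          【Solution 7】:

          You can use this function to extract a cell from a row by header index (headerIdx is 1-based: 1 maps to column A):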

          public static Cell GetCellFromRow(Row r ,int headerIdx) {
                  string cellname = GetNthColumnName(headerIdx) + r.RowIndex.ToString();
                  IEnumerable<Cell> cells = r.Elements<Cell>().Where(x=> x.CellReference == cellname);
                  if (cells.Count() > 0)
                  {
                      return cells.First();
                  }
                  else {
                      return null;
                  }
          }
          public static string GetNthColumnName(int n)
              {
                  string name = "";
                  while (n > 0)
                  {
                      n--;
                      name = (char)('A' + n % 26) + name;
                      n /= 26;
                  }
                  return name;
              }
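GetNthColumnName is 1-based (1 gives "A", 27 gives "AA"), so it is the inverse of the letter-to-offset conversions in the earlier answers, shifted by one. Here is a small sketch for checking the round trip. `ToName` restates the loop above, and `ToNumber` is an illustrative inverse that is not part of this answer:

```csharp
public static class ColumnNames
{
    // Same algorithm as GetNthColumnName above: 1 -> "A", 26 -> "Z", 27 -> "AA".
    public static string ToName(int n)
    {
        string name = "";
        while (n > 0)
        {
            n--;                                  // shift to 0..25 for this digit
            name = (char)('A' + n % 26) + name;   // prepend the letter
            n /= 26;
        }
        return name;
    }

    // Illustrative inverse: "A" -> 1, "Z" -> 26, "AA" -> 27.
    // Assumes the input contains only the uppercase letters A-Z.
    public static int ToNumber(string name)
    {
        int n = 0;
        foreach (char c in name)
        {
            n = n * 26 + (c - 'A' + 1);
        }
        return n;
    }
}
```

Subtracting 1 from `ToNumber` gives the zero-based index used by the DataTable-oriented answers.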
          

          【Comments】:

            【Solution 8】:

            OK, I'm no expert on this, but the other answers seemed like overkill to me, so here is my solution:

            // Loop through each row in the spreadsheet, skipping the header row
            foreach (var row in sheetData.Elements<Row>().Skip(1))
            {
                var i = 0;
                string[] letters = new string[15] {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O" };
            
                List<String> cellsList = new List<string>();
                foreach (var cell in row.Elements<Cell>().ToArray())
                {
                    while (cell.CellReference.ToString()[0] != Convert.ToChar(letters[i]))
                    {//accounts for multiple consecutive blank cells
                        cellsList.Add("");
                        i++;
                    }
                    cellsList.Add(cell.CellValue.Text);
                    i++;
                }
            
                string[] cells = cellsList.ToArray();
            
                foreach(var cell in cellsList)
                {
                    //display contents of cell, depending on the datatype you may need to call each of the cells manually
                }
            }
            

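            Hope someone finds this useful!

            【Comments】:

              【Solution 9】:

              Sorry for posting yet another answer to this question; here is the code I use.

              I had problems with OpenXML not working properly if a worksheet had blank rows at the top. It would sometimes just return a DataTable with 0 rows and 0 columns. The code below copes with this, and with all other worksheets.

              Here is how you would call my code. Just pass in a filename and the name of the worksheet to read in:

              DataTable dt = OpenXMLHelper.ExcelWorksheetToDataTable("C:\\SQL Server\\SomeExcelFile.xlsx", "Mikes Worksheet");

              Here is the code itself: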

                  public class OpenXMLHelper
                  {
                      //  A helper function to open an Excel file using OpenXML, and return a DataTable containing all the data from one
                      //  of the worksheets.
                      //
                      //  We've had lots of problems reading in Excel data using OLEDB (eg the ACE drivers no longer being present on new servers,
                      //  OLEDB not working due to security issues, and blatantly ignoring blank rows at the top of worksheets), so this is a more 
                      //  stable method of reading in the data.
                      //
                      public static DataTable ExcelWorksheetToDataTable(string pathFilename, string worksheetName)
                      {
                          DataTable dt = new DataTable(worksheetName);
              
                          using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathFilename, false))
                          {
                              // Find the sheet with the supplied name, and then use that 
                              // Sheet object to retrieve a reference to the first worksheet.
                              Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == worksheetName).FirstOrDefault();
                              if (theSheet == null)
                                  throw new Exception("Couldn't find the worksheet: " + worksheetName);
              
                              // Retrieve a reference to the worksheet part.
                              WorksheetPart wsPart = (WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
                              Worksheet workSheet = wsPart.Worksheet;
              
                              string dimensions = workSheet.SheetDimension.Reference.InnerText;       //  Get the dimensions of this worksheet, eg "B2:F4"
              
                              int numOfColumns = 0;
                              int numOfRows = 0;
                              CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
                              System.Diagnostics.Trace.WriteLine(string.Format("The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.", worksheetName, dimensions, numOfColumns, numOfRows));
              
                              SheetData sheetData = workSheet.GetFirstChild<SheetData>();
                              IEnumerable<Row> rows = sheetData.Descendants<Row>();
              
                              string[,] cellValues = new string[numOfColumns, numOfRows];
              
                              int colInx = 0;
                              int rowInx = 0;
                              string value = "";
                              SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
              
                              //  Iterate through each row of OpenXML data, and store each cell's value in the appropriate slot in our [,] string array.
                              foreach (Row row in rows)
                              {
                                  for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
                                  {
                                      //  *DON'T* assume there's going to be one XML element for each column in each row...
                                      Cell cell = row.Descendants<Cell>().ElementAt(i);
                                      if (cell.CellValue == null || cell.CellReference == null)
                                          continue;                       //  eg when an Excel cell contains a blank string
              
                                      //  Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
                                      colInx = GetColumnIndexByName(cell.CellReference);             //  eg "C" -> 2  (0-based)
                                      rowInx = GetRowIndexFromCellAddress(cell.CellReference)-1;     //  Needs to be 0-based
              
                                      //  Fetch the value in this cell
                                      value = cell.CellValue.InnerXml;
                                      if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                                      {
                                          value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
                                      }
              
                                      cellValues[colInx, rowInx] = value;
                                  }
                              }
              
                              //  Copy the array of strings into a DataTable.
                              //  We don't (currently) make any attempt to work out which columns should be numeric, rather than string.
                              for (int col = 0; col < numOfColumns; col++)
                                  dt.Columns.Add("Column_" + col.ToString());
              
                              for (int row = 0; row < numOfRows; row++)
                              {
                                  DataRow dataRow = dt.NewRow();
                                  for (int col = 0; col < numOfColumns; col++)
                                  {
                                      dataRow.SetField(col, cellValues[col, row]);
                                  }
                                  dt.Rows.Add(dataRow);
                              }
              
              #if DEBUG
                              //  Write out the contents of our DataTable to the Output window (for debugging)
                              string str = "";
                              for (rowInx = 0; rowInx < numOfRows; rowInx++)
                              {
                                  for (colInx = 0; colInx < numOfColumns; colInx++)
                                  {
                                      object val = dt.Rows[rowInx].ItemArray[colInx];
                                      str += (val == null) ? "" : val.ToString();
                                      str += "\t";
                                  }
                                  str += "\n";
                              }
                              System.Diagnostics.Trace.WriteLine(str);
              #endif
                              return dt;
                          }
                      }
              
                      private static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
                      {
                          //  How many columns & rows of data does this Worksheet contain ?  
                          //  We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
                          //      eg "B1:F4" -> we'll need 6 columns and 4 rows.
                          //
                          //  (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
                          try
                          {
                              string[] parts = dimensions.Split(':');     // eg "B1:F4" 
                              if (parts.Length != 2)
                                  throw new Exception("Couldn't find exactly *two* CellAddresses in the dimension");
              
                              numOfColumns = 1 + GetColumnIndexByName(parts[1]);     //  A=1, B=2, C=3  (1-based value), so F4 would return 6 columns
                              numOfRows = GetRowIndexFromCellAddress(parts[1]);
                          }
                          catch
                          {
                              throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions);
                          }
                      }
              
                      public static int GetRowIndexFromCellAddress(string cellAddress)
                      {
                          //  Convert an Excel CellReference column into a 1-based row index
                          //  eg "D42"  ->  42
                          //     "F123" ->  123
                          string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9]", "");   // keep digits only
                          return int.Parse(rowNumber);
                      }
              
                      public static int GetColumnIndexByName(string cellAddress)
                      {
                          //  Convert an Excel CellReference column into a 0-based column index
                          //  eg "D42" ->  3
                          //     "F123" -> 5
                          var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
                          int number = 0, pow = 1;
                          for (int i = columnName.Length - 1; i >= 0; i--)
                          {
                              number += (columnName[i] - 'A' + 1) * pow;
                              pow *= 26;
                          }
                          return number - 1;
                      }
                  }
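
The dimension-based sizing used in the code above can be tried without the Open XML SDK. The following is only a minimal sketch: `DimensionSketch`, `ColumnIndex`, `RowNumber` and `SizeFromDimension` are hypothetical names, not part of the answer or of any library, and the fallback for a dimension string without a colon is an assumption about how a single-cell sheet is reported.

```csharp
using System;

public static class DimensionSketch
{
    // Hypothetical helper: zero-based column index from a cell address such as "F4".
    public static int ColumnIndex(string cellAddress)
    {
        int number = 0;
        foreach (char ch in cellAddress.ToUpperInvariant())
        {
            if (ch < 'A' || ch > 'Z') continue;   // skip digits and '$'
            number = number * 26 + (ch - 'A' + 1);
        }
        return number - 1;
    }

    // Hypothetical helper: one-based row number from a cell address such as "F4".
    public static int RowNumber(string cellAddress)
    {
        int row = 0;
        foreach (char ch in cellAddress)
        {
            if (ch >= '0' && ch <= '9') row = row * 10 + (ch - '0');
        }
        return row;
    }

    // Returns the (columns, rows) needed to hold everything up to the bottom-right cell.
    public static (int Columns, int Rows) SizeFromDimension(string dimension)
    {
        string[] parts = dimension.Split(':');
        // Assumption: a sheet with one used cell reports a dimension like "A1" (no colon),
        // so the last part is treated as the bottom-right address in both cases.
        string bottomRight = parts[parts.Length - 1];
        return (ColumnIndex(bottomRight) + 1, RowNumber(bottomRight));
    }

    public static void Main()
    {
        var size = SizeFromDimension("B1:F4");
        Console.WriteLine($"{size.Columns} columns, {size.Rows} rows");   // 6 columns, 4 rows
    }
}
```

Unlike `CalculateDataTableSize` above, this sketch does not throw when the dimension holds a single address; whether that is desirable depends on the files being read.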
              

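              【Discussion】:

                【Solution 10】:

                I couldn't resist optimizing the subroutines from Amurra's answer to remove the need for regular expressions.

                The first function is not actually needed, because the second function accepts either a cell reference (C3) or a column name (C), though it is still a nice helper. The index is also 1-based (only because our implementation uses 1-based rows to match Excel visually).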

                    /// <summary>
                    /// Given a cell name, return the cell column name.
                    /// </summary>
                    /// <param name="cellReference">Address of the cell (ie. B2)</param>
                    /// <returns>Column Name (ie. B)</returns>
                    /// <exception cref="ArgumentOutOfRangeException">cellReference</exception>
                    public static string GetColumnName(string cellReference)
                    {
                        // Advance from L to R until a digit, then return 0 through previous position.
                        // The length check keeps short inputs (e.g. "A") from indexing past the end.
                        for (int lastCharPos = 0; lastCharPos <= 3 && lastCharPos < cellReference.Length; lastCharPos++)
                            if (Char.IsNumber(cellReference[lastCharPos]))
                                return cellReference.Substring(0, lastCharPos);
                
                        throw new ArgumentOutOfRangeException("cellReference");
                    }
                
                    /// <summary>
                    /// Return one-based column index given a cell name or column name
                    /// </summary>
                    /// <param name="columnNameOrCellReference">Column Name (ie. A, AB3, or AB44)</param>
                    /// <returns>One-based column index; 0 if the input contains no letters</returns>
                    public static int GetColumnIndexFromName(string columnNameOrCellReference)
                    {
                        int columnIndex = 0;            
                        int factor = 1;
                        for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--)   // R to L
                        {
                            if (Char.IsLetter(columnNameOrCellReference[pos]))  // for letters (columnName)
                            {
                                columnIndex += factor * ((columnNameOrCellReference[pos] - 'A') + 1);
                                factor *= 26;
                            }
                        }
                        return columnIndex;
                    }
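
As a worked example of the base-26 accumulation in `GetColumnIndexFromName`, here is a small standalone sketch. `ColumnIndexDemo` and `OneBasedColumnIndex` are hypothetical names; the loop mirrors the function above, with an added upper-casing step that is my own assumption about how lowercase input should be handled.

```csharp
using System;

public static class ColumnIndexDemo
{
    // Same right-to-left accumulation as GetColumnIndexFromName above,
    // repeated here so the example runs on its own.
    public static int OneBasedColumnIndex(string columnNameOrCellReference)
    {
        int columnIndex = 0;
        int factor = 1;
        for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--)
        {
            // Assumption: lowercase letters are treated like their uppercase form.
            char ch = char.ToUpperInvariant(columnNameOrCellReference[pos]);
            if (ch >= 'A' && ch <= 'Z')
            {
                columnIndex += factor * (ch - 'A' + 1);
                factor *= 26;
            }
        }
        return columnIndex;
    }

    public static void Main()
    {
        // "AB44": B contributes 2 * 1 and A contributes 1 * 26, so the result is 28.
        Console.WriteLine(OneBasedColumnIndex("AB44"));     // 28
        Console.WriteLine(OneBasedColumnIndex("C"));        // 3
        // Subtract 1 when the value is used as a DataRow or array index.
        Console.WriteLine(OneBasedColumnIndex("C3") - 1);   // 2
    }
}
```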
                

                【Discussion】:

                  【Solution 11】:

                  Adding another implementation, this time for the case where the number of columns is known in advance:

                          /// <summary>
                          /// Gets a list cells that are padded with empty cells where necessary.
                          /// </summary>
                          /// <param name="numberOfColumns">The number of columns expected.</param>
                          /// <param name="cells">The cells.</param>
                          /// <returns>List of padded cells</returns>
                          private static IList<Cell> GetPaddedCells(int numberOfColumns, IList<Cell> cells)
                          {
                              // Only perform the padding operation if the existing cell count is less than required
                              // (comparing against numberOfColumns - 1 would skip rows that are missing exactly one cell)
                              if (cells.Count < numberOfColumns)
                              {
                                  IList<Cell> padded = new List<Cell>();
                                  int cellIndex = 0;
                  
                                  for (int paddedIndex = 0; paddedIndex < numberOfColumns; paddedIndex++)
                                  {
                                      if (cellIndex < cells.Count)
                                      {
                                          // Grab column reference (ignore row) <seealso cref="https://stackoverflow.com/a/7316298/674776"/>
                                          string columnReference = new string(cells[cellIndex].CellReference.ToString().Where(char.IsLetter).ToArray());
                  
                                          // Convert reference to index <seealso cref="https://stackoverflow.com/a/848552/674776"/>
                                          int indexOfReference = columnReference.ToUpper().Aggregate(0, (column, letter) => (26 * column) + letter - 'A' + 1) - 1;
                  
                                          // Add padding cells where current cell index is less than required
                                          while (indexOfReference > paddedIndex)
                                          {
                                              padded.Add(new Cell());
                                              paddedIndex++;
                                          }
                  
                                          padded.Add(cells[cellIndex++]);
                                      }
                                      else
                                      {
                                          // Add padding cells when passed existing cells
                                          padded.Add(new Cell());
                                      }
                                  }
                  
                                  return padded;
                              }
                              else
                              {
                                  return cells;
                              }
                          }
                  

                  Call it like this:

                  IList<Cell> cells = GetPaddedCells(38, row.Descendants<Cell>().ToList());
                  

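                  where 38 is the required number of columns.

                  【Discussion】:

                    【Solution 12】:

                    To read blank cells, I declare a variable named "CN" outside the cell-reading loop. Inside the loop, CN is incremented after each cell is read, and I check whether the current cell's column index is greater than CN. If the two do not match, I fill the skipped columns with the value I want. This is the trick I use to catch blank cells up to their proper column positions. Here is the code: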

                    public static DataTable ReadIntoDatatableFromExcel(string newFilePath)
                            {
                                /*Creating a table with 20 columns*/
                                var dt = CreateProviderRvenueSharingTable();
                    
                                try
                                {
                                    /*using stream so that if excel file is in another process then it can read without error*/
                                    using (Stream stream = new FileStream(newFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
                                    {
                                        using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(stream, false))
                                        {
                                            var workbookPart = spreadsheetDocument.WorkbookPart;
                                            var workbook = workbookPart.Workbook;
                    
                                            /*get only visible (non-hidden) tabs*/
                                            var sheets = workbook.Descendants<Sheet>().Where(e => e.State == null);
                    
                                            foreach (var sheet in sheets)
                                            {
                                                var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);
                    
                                                /*Remove empty sheets*/
                                                List<Row> rows = worksheetPart.Worksheet.Elements<SheetData>().First().Elements<Row>()
                                                    .Where(r => r.InnerText != string.Empty).ToList();
                    
                                                if (rows.Count > 1)
                                                {
                                                    OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
                    
                                                    int i = 0;
                                                    int BTR = 0;/*Break the reader while empty rows are found*/
                    
                                                    while (reader.Read())
                                                    {
                                                        if (reader.ElementType == typeof(Row))
                                                        {
                                                            /*ignoring first row with headers and check if data is there after header*/
                                                            if (i < 2)
                                                            {
                                                                i++;
                                                                continue;
                                                            }
                    
                                                            reader.ReadFirstChild();
                    
                                                            DataRow row = dt.NewRow();
                    
                                                            int CN = 0;
                    
                                                            if (reader.ElementType == typeof(Cell))
                                                            {
                                                                do
                                                                {
                                                                    Cell c = (Cell)reader.LoadCurrentElement();
                    
                                                                    /*the reader skips blank cells, which would put data under the wrong header in the DataTable rows*/
                                                                    if (CN != 0)
                                                                    {
                                                                        int cellColumnIndex =
                                                                            ExcelHelper.GetColumnIndexFromName(
                                                                                ExcelHelper.GetColumnName(c.CellReference));
                    
                                                                        if (cellColumnIndex < 20 && CN < cellColumnIndex - 1)
                                                                        {
                                                                            do
                                                                            {
                                                                                row[CN] = string.Empty;
                                                                                CN++;
                                                                            } while (CN < cellColumnIndex - 1);
                                                                        }
                                                                    }
                    
                                                                    /*stopping execution if first cell does not have any value which means empty row*/
                                                                    if (CN == 0 && c.DataType == null && c.CellValue == null)
                                                                    {
                                                                        BTR++;
                                                                        break;
                                                                    }
                    
                                                                    string cellValue = GetCellValue(c, workbookPart);
                                                                    row[CN] = cellValue;
                                                                    CN++;
                    
                                                                    /*if any text exists after T column (index 20) then skip the reader*/
                                                                    if (CN == 20)
                                                                    {
                                                                        break;
                                                                    }
                                                                } while (reader.ReadNextSibling());
                                                            }
                    
                                                            /*reader skipping blank cells so fill the array upto 19 index*/
                                                            while (CN != 0 && CN < 20)
                                                            {
                                                                row[CN] = string.Empty;
                                                                CN++;
                                                            }
                    
                                                            if (CN == 20)
                                                            {
                                                                dt.Rows.Add(row);
                                                            }
                                                        }
                                                        /*escaping empty rows below data filled rows after checking 5 times */
                                                        if (BTR > 5)
                                                            break;
                                                    }
                                                    reader.Close();
                                                }                            
                                            }
                                        }
                                    }
                                }
                                catch (Exception)
                                {
                                    throw;  /* rethrow without resetting the stack trace */
                                }
                                return dt;
                            }
                    
                      private static string GetCellValue(Cell c, WorkbookPart workbookPart)
                            {
                                string cellValue = string.Empty;
                                if (c.DataType != null && c.DataType == CellValues.SharedString)
                                {
                                    SharedStringItem ssi =
                                        workbookPart.SharedStringTablePart.SharedStringTable
                                            .Elements<SharedStringItem>()
                                            .ElementAt(int.Parse(c.CellValue.InnerText));
                                    if (ssi.Text != null)
                                    {
                                        cellValue = ssi.Text.Text;
                                    }
                                }
                                else
                                {
                                    if (c.CellValue != null)
                                    {
                                        cellValue = c.CellValue.InnerText;
                                    }
                                }
                                return cellValue;
                            }
                    
                    public static int GetColumnIndexFromName(string columnNameOrCellReference)
                            {
                                int columnIndex = 0;
                                int factor = 1;
                                for (int pos = columnNameOrCellReference.Length - 1; pos >= 0; pos--)   // R to L
                                {
                                    if (Char.IsLetter(columnNameOrCellReference[pos]))  // for letters (columnName)
                                    {
                                        columnIndex += factor * ((columnNameOrCellReference[pos] - 'A') + 1);
                                        factor *= 26;
                                    }
                                }
                                return columnIndex;
                            }
                    
                            public static string GetColumnName(string cellReference)
                            {
                                /* Advance from L to R until a digit, then return 0 through previous position.
                                   The length check keeps short inputs (e.g. "A") from indexing past the end. */
                                for (int lastCharPos = 0; lastCharPos <= 3 && lastCharPos < cellReference.Length; lastCharPos++)
                                    if (Char.IsNumber(cellReference[lastCharPos]))
                                        return cellReference.Substring(0, lastCharPos);
                    
                                throw new ArgumentOutOfRangeException("cellReference");
                            }
                    

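                    What this code does:

                    1. Reads blank cells.
                    2. Skips the empty rows that follow the data once reading is complete.
                    3. Reads the worksheets in ascending order, starting from the first.
                    4. Still reads the file through OpenXML when the Excel file is in use by another process.

                    【Discussion】:

                      【Solution 13】:

                      Here is my solution. I found that the approaches above do not seem to work well when the missing fields are at the end of a row.

                      Assume the first row of the Excel worksheet contains every column (via the headers), and take the expected number of columns per row from it (row == 1). Then loop over the data rows (row > 1). The key to handling missing cells is the method getRowCells, which is passed the known number of column cells together with the current row to process.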

                      int columnCount = worksheetPart.Worksheet.Descendants<Row>().Where(r => r.RowIndex == 1).FirstOrDefault().Descendants<Cell>().Count();
                      
                      IEnumerable<Row> rows = worksheetPart.Worksheet.Descendants<Row>().Where(r => r.RowIndex > 1);
                      
                      List<List<string>> docData = new List<List<string>>();
                      
                      foreach (Row row in rows)
                      {
                          List<Cell> cells = getRowCells(columnCount, row);
                      
                          List<string> rowData = new List<string>();
                      
                          foreach (Cell cell in cells)
                          {
                              rowData.Add(getCellValue(workbookPart, cell));
                          }
                      
                          docData.Add(rowData);
                      }
                      

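                      The current limitation of getRowCells is that it only supports worksheets (rows) with no more than 26 columns. A loop based on the known column count is used to find missing columns (cells). When one is found, a new Cell is inserted into the cells collection; the new Cell's default value is "" rather than null. The modified Cell collection is then returned.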

                      private static List<Cell> getRowCells(int columnCount, Row row)
                      {
                          const string COLUMN_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
                      
                          if (columnCount > COLUMN_LETTERS.Length)
                          {
                             throw new ArgumentException(string.Format("Invalid columnCount ({0}).  Cannot be greater than {1}",
                                      columnCount, COLUMN_LETTERS.Length));
                          }
                      
                          List<Cell> cells = row.Descendants<Cell>().ToList();
                      
                          for (int i = 0; i < columnCount; i++)
                          {
                              if (i < cells.Count)
                              {
                                  string cellColumnReference = cells.ElementAt(i).CellReference.ToString();
                                  if (cellColumnReference[0] != COLUMN_LETTERS[i])
                                  {
                                      cells.Insert(i, new Cell() { CellValue = new CellValue("") });
                                  }
                              }
                              else
                              {
                                  cells.Insert(i, new Cell() { CellValue = new CellValue("") });
                              }
                          }
                      
                          return cells;
                      }
                      
                      private static string getCellValue(WorkbookPart workbookPart, Cell cell)
                      {
                          SharedStringTablePart stringTablePart = workbookPart.SharedStringTablePart;
                          string value = (cell.CellValue != null) ? cell.CellValue.InnerXml : string.Empty;
                      
                          if ((cell.DataType != null) && (cell.DataType.Value == CellValues.SharedString))
                          {
                              return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
                          }
                          else
                          {
                              return value;
                          }
                      }      
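
One way the 26-column limit might be lifted is to place each stored value by the zero-based index derived from its full column reference, instead of comparing a single letter. The sketch below is an illustration under stated assumptions, not the author's method: it works on plain reference/value pairs rather than Open XML `Cell` objects, and `RowPaddingSketch`, `ColumnIndex` and `PadRow` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class RowPaddingSketch
{
    // Zero-based column index from a reference such as "AB7" (digits are ignored).
    public static int ColumnIndex(string cellReference)
    {
        int number = 0;
        foreach (char ch in cellReference.ToUpperInvariant())
        {
            if (ch >= 'A' && ch <= 'Z') number = number * 26 + (ch - 'A' + 1);
        }
        return number - 1;
    }

    // Places each stored value at the column named by its reference and leaves ""
    // wherever the file stored no cell, including at the end of the row.
    public static string[] PadRow(int columnCount, IEnumerable<KeyValuePair<string, string>> storedCells)
    {
        var padded = new string[columnCount];
        for (int i = 0; i < columnCount; i++) padded[i] = "";

        foreach (var cell in storedCells)
        {
            int index = ColumnIndex(cell.Key);
            // Values beyond the expected column count are dropped in this sketch.
            if (index >= 0 && index < columnCount) padded[index] = cell.Value;
        }
        return padded;
    }

    public static void Main()
    {
        // Mirrors the A1/C1 example from the first answer: B1 was never stored.
        var stored = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("A1", "first"),
            new KeyValuePair<string, string>("C1", "third"),
        };
        string[] row = PadRow(4, stored);
        Console.WriteLine(string.Join("|", row));   // first||third|
    }
}
```

With real worksheet data, the pairs would come from each cell's `CellReference` and the string returned by `getCellValue`.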
                      

                      【Discussion】:

                        【Solution 14】:

                        This code ran successfully for me:

                                    string filePath = "test.xlsx"; // your file path
                        
                                    //Open the Excel file using ClosedXML.
                                    using (XLWorkbook workBook = new XLWorkbook(filePath))
                                    {
                                        //Read the first Sheet from Excel file.
                                        IXLWorksheet workSheet = workBook.Worksheet(1);
                        
                                        //Create a new DataTable.
                                        DataTable dt = new DataTable();
                        
                                        //Loop through the Worksheet rows.
                                        bool firstRow = true;
                                        foreach (IXLRow row in workSheet.Rows())
                                        {
                                            //Use the first row to add columns to DataTable.
                                            if (firstRow)
                                            {
                                                foreach (IXLCell cell in row.Cells())
                                                {
                                                    dt.Columns.Add(cell.Value.ToString());
                                                }
                                                firstRow = false;
                                            }
                                            else
                                            {
                        
                                                //Add rows to DataTable.
                                                dt.Rows.Add();
                                                int i = 0;
                                                //for (IXLCell cell in row.Cells())
                                                for (int j = 1; j <= dt.Columns.Count; j++)
                                                {
                                                    if (string.IsNullOrEmpty(row.Cell(j).Value.ToString()))
                                                        dt.Rows[dt.Rows.Count - 1][i] = "";
                                                    else
                                                        dt.Rows[dt.Rows.Count - 1][i] = 
                                                    row.Cell(j).Value.ToString();
                                                    i++;
                                                }
                                            }
                                        }
                                    }
                        

                        【Discussion】:
