Read from a Word document line by line: answers

【Question title】: Read from Word document line by line
【Posted】: 2013-09-01 03:55:29
【Question】:

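I am trying to read a Word document using C#. I can get all of the text, but I want to read it line by line, store the lines in a list, and bind that list to a GridView. Currently my code returns a list with a single item containing all of the text, instead of one item per line. I am using the Microsoft.Office.Interop.Word library to read the file. Here is my code so far: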

    Application word = new Application();
    // Assigned by Documents.Open; "new Document()" would only create an extra blank document
    Document doc;

    object fileName = path;
    // Define an object to pass to the API for missing parameters
    object missing = System.Type.Missing;
    doc = word.Documents.Open(ref fileName,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing);

    List<string> data = new List<string>();
    // StoryRanges yields one Range per story (main text, headers, footnotes, ...),
    // so the whole body of the document arrives as a single item
    foreach (Range tmpRange in doc.StoryRanges)
    {
        data.Add(tmpRange.Text);
    }
    ((_Document)doc).Close();
    ((_Application)word).Quit();

    GridView1.DataSource = data;
    GridView1.DataBind();

【Comments】:

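Is that all of the code listed above? This weekend I'm starting a project that will read a Word file, take all the code between double quotes and insert a variable "A", he said. Then I have to replace the part after the comma with "A," B. It's for an author who wants some statistics on their code. I'll post my code for everyone to see. Are there any special imports that need to be done?
I would use a lightweight library like DocX (docx.codeplex.com).
@Hamdi Thanks, I didn't know about it. I've tried it, and compared to Interop it really is simple to use. Thanks again.
Using Office Interop from ASP.NET or any other server technology is a terrible idea. These APIs were written for use in desktop applications, to automate Office (a suite of desktop applications). Server applications differ in many ways that make it a very, very bad idea to use Office Interop in them. It is also unsupported by Microsoft and may violate your Office license. See Considerations for server-side Automation of Office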
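If Office cannot be used on the server, one option (a swapped-in technique, not something proposed in this thread) is to skip Word entirely: a .docx file is a ZIP package whose body lives in word/document.xml, with each paragraph stored as a w:p element and its text in w:t elements. The sketch below reads those paragraphs with only System.IO.Compression and System.Xml.Linq. The class and method names are hypothetical; it handles .docx only (not binary .doc), ignores headers, footers and footnotes (they live in other parts of the package), and text inside text boxes may also show up inside the paragraph that anchors them.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

public static class DocxParagraphReader
{
    // WordprocessingML namespace used by word/document.xml
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the trimmed text of every non-empty <w:p> paragraph, in document order.
    public static List<string> ReadParagraphs(Stream docxStream)
    {
        var result = new List<string>();
        using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found; is this a .docx file?");

            XDocument xml;
            using (Stream s = entry.Open())
                xml = XDocument.Load(s);

            foreach (XElement p in xml.Descendants(W + "p"))
            {
                // A paragraph's text is split across runs; concatenate its <w:t> and <w:tab> nodes
                var sb = new StringBuilder();
                foreach (XElement node in p.Descendants())
                {
                    if (node.Name == W + "t") sb.Append(node.Value);
                    else if (node.Name == W + "tab") sb.Append('\t');
                }

                string text = sb.ToString().Trim();
                if (text.Length > 0)   // skip blank paragraphs, like Solution 1 does
                    result.Add(text);
            }
        }
        return result;
    }
}
```

In the page this would be used as, for example, `using (var fs = File.OpenRead(path)) GridView1.DataSource = DocxParagraphReader.ReadParagraphs(fs);` followed by `GridView1.DataBind();`.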

Tags: c# asp.net .net ms-word office-interop

【Solution 1】:

OK, I found the solution here.

The final code:

    // Requires: using System.Collections.Generic;
    //           using Microsoft.Office.Interop.Word;
    Application word = new Application();

    object fileName = path;   // should be an absolute path
    // Define an object to pass to the API for missing parameters
    object missing = System.Type.Missing;
    Document doc = word.Documents.Open(ref fileName,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing);

    List<string> data = new List<string>();
    try
    {
        // Word collections are 1-based: Paragraphs[1] .. Paragraphs[Count]
        int count = doc.Paragraphs.Count;
        for (int i = 1; i <= count; i++)
        {
            // Range.Text ends with the paragraph mark '\r'; Trim() removes it
            string temp = doc.Paragraphs[i].Range.Text.Trim();
            if (temp != string.Empty)   // skip blank paragraphs
                data.Add(temp);
        }
    }
    finally
    {
        // The casts select the Close/Quit methods rather than the events of the same name
        ((_Document)doc).Close();
        ((_Application)word).Quit();
    }

    GridView1.DataSource = data;
    GridView1.DataBind();

【Comments】:

In my code, the Open method says the path is invalid and throws an unhandled COMException.
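One possible cause of that error (an assumption, since the comment does not show the path being used) is passing a relative path: Word runs as a separate process and resolves it against its own current folder, not the web application's. A minimal sketch that makes the path absolute and fails early with a clear message before calling Documents.Open; the helper name ResolveDocumentPath and its baseDirectory parameter are hypothetical.

```csharp
using System;
using System.IO;

public static class DocumentPath
{
    // Hypothetical helper: turn a possibly relative path into an absolute one
    // and throw a FileNotFoundException instead of letting Word throw a COMException.
    public static string ResolveDocumentPath(string path, string baseDirectory)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentException("A document path is required.", nameof(path));

        // Path.Combine returns 'path' unchanged when it is already rooted
        string full = Path.GetFullPath(Path.Combine(baseDirectory, path));

        if (!File.Exists(full))
            throw new FileNotFoundException("Word document not found.", full);

        return full;
    }
}
```

Usage, assuming the file sits under the application folder: `object fileName = DocumentPath.ResolveDocumentPath(path, AppDomain.CurrentDomain.BaseDirectory);`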

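【Solution 2】:

The code above is correct, but it is too slow. I improved it, and this version is much faster, presumably because foreach enumerates the Paragraphs collection once instead of looking up each paragraph by index through COM.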

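    // Requires: using System.Collections.Generic;
    //           using Microsoft.Office.Interop.Word;
    List<string> data = new List<string>();
    Application app = new Application();
    // readFromPath: full path of the document. C# 4+ lets COM calls omit
    // "ref" and the optional arguments.
    Document doc = app.Documents.Open(readFromPath);
    try
    {
        // Unlike Solution 1, blank paragraphs are kept
        foreach (Paragraph objParagraph in doc.Paragraphs)
            data.Add(objParagraph.Range.Text.Trim());
    }
    finally
    {
        ((_Document)doc).Close();
        ((_Application)app).Quit();
    }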

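【Comments】:

【Solution 3】:

How about this: get all the text from the document, split it on the return character (or whatever works better for you), then turn it into a list.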

    // Requires: using System.Linq;
    // Word ends each paragraph with '\r', not '\n'
    List<string> lines = doc.Content.Text.Split('\r').ToList();

【Comments】:

It's \r\a (Word's end-of-cell mark in tables), but splitting on \r will do; \n won't work.
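Building on that comment, a small helper that splits the string from doc.Content.Text on both '\r' (paragraph mark) and '\a' (the BEL character, char 7, which follows '\r' at the end of a table cell), trims each piece and drops empty ones, mirroring the filtering in Solution 1. The class and method names are hypothetical; because it is plain string handling it can be tried without Word installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordText
{
    // Splits the text of a Word range into non-empty, trimmed lines.
    // '\r' ends a paragraph; "\r\a" ends a table cell.
    public static List<string> SplitParagraphs(string text)
    {
        return text
            .Split(new[] { '\r', '\a' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)   // drop blank paragraphs
            .ToList();
    }
}
```

With Interop this would be called as `List<string> lines = WordText.SplitParagraphs(doc.Content.Text);`, which reads the whole document in one COM call rather than one call per paragraph.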