Parse table from Wikipedia: answers

【Question title】: Parse table from Wikipedia
【Posted】: 2026-01-05 17:35:02
【Question】:

I need to read the entire table on this page: https://it.wikipedia.org/wiki/Episodi_di_Watchmen

I don't care about the headers, but of course I need to read every row and every column. I wrote this code:

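using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using HtmlAgilityPack;

string page = "https://it.wikipedia.org/wiki/Episodi_di_Watchmen";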
HtmlDocument doc = new HtmlDocument();
StreamReader reader = new StreamReader(WebRequest.Create(page).GetResponse().GetResponseStream(), Encoding.UTF8);

doc.Load(reader);
List<List<string>> table = doc.DocumentNode.SelectSingleNode("//table[@class='wikitable']")
                        .Descendants("tr").Select(x => x.ChildNodes.Select(c => c.InnerText.Trim())
                        .Where(y => !string.IsNullOrWhiteSpace(y)).ToList()).ToList();

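Unfortunately, the code above throws an error on the List<List<string>> table = ... statement. After some experimenting, the error seems to come from the Descendants method.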
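If the error is a NullReferenceException at .Descendants("tr"), the usual reason is that SelectSingleNode found no match and returned null. One possible cause, an assumption not confirmed in this thread, is that the table's class attribute now lists more than one class (for example "wikitable plainrowheaders"): @class='wikitable' compares the whole attribute string, so it stops matching. The minimal sketch below illustrates this with System.Xml on a hypothetical, well-formed sample; HtmlAgilityPack runs its XPath through the same .NET XPath 1.0 engine, so the token-style expression should behave the same way there. The class name WikiTableXPath and the sample markup are made up for illustration.

```csharp
using System;
using System.Xml;

public static class WikiTableXPath
{
    // Matches only when the class attribute is exactly "wikitable".
    public const string ExactMatch = "//table[@class='wikitable']";

    // Matches when "wikitable" is one of the space-separated class tokens.
    public const string TokenMatch =
        "//table[contains(concat(' ', normalize-space(@class), ' '), ' wikitable ')]";

    // Returns the first <table> selected by the XPath, or null if none matches.
    public static XmlNode? FindTable(string xhtml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // needs well-formed markup, unlike real Wikipedia HTML
        return doc.SelectSingleNode(xpath);
    }

    public static void Main()
    {
        // Hypothetical sample: the real page's class list is an assumption here.
        const string sample =
            "<html><body><table class=\"wikitable plainrowheaders\">" +
            "<tr><th>N.</th><th>Titolo</th></tr>" +
            "<tr><td>1</td><td>example</td></tr>" +
            "</table></body></html>";

        Console.WriteLine(FindTable(sample, ExactMatch) == null);                  // True
        Console.WriteLine(FindTable(sample, TokenMatch)?.SelectNodes(".//tr")?.Count); // 2
    }
}
```

With HtmlAgilityPack, the same idea would mean passing the TokenMatch expression to doc.DocumentNode.SelectSingleNode and checking the result for null before calling Descendants.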

Can you help me?

【Comments on the question】:

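You should be using the Wikimedia API instead of trying to scrape Wikipedia pages.
I'm starting to think I'll have to use the Wikimedia API, since my code worked until a few days ago. I need to look into it; I've never used anything like that.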
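For the API route suggested in the comments, here is a minimal sketch assuming the MediaWiki Action API's action=parse endpoint with prop=text and formatversion=2, where the rendered page HTML comes back as the string at parse.text. The class and method names (WikiParseApi, BuildParseUrl, ExtractHtml, FetchHtmlAsync) and the User-Agent value are hypothetical placeholders. The returned string is HTML, not XML, so it would still be loaded into HtmlAgilityPack (for example with HtmlDocument.LoadHtml) before selecting the table.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class WikiParseApi
{
    // Builds an action=parse request for the rendered HTML of one page.
    public static string BuildParseUrl(string language, string pageTitle) =>
        $"https://{language}.wikipedia.org/w/api.php?action=parse" +
        $"&page={Uri.EscapeDataString(pageTitle)}&prop=text&format=json&formatversion=2";

    // With formatversion=2 the rendered HTML is a plain string at parse.text.
    public static string ExtractHtml(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("parse").GetProperty("text").GetString() ?? "";
    }

    // Downloads the JSON response and returns only the page HTML.
    public static async Task<string> FetchHtmlAsync(HttpClient http, string language, string pageTitle)
    {
        string json = await http.GetStringAsync(BuildParseUrl(language, pageTitle));
        return ExtractHtml(json);
    }

    public static async Task Main()
    {
        using var http = new HttpClient();
        // Wikimedia asks clients to identify themselves; this value is a placeholder.
        http.DefaultRequestHeaders.TryAddWithoutValidation(
            "User-Agent", "WikiTableReader/0.1 (contact: you@example.com)");

        string html = await FetchHtmlAsync(http, "it", "Episodi_di_Watchmen");
        Console.WriteLine(html.Length); // size of the rendered HTML, varies with the page
    }
}
```

Separating URL building and JSON extraction from the network call keeps those two pieces easy to check without hitting Wikipedia.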

Tags: c# html .net html-agility-pack

【Solution 1】:

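HtmlDocument.Load(TextReader reader) fails to create the DocumentNode: it is null. This is probably because TextReader has no publicly assignable encoding field, and that is why you get the exception. Here is working code: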

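using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;

class WikiTable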
{
    public async Task<IEnumerable<List<string>>> LoadWikiTable(string requestUriString)
    {
        HtmlDocument doc = new HtmlDocument();
        using (StreamReader reader = new StreamReader(WebRequest
                                                        .Create(requestUriString)
                                                        .GetResponse()
                                                        .GetResponseStream(),
                                                      Encoding.UTF8))
        {
            await Task.Run(() => 
                           doc.Load(reader.BaseStream /*, Encoding.UTF8*/));
        }

        return doc.DocumentNode
                  .SelectSingleNode("//table[@class='wikitable']")
                  .Descendants("tr")
                  .Select(x => x.ChildNodes
                              .Select(c => c.InnerText.Trim())
                              .Where(y => !string.IsNullOrWhiteSpace(y))
                              .ToList()
                          );
    }

    public static string Table2String(IEnumerable<List<string>> table)
    {
        string Row2String(List<string> row) => string.Join("\t", row);

        return string.Join("\n", 
                           table.Select(row => Row2String(row)));
    }
}

Usage:

var table = new WikiTable().LoadWikiTable("https://it.wikipedia.org/wiki/Episodi_di_Watchmen");
Console.WriteLine(WikiTable.Table2String(table.Result));

What do you think?

【Comments】:

It doesn't work: table.Result is empty. I think Wikipedia changed something, because until a few days ago there was no problem.