HtmlAgilityPack - Get DIV content: answers

【Question title】: HtmlAgilityPack - Get DIV content
【Posted】: 2017-10-20 01:15:11
【Question】:

I'm trying to get some text from a DIV using HtmlAgilityPack in a WinForms C# application.

My code is:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml("http://www.tibia.com/news/?subtopic=latestnews");
var res = doc.DocumentNode.SelectSingleNode("//div[@id='PlayersOnline']");
var content = res.InnerHtml;

// Print content
MessageBox.Show(content);

The content I'm trying to get is on this page: http://www.tibia.com/news/?subtopic=latestnews

In the top-right corner of the site there is a box showing the number of "Players Online". I want to get that number.

The HTML on the site looks like this:

<div id="PlayersOnline" onclick="window.location = 'https://secure.tibia.com/community/?subtopic=worlds';">11723<br>Players Online</div>

So I want to get 11723 as the output. It's also fine if I get the whole 11723<br>Players Online as the output; I can do a regex match or split the string later to drop the br tag.
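A minimal sketch of that regex/split step in plain C#, with no HtmlAgilityPack dependency. It assumes the string looks like the markup quoted above (the number first, then a br tag); the helper names ExtractLeadingNumber and ExtractBySplit are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Regex approach: take the ASCII digits at the very start of the string.
// Returns null if the string does not start with a number (or it overflows int).
static int? ExtractLeadingNumber(string innerHtml)
{
    Match m = Regex.Match(innerHtml, @"^\s*([0-9]+)");
    return m.Success && int.TryParse(m.Groups[1].Value, out int n) ? n : (int?)null;
}

// Split approach: cut at the first "<br>" and parse what comes before it.
static int? ExtractBySplit(string innerHtml)
{
    string first = innerHtml.Split(new[] { "<br>" }, StringSplitOptions.None)[0].Trim();
    return int.TryParse(first, out int n) ? n : (int?)null;
}

string sample = "11723<br>Players Online"; // InnerHtml of the div quoted in the question
Console.WriteLine(ExtractLeadingNumber(sample));
Console.WriteLine(ExtractBySplit(sample));
```

The regex version also tolerates variants such as <br/> or <BR>, because it only looks at the leading digits; the split version matches the exact <br> spelling only.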

But my code doesn't work and I don't know why. The application crashes with:

System.NullReferenceException: 'Object reference not set to an instance of an object.'

<res>5__8 was null.
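The null comes from the first two lines of the code: HtmlDocument.LoadHtml parses its argument as HTML markup and does not download anything. Given the URL string, it builds a document whose only content is that text, so SelectSingleNode finds no div and returns null, and res.InnerHtml then throws. The console sketch below contrasts the two kinds of input; it assumes the HtmlAgilityPack NuGet package is referenced, and FindPlayersOnline is a hypothetical helper.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper: parse a string as HTML and look for the PlayersOnline div.
static HtmlNode FindPlayersOnline(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html); // parses markup; a URL passed here is treated as plain text
    return doc.DocumentNode.SelectSingleNode("//div[@id='PlayersOnline']");
}

// What the question does: the "HTML" is just the URL text, so no div exists.
HtmlNode fromUrl = FindPlayersOnline("http://www.tibia.com/news/?subtopic=latestnews");
Console.WriteLine(fromUrl == null ? "null: the URL string contains no <div>" : "found");

// The markup quoted in the question, passed as a string, does contain the div.
HtmlNode fromMarkup = FindPlayersOnline("<div id=\"PlayersOnline\">11723<br>Players Online</div>");
Console.WriteLine(fromMarkup == null ? "null" : fromMarkup.InnerHtml);
```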

【Question discussion】:

Tags: c# html string html-agility-pack scrape

【Solution 1】:

Replace your loading code with this:

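    // HtmlWeb downloads the page; HtmlDocument.LoadHtml only parses a string of markup.
    HtmlAgilityPack.HtmlWeb webSite = new HtmlAgilityPack.HtmlWeb();
    HtmlAgilityPack.HtmlDocument document = webSite.Load("http://www.tibia.com/news/?subtopic=latestnews");

    // GetElementbyId returns null if the downloaded page has no element with this id.
    string content = document.GetElementbyId("PlayersOnline").OuterHtml;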
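A sketch of the same download approach with a null check, assuming the HtmlAgilityPack NuGet package and a console project. HtmlWeb.Load(url) performs the download; the hypothetical ReadPlayersOnline helper reads the first text node of the div and parses it as an int, returning null instead of throwing when the element is missing from the HTML that was actually served. The live download only runs when a URL is passed on the command line; otherwise the sketch parses the markup quoted in the question.

```csharp
using System;
using HtmlAgilityPack;

// HtmlWeb.Load(url) downloads and parses the page.
static HtmlDocument FetchPage(string url) => new HtmlWeb().Load(url);

// Hypothetical helper: read the player count without assuming the div exists.
static int? ReadPlayersOnline(HtmlDocument doc)
{
    HtmlNode div = doc.GetElementbyId("PlayersOnline");
    if (div == null)
        return null; // the parsed HTML has no element with this id

    // The first child should be the text node "11723"; "<br>" and "Players Online" follow it.
    string firstText = div.FirstChild?.InnerText.Trim() ?? "";
    return int.TryParse(firstText, out int count) ? count : (int?)null;
}

if (args.Length > 0)
{
    // e.g. dotnet run -- "http://www.tibia.com/news/?subtopic=latestnews"
    int? live = ReadPlayersOnline(FetchPage(args[0]));
    Console.WriteLine(live.HasValue ? $"Players online: {live}" : "PlayersOnline div not found");
}
else
{
    // Offline check against the markup quoted in the question.
    var sample = new HtmlDocument();
    sample.LoadHtml("<div id=\"PlayersOnline\">11723<br>Players Online</div>");
    Console.WriteLine(ReadPlayersOnline(sample));
}
```

If the live run prints "PlayersOnline div not found", the HTML that HtmlWeb received does not contain that element, which a null check reports cleanly instead of a NullReferenceException.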

【Discussion】:

This gives me the error: System.ArgumentException: 'URI formats are not supported.'
Sorry, it should be: HtmlWeb website = new HtmlWeb(); HtmlDocument document = website.Load("yourUrl");
The application still crashes with System.NullReferenceException: 'Object reference not set to an instance of an object.' HtmlAgilityPack.HtmlDocument.GetElementbyId(...) returned null.

【Solution 2】:

Try InnerText instead of InnerHtml:

var content = doc.DocumentNode.SelectSingleNode("//div[@id='PlayersOnline']").InnerText;
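A sketch contrasting the two properties on the markup quoted in the question, assuming the HtmlAgilityPack package is referenced. InnerText drops the br tag but keeps all of the text, including "Players Online", so reading only the first text node of the div is one way to get just the number. Note that neither property helps if SelectSingleNode itself returned null.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<div id=\"PlayersOnline\">11723<br>Players Online</div>");
HtmlNode div = doc.DocumentNode.SelectSingleNode("//div[@id='PlayersOnline']");

// InnerHtml returns the markup inside the div, <br> tag included.
Console.WriteLine(div.InnerHtml);

// InnerText drops the tags but keeps all text, so "Players Online" is still there.
Console.WriteLine(div.InnerText);

// To get just the number, read the first text node of the div.
string count = div.ChildNodes
    .First(n => n.NodeType == HtmlNodeType.Text)
    .InnerText
    .Trim();
Console.WriteLine(count);
```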

【Discussion】:

Could you try adding a dot in front of //div?
Something like: doc.DocumentNode.SelectSingleNode(".//div[@id='PlayersOnline']").InnerText;
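When the query is called on DocumentNode, which is the root of the document, the leading dot should not change the result: //div searches from the document root, and .//div searches the descendants of the context node, which here is also the root. It therefore cannot turn the null from the question into a match, since a document loaded with LoadHtml(url) contains no div at all. The dot does matter when querying from a child node, as this sketch shows; the sidebar/footer markup is made up for illustration and the HtmlAgilityPack package is assumed.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(
    "<div id=\"sidebar\"><div id=\"PlayersOnline\">11723<br>Players Online</div></div>" +
    "<div id=\"footer\"><span>footer</span></div>");

// Called on DocumentNode (the root), "//div" and ".//div" search the same tree.
HtmlNode fromRoot = doc.DocumentNode.SelectSingleNode("//div[@id='PlayersOnline']");
HtmlNode fromRootWithDot = doc.DocumentNode.SelectSingleNode(".//div[@id='PlayersOnline']");
Console.WriteLine(ReferenceEquals(fromRoot, fromRootWithDot));

// From a child node, "//" still searches the whole document,
// while ".//" only searches below the context node (the footer has no such div).
HtmlNode footer = doc.GetElementbyId("footer");
Console.WriteLine(footer.SelectSingleNode("//div[@id='PlayersOnline']") != null);
Console.WriteLine(footer.SelectSingleNode(".//div[@id='PlayersOnline']") != null);
```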