【Posted】: 2014-02-21 19:19:15
【Question】:
I am trying to do something like this:
var document = htmlWeb.Load(searchUrl);
var hotels = document.DocumentNode.Descendants("div")
    .Where(x => x.Attributes.Contains("class") &&
                x.Attributes["class"].Value.Contains("listing-content"));
int count = 1;
foreach (var hotel in hotels)
{
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.OptionFixNestedTags = true;
    htmlDoc.Load(hotel.InnerText);
    if (htmlDoc.DocumentNode != null)
    {
        var anchors = htmlDoc.DocumentNode.Descendants("div")
            .Where(x => x.Attributes.Contains("class") &&
                        x.Attributes["class"].Value.Contains("srp-business-name")); // Error occurs here
        foreach (var anchor in anchors)
        {
            Console.WriteLine(anchor.InnerHtml);
        }
    }
}
I get results like this:
<a href="http://ad.doubleclick.net/clk;234504055;58257942;j?http://www.marriott.com/NYCMQ" class="url mip-link" data-analytics="{"click_id":1601,"rank":1,"act":1,"FL":"list","target":"name","supermedia":true}" rel="nofollow">New York Marriott Marquis</a>
<a href="http://www.yellowpages.com/new-york-ny/mip/new-york-marriott-marquis-468349733?lid=1000372156461" class="no-tracks hidden url" data-analytics="{"click_id":1601,"rank":1,"act":1,"FL":"list","target":"name","supermedia":true}" rel="nofollow"></a>
<span class="external-link">
<img height="15" src="/images/sprites/search/icon-link-external.png" width="16">
</span>
and
<a href="http://www.yellowpages.com/new-york-ny/mip/courtyard-by-marriott-new-york-manhattan-times-square-south-2198956?lid=178101818" class="url redbold mip-link" data-analytics="{"click_id":1600,"rank":2,"act":1,"FL":"list","target":"name","supermedia":""}">Courtyard by Marriott New York Manhattan/Times Square South</a>
and so on.
Now I want the InnerHtml of the anchor tag that has class="url redbold mip-link". So I am doing this:
var document = htmlWeb.Load(searchUrl);
var hotels = document.DocumentNode.Descendants("div")
    .Where(x => x.Attributes.Contains("class") &&
                x.Attributes["class"].Value.Contains("listing-content"));
int count = 1;
foreach (var hotel in hotels)
{
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.OptionFixNestedTags = true;
    htmlDoc.Load(hotel.InnerText);
    if (htmlDoc.DocumentNode != null)
    {
        var anchors = htmlDoc.DocumentNode.Descendants("div")
            .Where(x => x.Attributes.Contains("class") &&
                        x.Attributes["class"].Value.Contains("srp-business-name"));
        foreach (var anchor in anchors)
        {
            htmlDoc.LoadHtml(anchor.InnerHtml);
            var hoteltags = htmlDoc.DocumentNode.SelectNodes("//a");
            foreach (var tag in hoteltags)
            {
                if (!string.IsNullOrEmpty(tag.InnerHtml) || !string.IsNullOrWhiteSpace(tag.InnerHtml))
                {
                    Console.WriteLine(tag.InnerHtml);
                }
            }
        }
    }
}
I get the first result, New York Marriott Marquis, correctly, but an error occurs on the second result:
startIndex cannot be larger than length of string. What am I doing wrong?
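One direction that may be worth trying, offered as a sketch rather than a confirmed fix: each `hotel` is already a parsed `HtmlNode`, so it can be queried directly instead of being loaded into a second `HtmlDocument`. Note that `HtmlDocument.Load(string)` expects a file path, while `HtmlDocument.LoadHtml(string)` expects markup. The class and method names `HotelScraper` and `PrintHotelNames` are hypothetical, and matching on `mip-link` is an assumption based on the two sample anchors above. Whether this avoids the exception on the real page has not been verified.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class HotelScraper
{
    // Hypothetical helper: prints the text of every anchor whose class
    // attribute contains "mip-link" inside each "listing-content" div.
    public static void PrintHotelNames(HtmlDocument document)
    {
        var hotels = document.DocumentNode.Descendants("div")
            .Where(d => d.GetAttributeValue("class", "").Contains("listing-content"));

        foreach (var hotel in hotels)
        {
            // Query the already-parsed node; no second HtmlDocument is created.
            var links = hotel.Descendants("a")
                .Where(a => a.GetAttributeValue("class", "").Contains("mip-link"));

            foreach (var link in links)
            {
                var name = link.InnerText.Trim();
                if (name.Length > 0)
                {
                    Console.WriteLine(name);
                }
            }
        }
    }
}
```

It would be called as `HotelScraper.PrintHotelNames(htmlWeb.Load(searchUrl));`. Using `GetAttributeValue` with a default value avoids the separate `Attributes.Contains("class")` check.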
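【Comments】:
-
On which line does the exception occur?
-
I strongly believe this code would not produce the exception you mention.
-
Keith Payne: Yes, I am getting this error. I have updated my question with a code comment marking where the error occurs.
-
Sudhakar Tillapudi: I have updated my question with a code comment marking where the error occurs.
Tags: c# html-parsing html-agility-pack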