【Question Title】: XPath retrieving two queries
【Posted】: 2011-11-01 03:04:18
【Question】:

I want to retrieve the name and email address of the "Course Leader" from this web page:

http://www.westminster.ac.uk/schools/computing/undergraduate/computer-games-development/bsc-honours-computer-games-development

How can I do this?

I tried retrieving the first <p> after "Course Content", but it doesn't work well:

//div[starts-with(@id,'content_div')]/h3[.='Course Content']/following-sibling::p[1]

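【Question Comments】:

  • What is the h4 check for? In what way does it "not work well"?
  • I was testing some things. At the moment it gets the course leader's name but not the email address. How can it retrieve both?

Tags: html objective-c xcode xpath html-parsing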


【Solution 1】:

Since neither of the values you're looking for has a truly unique identifying tag, I'd skip XPath and use a dirty little hack.

// Get the HTML code. initWithContentsOfURL: blocks, so avoid calling it on the main thread.
NSString *getURL = @"http://www.westminster.ac.uk/schools/computing/undergraduate/computer-games-development/bsc-honours-computer-games-development";
NSData *htmlData = [[NSData alloc] initWithContentsOfURL:[NSURL URLWithString:getURL]];
NSString *htmlString = [[NSString alloc] initWithData:htmlData encoding:NSUTF8StringEncoding];


// Separate the HTML code at the unique HTML line "<h3>Course Leader</h3>".
// Everything after the heading is at index 1; stay nil if the heading is missing.
NSArray *tempArray = [htmlString componentsSeparatedByString:@"<h3>Course Leader</h3>"];
NSString *tempString1 = ([tempArray count] > 1) ? [tempArray objectAtIndex:1] : nil;

// Get name: it is the text before the first "<br />".
NSArray *tempArray2 = [tempString1 componentsSeparatedByString:@"<br />"];

// Set name
NSString *nameString = ([tempArray2 count] > 0) ? [tempArray2 objectAtIndex:0] : nil;
// Clean up name string
nameString = [nameString stringByReplacingOccurrencesOfString:@"\n" withString:@""];
nameString = [nameString stringByReplacingOccurrencesOfString:@"\r" withString:@""];
nameString = [nameString stringByReplacingOccurrencesOfString:@"<p>" withString:@""];

// Get email: splitting on ">" leaves the link text (plus "</a") at index 3.
NSArray *emailArray = [tempString1 componentsSeparatedByString:@">"];

// Set email string; stay nil if the markup is not shaped as expected.
NSString *emailString = ([emailArray count] > 3) ? [emailArray objectAtIndex:3] : nil;
// Clean up email string
emailString = [emailString stringByReplacingOccurrencesOfString:@"</a" withString:@""];

NSLog(@"Results: Name = %@  Email = %@", nameString, emailString);

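【Discussion】:

    【Solution 2】:

    I don't know exactly what the XML/XPath code looks like in Objective-C, but I suspect you are already getting all the information you need and just have to do a little more work to pull it apart. The node your XPath retrieves looks like this (I've edited the content):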

    <p>Anastassia Angelopolou<br />
    Email: <a href="mailto:agelopa@wmin.ac.uk.invalid">agelopa@wmin.ac.uk.invalid</a></p>
    

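    So if you ask only for the text of the p node, you get just the text Anastassia Angelopolou, that is, the (first) inner text up to the first child node (<br />). To get the email address, use an XPath from the p node to its ./a child and take either its text or the value of @href.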
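
As a rough illustration of the two lookups described above, here is a minimal sketch in Objective-C. It assumes macOS, where Foundation provides NSXMLDocument (that class is not available on iOS, where a third-party libxml2 wrapper would be needed instead). The hard-coded fragment stands in for the p node, and the helper name FirstStringForXPath is made up for this example. Against the real page, the same paths relative to the p node would be text()[1] and ./a/@href.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: string value of the first node matched by `xpath`,
// or nil if the query fails or matches nothing.
static NSString *FirstStringForXPath(NSXMLNode *node, NSString *xpath) {
    NSError *error = nil;
    NSArray *matches = [node nodesForXPath:xpath error:&error];
    if ([matches count] == 0) {
        return nil;
    }
    NSXMLNode *first = [matches objectAtIndex:0];
    return [first stringValue];
}

int main(void) {
    @autoreleasepool {
        // Stand-in for the node the original XPath returns.
        NSString *fragment =
            @"<p>Anastassia Angelopolou<br />\n"
            @"Email: <a href=\"mailto:agelopa@wmin.ac.uk.invalid\">agelopa@wmin.ac.uk.invalid</a></p>";

        NSError *error = nil;
        NSXMLDocument *doc = [[NSXMLDocument alloc] initWithXMLString:fragment
                                                              options:NSXMLNodeOptionsNone
                                                                error:&error];
        if (doc == nil) {
            NSLog(@"Could not parse fragment: %@", error);
            return 1;
        }

        // First text node of <p>: the name, which ends at the <br /> child.
        NSString *name = FirstStringForXPath(doc, @"/p/text()[1]");

        // href attribute of the <a> child: the address with a "mailto:" prefix.
        NSString *href = FirstStringForXPath(doc, @"/p/a/@href");
        NSString *email = [href hasPrefix:@"mailto:"]
            ? [href substringFromIndex:[@"mailto:" length]]
            : href;

        NSLog(@"Name = %@  Email = %@", name, email);
    }
    return 0;
}
```

Reading @href rather than the link text is a design choice: the visible text of a link can differ from the address it points to, whereas the attribute always carries the target, at the cost of stripping the mailto: prefix.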

    【Discussion】:
