【Posted】: 2014-11-30 16:58:10
【Question】:
I am trying to parse an XHTML file with libxml and read each element's attributes together with their values.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/HTMLparser.h>
#include <libxml/xmlmemory.h>
#include <libxml/tree.h>
#include <libxml/parser.h>
void walkTree(xmlNode * a_node)
{
    xmlNode *cur_node = NULL;
    xmlAttr *cur_attr = NULL;
    for (cur_node = a_node; cur_node; cur_node = cur_node->next) {
        // do something with that node information, like... printing the tag's name and attributes
        printf("Got tag : %s\n", cur_node->name);
        for (cur_attr = cur_node->properties; cur_attr; cur_attr = cur_attr->next) {
            printf(" -> with attribute : %s\n", cur_attr->name);
            printf(" -> with Value: %s\n", (cur_attr->children)->name);
        }
        walkTree(cur_node->children);
    }
}
int main(void)
{
    // Load XHTML
    char *data;
    data = "<html><body class=\"123\" damn=\"123\"></html>";
    int len = strlen(data) + 1;
    htmlParserCtxtPtr parser = htmlCreatePushParserCtxt(NULL, NULL, NULL, 0, NULL, 0);
    htmlCtxtUseOptions(parser, HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET);
    htmlParseChunk(parser, data, len, 0);
    htmlParseChunk(parser, NULL, len, 1);
    walkTree(xmlDocGetRootElement(parser->myDoc));
}
I expected this output:
Got tag: html
Got tag: body
-> with attribute: class
-> with value: 123
-> with attribute: damn
-> with value: 123
Unfortunately, this is the output I actually get:
Got tag: html
Got tag: body
-> with attribute: class
-> with value: text
-> with attribute: damn
-> with value: text
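I have tried other HTML input as well. Whatever the attribute value is, the program always prints "text" instead of the value.
Why does this happen, and how can I get the real attribute value?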
【Discussion】: