【Posted】:2014-04-18 23:42:06
【Question】:
I'm trying to validate an XML file against the external DTD referenced in its doctype tag. Specifically:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export3.dtd">
...the rest of the document...
I'm using Python 3.3 and the lxml module. From reading http://lxml.de/validation.html#validation-at-parse-time, I put this together:
import sys
from lxml import etree
enexFile = open(sys.argv[2], mode="rb") # sys.argv[2] is the path to an XML file in local storage.
enexParser = etree.XMLParser(dtd_validation=True)
enexTree = etree.parse(enexFile, enexParser)
From my understanding of validation.html, the lxml library should now take care of retrieving the DTD and performing the validation. Instead, I get this:
$ ./mapwrangler.py validate notes.enex
Traceback (most recent call last):
File "./mapwrangler.py", line 27, in <module>
enexTree = etree.parse(enexFile, enexParser)
File "lxml.etree.pyx", line 3239, in lxml.etree.parse (src/lxml/lxml.etree.c:69955)
File "parser.pxi", line 1769, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:102257)
File "parser.pxi", line 1789, in lxml.etree._parseFilelikeDocument (src/lxml/lxml.etree.c:102516)
File "parser.pxi", line 1684, in lxml.etree._parseDocFromFilelike (src/lxml/lxml.etree.c:101442)
File "parser.pxi", line 1134, in lxml.etree._BaseParser._parseDocFromFilelike (src/lxml/lxml.etree.c:97069)
File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:91275)
File "parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:92461)
File "parser.pxi", line 622, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:91757)
lxml.etree.XMLSyntaxError: Validation failed: no DTD found !, line 3, column 43
This surprised me, because if I turn validation off, the document parses fine, and print(enexTree.docinfo.doctype) gives me:
$ ./mapwrangler.py validate notes.enex
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export3.dtd">
So it seems to me there shouldn't be any problem finding the DTD.
Thanks for your help.
【Discussion】:
-
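What happens if you add no_network=False when constructing the parser object? See "Parser options" here: lxml.de/parsing.html#parsers. -
Ah, thanks a lot @mjzn! That was it. Could you write that up as an answer so I can give you the reputation?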
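Following the suggestion in the comments, here is a minimal sketch of the fix, assuming lxml is installed: pass no_network=False alongside dtd_validation=True so that libxml2 is allowed to download the DTD named by the http:// SYSTEM identifier (lxml's XMLParser defaults to no_network=True). Fetching evernote-export3.dtd requires network access, so the demo instead uses a hypothetical local note.dtd plus made-up good.xml/bad.xml files; the helper names parse_validated and write are likewise hypothetical.

```python
import os
import tempfile

from lxml import etree


def parse_validated(path):
    """Parse the XML file at `path`, validating it against its DOCTYPE's DTD."""
    # no_network=False lets libxml2 fetch DTDs referenced by http:// URLs;
    # lxml's XMLParser defaults to no_network=True, which blocks the download.
    parser = etree.XMLParser(dtd_validation=True, no_network=False)
    # Passing a path gives the document a base URL, so a relative
    # SYSTEM identifier is resolved next to the XML file.
    return etree.parse(path, parser)


def write(path, text):
    """Hypothetical helper: write `text` to `path` as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Offline demo: a hypothetical local DTD stands in for evernote-export3.dtd.
DTD = "<!ELEMENT note (title)>\n<!ELEMENT title (#PCDATA)>\n"
HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<!DOCTYPE note SYSTEM "note.dtd">\n')

with tempfile.TemporaryDirectory() as tmp:
    write(os.path.join(tmp, "note.dtd"), DTD)
    write(os.path.join(tmp, "good.xml"), HEADER + "<note><title>hi</title></note>\n")
    write(os.path.join(tmp, "bad.xml"), HEADER + "<note><body>hi</body></note>\n")

    # Valid document: parses and validates against note.dtd.
    good_tree = parse_validated(os.path.join(tmp, "good.xml"))
    print(good_tree.docinfo.doctype)

    # Invalid document: validation failure surfaces as XMLSyntaxError.
    try:
        parse_validated(os.path.join(tmp, "bad.xml"))
        bad_rejected = False
    except etree.XMLSyntaxError as err:
        print("validation error:", err)
        bad_rejected = True
```

With the real Evernote export, calling parse_validated(sys.argv[2]) would then try to fetch the DTD over HTTP; on a machine without network access, parsing is likely to fail again because the DTD cannot be loaded.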
Tags: python xml validation lxml dtd