【Question title】: Parsing XML Entity with python xml.sax
【Posted】: 2011-06-14 20:11:42
【Question】:

I am parsing XML with Python's xml.sax, but my code fails to catch entities. Why doesn't skippedEntity() or resolveEntity() report anything in the following code?

import os
import cStringIO
import xml.sax
from xml.sax.handler import ContentHandler,EntityResolver,DTDHandler

#Class to parse and run test XML files
class TestHandler(ContentHandler,EntityResolver,DTDHandler):

    #SAX handler - Entity resolver
    def resolveEntity(self,publicID,systemID):
        print "TestHandler.resolveEntity: %s  %s" % (publicID,systemID)

    def skippedEntity(self, name):
        print "TestHandler.skippedEntity: %s" % (name)

    def unparsedEntityDecl(self,publicID,systemID,ndata):
        print "TestHandler.unparsedEntityDecl: %s  %s" % (publicID,systemID)

    def startElement(self,name,attrs):
        # name = string.lower(name)
        summary = '' + attrs.get('summary','')
        arg = '' + attrs.get('arg','')
        print 'TestHandler.startElement(), %s : %s (%s)' % (name,summary,arg)


def run(xml_string):
    try:
        parser = xml.sax.make_parser()
        stream = cStringIO.StringIO(xml_string)

        curHandler = TestHandler()
        parser.setContentHandler(curHandler)
        parser.setDTDHandler( curHandler )
        parser.setEntityResolver( curHandler )

        parser.parse(stream)
        stream.close()
    except (xml.sax.SAXParseException), e:
        print "*** PARSER error: %s" % e;

def main():
    try:
        XML = "<!DOCTYPE page[ <!ENTITY num 'foo'> ]><test summary='step: &num;'>Entity: &not;</test>"
        run(XML)
    except Exception, e:
      print 'FATAL ERROR: %s' % (str(e))

if __name__== '__main__':
    main()

When I run it, all I see is:

 TestHandler.startElement(), test : step: foo ()
 *** PARSER error: <unknown>:1:36: undefined entity

Why don't I see the resolveEntity output for &num; or the skippedEntity output for &not;?

【Discussion】:

    Tags: python xml parsing sax


    【Solution 1】:

    I think resolveEntity and skippedEntity are only called when the document uses an external DTD. I got this to work by modifying the XML:

    XML = """<?xml version="1.0" encoding="utf-8" ?>
    <!DOCTYPE test SYSTEM "external.dtd" >
    <test summary='step: &foo; &bar;'>Entity: &not;</test>
    """
    

    external.dtd contains two simple entity declarations:

    <!ENTITY foo "bar">
    <!ENTITY bar "foo">
    

    I also removed resolveEntity from the handler.

    This outputs:

    TestHandler.startElement(), test : step: bar foo ()
    TestHandler.skippedEntity: not
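
For contrast, here is a minimal Python 3 sketch of the original internal-subset case (the code in this thread is Python 2). The names `Recorder` and `parse_events` are illustrative and not part of the question. It suggests that an entity declared in the internal subset is expanded by the parser itself, so neither resolveEntity nor skippedEntity fires for it.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver


class Recorder(ContentHandler, EntityResolver):
    """Hypothetical handler that records which callbacks the parser invokes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def resolveEntity(self, publicId, systemId):
        self.events.append(("resolveEntity", systemId))
        return systemId

    def skippedEntity(self, name):
        self.events.append(("skippedEntity", name))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name, attrs.get("summary", "")))


def parse_events(xml_bytes):
    """Parse xml_bytes and return the list of recorded callback events."""
    handler = Recorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.setEntityResolver(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.events


# Internal subset only: &num; is declared inline and used in an attribute.
INTERNAL = (
    b"<!DOCTYPE test [ <!ENTITY num 'foo'> ]>"
    b"<test summary='step: &num;'>text</test>"
)

print(parse_events(INTERNAL))
```

The only recorded event is the startElement call, with the attribute value already expanded.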
    

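    Hope this helps.

    【Comments】:

    • Thanks, I hadn't realized the DTD had to be external.
    【Solution 2】:

    Here is a modified version of your program that I hope makes sense. It demonstrates a case in which all of the TestHandler methods are called.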

    import StringIO
    import xml.sax
    from xml.sax.handler import ContentHandler
    
    # Inheriting from EntityResolver and DTDHandler is not necessary
    class TestHandler(ContentHandler):
    
        # This method is only called for external entities. Must return a value. 
        def resolveEntity(self, publicID, systemID):
            print "TestHandler.resolveEntity(): %s %s" % (publicID, systemID)
            return systemID
    
        def skippedEntity(self, name):
            print "TestHandler.skippedEntity(): %s" % (name)
    
        def unparsedEntityDecl(self, name, publicID, systemID, ndata):
            print "TestHandler.unparsedEntityDecl(): %s %s" % (publicID, systemID)
    
        def startElement(self, name, attrs):
            summary = attrs.get('summary', '')
            print 'TestHandler.startElement():', summary
    
    def main(xml_string):
        try:
            parser = xml.sax.make_parser()
            curHandler = TestHandler()
            parser.setContentHandler(curHandler)
            parser.setEntityResolver(curHandler)
            parser.setDTDHandler(curHandler)
    
            stream = StringIO.StringIO(xml_string)
            parser.parse(stream)
            stream.close()
        except xml.sax.SAXParseException, e:
            print "*** PARSER error: %s" % e
    
    XML = """<!DOCTYPE test SYSTEM "test.dtd">
    <test summary='step: &num;'>Entity: &not;</test>
    """
    
    main(XML)
    

    test.dtd contains:

    <!ENTITY num "FOO">
    <!ENTITY pic SYSTEM 'bar.gif' NDATA gif>
    

    Output:

    TestHandler.resolveEntity(): None test.dtd
    TestHandler.unparsedEntityDecl(): None bar.gif
    TestHandler.startElement(): step: FOO
    TestHandler.skippedEntity(): not
    

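    Addendum

    As far as I can tell, skippedEntity is only called when an external DTD is used (at least I cannot come up with a counterexample; it would be nice if the documentation were a little clearer).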
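
A small Python 3 sketch of that observation, under stated assumptions: the names `SkipRecorder` and `skipped_entities` are made up for illustration, and loading of external entities is switched off explicitly so that no external.dtd file is needed on disk. With the expat-based parser, the mere presence of an external subset declaration in the DOCTYPE appears to be enough for an undeclared entity in element content to be reported through skippedEntity. Without any DTD, the same reference is a parse error.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class SkipRecorder(ContentHandler):
    """Hypothetical handler that collects the names passed to skippedEntity."""

    def __init__(self):
        super().__init__()
        self.skipped = []

    def skippedEntity(self, name):
        self.skipped.append(name)


def skipped_entities(xml_bytes):
    """Parse xml_bytes and return the names reported as skipped entities."""
    handler = SkipRecorder()
    parser = xml.sax.make_parser()
    # Assumption: do not fetch the external subset, only note that it is declared.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.skipped


WITH_EXTERNAL_SUBSET = (
    b'<!DOCTYPE test SYSTEM "external.dtd">'
    b"<test>Entity: &not;</test>"
)
WITHOUT_DTD = b"<test>Entity: &not;</test>"

print(skipped_entities(WITH_EXTERNAL_SUBSET))

try:
    skipped_entities(WITHOUT_DTD)
except xml.sax.SAXParseException as exc:
    # No DTD at all: the undeclared entity is a well-formedness error.
    print("parse error:", exc.getMessage())
```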

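    Adam says in his answer that resolveEntity is only called for external DTDs, but that is not quite true. resolveEntity is also called when the parser processes a reference to an external entity, whether that entity is declared in the internal or the external DTD subset. For example: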

    <!DOCTYPE test [
    <!ENTITY num SYSTEM "bar.txt">
    ]>
    

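    The content of bar.txt could be FOO. Note that in this case it is not possible to refer to the entity in an attribute value, because references to external entities are not allowed there.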
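
The internal-subset case can be sketched in Python 3 roughly as follows. This is an illustration under assumptions, not the code from this answer: `RecordingResolver` and `resolved_entities` are hypothetical names, external general entities are enabled explicitly (recent Python 3 versions leave them off by default), and the resolver returns an in-memory InputSource in place of a real bar.txt file.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class RecordingResolver(EntityResolver):
    """Hypothetical resolver that records each call and serves b"FOO" from memory."""

    def __init__(self):
        self.calls = []

    def resolveEntity(self, publicId, systemId):
        self.calls.append((publicId, systemId))
        source = InputSource(systemId)
        # Stand-in for the contents of bar.txt.
        source.setByteStream(io.BytesIO(b"FOO"))
        return source


def resolved_entities(xml_bytes):
    """Parse xml_bytes and return the (publicId, systemId) pairs seen by the resolver."""
    resolver = RecordingResolver()
    parser = xml.sax.make_parser()
    # Assumption: external general entities must be switched on explicitly.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(resolver)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(xml_bytes))
    return resolver.calls


# The external entity is declared in the internal subset and used in element content.
DOC = (
    b'<!DOCTYPE test [ <!ENTITY num SYSTEM "bar.txt"> ]>'
    b"<test>Entity: &num;</test>"
)

print(resolved_entities(DOC))
```

Returning an InputSource that already carries a byte stream keeps the sketch self-contained. A real resolver could instead return a path or URL for the parser to open.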

    【Comments】:

    • Thanks. Is there any way to get skippedEntity called without an external DTD?