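【Question title】: How to read CDATA from xml file with Python
【Posted】: 2014-09-30 12:48:43
【Question】:

I am trying to parse a large XML file with Python, but when I try to print the CDATA information nothing comes out, in particular the description held in the "content" tag.

My source code looks like this: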

#!/usr/bin/python
# -*- coding: utf-8 -*-  
import xml.sax
import re
from cStringIO import StringIO

class MovieHandler( xml.sax.ContentHandler ):
   def __init__(self):
      self.item = {}
      self.CurrentData = ""
      self.url = ""
      self.description = ""
      self.price = ""



   # Call when an element starts
   def startElement(self, tag, attributes):
      self.CurrentData = tag

   # Call when an element ends
   def endElement(self, tag):
      elif self.CurrentData == "url":
          self.item["url"] = self.url
      elif self.CurrentData == "content":
          print 'description: ', self.description
      elif self.CurrentData == "price":
          if self.price:
              self.price = re.sub('[^0-9]','',self.price[0].encode('ascii', 'ignore'))
              self.item["price"] = int(self.price)

      self.CurrentData = ""
      print self.item
      self.item.clear()

   # Call when a character is read
   def characters(self, content):
      if self.CurrentData == "url":
         self.url = content
      elif self.CurrentData == "content":
         self.description = content
      elif self.CurrentData == "price":
         self.price = content


if ( __name__ == "__main__"):

   # create an XMLReader
   parser = xml.sax.make_parser()
   # turn off namespaces
   parser.setFeature(xml.sax.handler.feature_namespaces, 0)

   # override the default ContentHandler
   Handler = MovieHandler()
   parser.setContentHandler(Handler)

   parser.parse("myfile.xml")
   print "done"

The content tag looks like this:

<content><![CDATA[Jaguar XKR 
new tires 
perfect condition 
Black LeatherInterior]]></content>

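Thanks in advance.

【Comments】:

  • Your example program does not run: it has syntax errors. Please reduce your program to the shortest one that shows the error, then copy-paste that program into your question.

Tags: python xml cdata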


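【Solution 1】:

The .characters() method can be called several times for a single element, each time with one chunk of the text. Your code appears to overwrite self.description on every call, so only the last chunk survives.

Try this: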

def characters(self, content):
    ...
    self.description += content  # Note: '+=', not '='
    ...

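Remember to reset self.description = "" once you are done with the element (for example, at the end of endElement), so the next element starts empty.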
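
The advice above can be sketched as a small self-contained handler. This is only an illustration, not the asker's full program: it assumes Python 3 (the question uses Python 2), the class name ItemHandler and the wrapping <items>/<item> elements are made up for the example, and it collects chunks in a list and joins them at the end of the element instead of using += on a string, which has the same effect.

```python
import xml.sax


class ItemHandler(xml.sax.ContentHandler):
    """Collects the text of each <content> element (hypothetical name)."""

    def __init__(self):
        super().__init__()
        self.current = ""
        self.chunks = []        # pieces delivered by characters()
        self.descriptions = []  # one finished string per <content> element

    def startElement(self, tag, attributes):
        self.current = tag
        if tag == "content":
            self.chunks = []    # start fresh for each element

    def characters(self, content):
        # May be called several times for one element, so accumulate
        # instead of overwriting.
        if self.current == "content":
            self.chunks.append(content)

    def endElement(self, tag):
        if tag == "content":
            self.descriptions.append("".join(self.chunks))
            self.chunks = []    # reset once the element is complete
        self.current = ""


# Sample input modelled on the <content> tag from the question; the
# surrounding <items>/<item> elements are assumed for illustration.
SAMPLE = (
    "<items><item><content><![CDATA[Jaguar XKR \n"
    "new tires \n"
    "perfect condition \n"
    "Black LeatherInterior]]></content></item></items>"
)

handler = ItemHandler()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
print(handler.descriptions[0])
```

The parser hands CDATA text to characters() just like ordinary text, so no special CDATA handling is needed; the only change that matters is accumulating the chunks and resetting the buffer when the element ends.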

【Comments】:
