【Question title】: IOError with lxml etree parse function
【Posted】: 2011-06-14 09:37:07
【Question description】:

I have logic like this:

for root, dirs, files in os.walk(os.getcwd()):
    if "info.xml" in files:
        root = lxml.etree.parse("%s/info.xml" % root)
        tag = root.xpath("/info/tagname")[0].text

When it parses an info.xml that sits deep below the current directory, I get this error message:

Traceback (most recent call last):
  File "/home/work/mergefile.py", line 365, in <module>
  File "/home/work/mergefile.py", line 344, in merge_ejb_files
  File "/home/work/mergefile.py", line 63, in __init__
  File "/home/work/mergefile.py", line 78, in _parse_info2doc
  File "lxml.etree.pyx", line 2698, in lxml.etree.parse (src/lxml/lxml.etree.c:49590)
  File "parser.pxi", line 1491, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71205)
  File "parser.pxi", line 1520, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:71488)
  File "parser.pxi", line 1420, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:70583)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:67736)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
  File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64056)
IOError: Error reading file '/home/work/ci/case/dc_daily/dc/213577/223922/223958/792536/info.xml': failed to load external entity "/home/work/ci/case/dc_daily/dc/213577/223922/223958/792536/info.xml"

However, the file "/home/work/ci/case/dc_daily/dc/213577/223922/223958/792536/info.xml" does exist, and I can parse it with lxml from an IPython session.

Do you know what the problem is? If so, please help. Thanks!

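【Comments】:

  • IIRC this error tells you that lxml could not load an external entity referenced in the file. That could be a DOCTYPE, a schema, or an external entity declaration (& and the like). It is possible to load the document without validating it against a schema, which in turn skips external entity loading. The parse function should have a parameter for that. Sorry, I'm a bit busy right now, so you'll have to look it up yourself :)
  • Thanks for your comment. While debugging this today, I first opened the XML file myself and passed the file object to lxml.etree.parse. That raised IOError: Too many open files when opening /home/work/ci/case/dc_daily/dc/213577/223922/223958/792536/info.xml. So this is not actually an lxml problem: Linux is configured to allow only 1024 open files per process. I am now trying to do the work in subprocesses.
  • How about opening the file, parsing it, and then closing it? That way you only ever have one file open at a time.

Tags: python parsing lxml ioerror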
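
The per-process limit mentioned in the comments can be inspected from Python itself. The following is a minimal sketch, assuming a Unix-like system where the standard-library resource module is available; the helper name count_open_fds is made up for illustration, and its /proc lookup only works on Linux.

```python
import os
import resource

# Soft limit: what the process is currently held to (commonly 1024 on Linux).
# Hard limit: the ceiling the soft limit may be raised to without privileges.
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft_limit, "hard limit:", hard_limit)


def count_open_fds():
    """Return how many descriptors this process holds, or None if unknown.

    Hypothetical helper: relies on the Linux-specific /proc/self/fd directory.
    """
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))


print("open descriptors:", count_open_fds())
```

Printing count_open_fds() inside the os.walk loop is one way to see whether the number keeps climbing towards the soft limit, which would point to files that are opened but never closed.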


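【Solution 1】:

Following my comment above, here is my solution. Each file is opened for reading and closed again as soon as it has been parsed, so the 1024-file limit is never reached.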

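import os
import lxml.etree as etree

for root, dirs, files in os.walk(os.getcwd()):
    if "info.xml" in files:
        # The with block closes the file as soon as parsing has finished
        with open(os.path.join(root, "info.xml"), "rb") as processfile:
            xml = etree.parse(processfile)
        # Query the parsed tree, not the directory path string held in `root`
        tag = xml.xpath("/info/tagname")[0].text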
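
The same open, parse, close idea can also be wrapped in a small function so that it is easy to try on a test directory. This is only a sketch: the function name collect_tags and its parameters are invented for illustration, and the import falls back to the standard library's xml.etree.ElementTree when lxml is not installed, so it uses getroot().findtext(), which both libraries provide, instead of an absolute XPath.

```python
import os

try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Stand-in so the sketch still runs without lxml installed
    import xml.etree.ElementTree as etree


def collect_tags(top, filename="info.xml", tag="tagname"):
    """Map each directory containing `filename` to the text of its <tag> child.

    Hypothetical helper; the defaults mirror the names used in the question.
    """
    results = {}
    for dirpath, _dirnames, filenames in os.walk(top):
        if filename not in filenames:
            continue
        path = os.path.join(dirpath, filename)
        # Only one file is open at any moment; the descriptor is released
        # when the with block exits, before the next directory is visited.
        with open(path, "rb") as handle:
            tree = etree.parse(handle)
        results[dirpath] = tree.getroot().findtext(tag)
    return results
```

Calling collect_tags(os.getcwd()) would return a dictionary keyed by directory path. The tree is fully parsed into memory before the file is closed, so the result can still be queried after the with block.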

【Discussion】:
