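【Posted】: 2013-12-11 19:23:07
【Question】:
I am trying to use lxml to find all images (.png, .bmp, .jpg) and executables (.exe) among anchor links. In this similar thread, the accepted answer suggests doing something like this: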
png = tree.xpath("//div/ul/li//a[ends-with(@href, '.png')]")
bmp = tree.xpath("//div/ul/li//a[ends-with(@href, '.bmp')]")
jpg = tree.xpath("//div/ul/li//a[ends-with(@href, '.jpg')]")
exe = tree.xpath("//div/ul/li//a[ends-with(@href, '.exe')]")
However, I keep getting this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "lxml.etree.pyx", line 2095, in lxml.etree._ElementTree.xpath (src/lxml/lxml.etree.c:53597)
File "xpath.pxi", line 373, in lxml.etree.XPathDocumentEvaluator.__call__ (src/lxml/lxml.etree.c:134052)
File "xpath.pxi", line 241, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:132625)
File "xpath.pxi", line 226, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:132453)
lxml.etree.XPathEvalError: Unregistered function
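I am running lxml 3.2.4, installed via pip.
Also, instead of defining the XPath four times, once per file extension, is there a way to specify all four file extensions in a single XPath expression?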
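For context, ends-with() is an XPath 2.0 function, while lxml evaluates XPath through libxml2, which implements XPath 1.0 only; that is the likely cause of the "Unregistered function" error. The sketch below shows two possible workarounds that also cover all four extensions in one query: emulating ends-with() with substring() and string-length(), and using the EXSLT regular-expression extension that lxml supports. The sample markup, the EXTENSIONS tuple, and the ends_with helper are hypothetical names introduced here for illustration; they do not come from the original post.

```python
from lxml import html

# Hypothetical sample page, used only to illustrate the queries.
SAMPLE = """
<html><body><div><ul>
  <li><a href="a.png">a</a></li>
  <li><a href="b.bmp">b</a></li>
  <li><a href="c.jpg">c</a></li>
  <li><a href="d.exe">d</a></li>
  <li><a href="e.html">e</a></li>
  <li><a href="png">f</a></li>
</ul></div></body></html>
"""

EXTENSIONS = (".png", ".bmp", ".jpg", ".exe")


def ends_with(attr, suffix):
    """Build an XPath 1.0 test equivalent to ends-with(attr, suffix)."""
    # substring(s, string-length(s) - len(suffix) + 1) is the tail of s
    # that has the same length as the suffix.
    return "substring({a}, string-length({a}) - {n}) = '{s}'".format(
        a=attr, n=len(suffix) - 1, s=suffix
    )


tree = html.fromstring(SAMPLE)

# Option 1: plain XPath 1.0, one predicate joined with "or".
predicate = " or ".join(ends_with("@href", ext) for ext in EXTENSIONS)
links = tree.xpath("//div/ul/li//a[{0}]".format(predicate))
hrefs = [a.get("href") for a in links]

# Option 2: EXSLT regular expressions; the 'i' flag ignores case.
EXSLT_RE = "http://exslt.org/regular-expressions"
regex_links = tree.xpath(
    r"//div/ul/li//a[re:test(@href, '\.(png|bmp|jpg|exe)$', 'i')]",
    namespaces={"re": EXSLT_RE},
)
regex_hrefs = [a.get("href") for a in regex_links]

print(hrefs)
print(regex_hrefs)
```

Note that the first option is case-sensitive, so a link ending in ".PNG" would not match, whereas the regular-expression variant with the 'i' flag would match it.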
【Comments】:
Tags: python html lxml web-crawler