【Posted】: 2015-01-12 17:46:56
【Question】:
I want to extract the content of this element as unicode:
<div class="" id="messageContent">\xd8\xaf\xd8\xb1</div>
What I tried:
import requests
from lxml import html
post_data=...
post_response=requests.post(url='http://example.com/', data=post_data)
out=post_response.text
tree=html.fromstring(out)
print out.xpath('//div/[@id="messageContent"]/text()')
Update
This is the error I then get:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1447, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:41728)
  File "xpath.pxi", line 321, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:117734)
  File "xpath.pxi", line 239, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:116911)
  File "xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:116780)
lxml.etree.XPathEvalError: Invalid expression
The output I want from messageContent:
\xd8\xaf\xd8\xb1
【Comments】:
- out is text; tree is the ElementTree object. Looks like a simple typo to me.
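Following the comment above, a minimal corrected sketch might look like this. It assumes Python 3 with lxml installed; the inline page_text string is a hypothetical stand-in for post_response.text, and extract_message is an illustrative helper name, not part of lxml. There are two changes from the question's code: xpath() is called on the parsed tree rather than on the string out, and the predicate is attached directly to the step (//div[@id=...]). Writing //div/[@id=...] leaves an empty step after the slash, which is the likely cause of the "Invalid expression" error.

```python
from lxml import html


def extract_message(page_text):
    """Return the text nodes of the div whose id is "messageContent"."""
    # Parse the already-decoded HTML string into an element tree.
    tree = html.fromstring(page_text)
    # Query the tree (not the string); the predicate follows the step
    # name directly, with no slash in between.
    return tree.xpath('//div[@id="messageContent"]/text()')


# Hypothetical stand-in for the decoded response body (post_response.text).
page_text = (
    '<html><body>'
    '<div class="" id="messageContent">\u062f\u0631</div>'
    '</body></html>'
)

texts = extract_message(page_text)
message = texts[0]             # decoded text of the element
raw = message.encode('utf-8')  # the same text as UTF-8 bytes

print(repr(raw))  # b'\xd8\xaf\xd8\xb1'
```

The escaped form \xd8\xaf\xd8\xb1 shown in the question is the UTF-8 byte representation of the element's text, so encoding the extracted string yields it. With requests, post_response.text is the decoded string and post_response.content is the raw bytes; if the server declares the wrong charset, decoding content explicitly may be needed instead.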