【Question title】: Extracting unicode data from url source by xpath in python [closed]
【Posted】: 2015-01-12 17:46:56
【Question】:

I want to extract the unicode data from this element:

<div class="" id="messageContent">\xd8\xaf\xd8\xb1</div>

What I tried:

import requests
from lxml import html
post_data=...
post_response=requests.post(url='http://example.com/', data=post_data)
out=post_response.text
tree=html.fromstring(out)
print out.xpath('//div/[@id="messageContent"]/text()')

Update

The error I then get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1447, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:41728)
  File "xpath.pxi", line 321, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:117734)
  File "xpath.pxi", line 239, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:116911)
  File "xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:116780)
lxml.etree.XPathEvalError: Invalid expression

The output I want from messageContent:

\xd8\xaf\xd8\xb1

【Comments】:

  • out is text; tree is the ElementTree object. Looks like a simple typo to me.

Tags: python xpath unicode lxml


【Solution 1】:

The error is clear: the variable out stores a unicode object, not an object that has an xpath attribute. You probably just mixed up out and tree:

print out # will give you the whole text
print tree.xpath(...)  # will probably print what you were looking for

It has nothing to do with the "unicode data" you are trying to extract.
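
A minimal sketch of the corrected call, written for Python 3 and assuming a small stand-in document (the variable sample is hypothetical; the real response body is not shown in the question). It calls xpath on tree rather than on out, and it also attaches the predicate directly to div: the expression in the question has a stray / before [@id=...], which is what lxml reports as "Invalid expression" in the traceback.

```python
from lxml import etree, html

# Hypothetical stand-in for post_response.text; the real page is not shown.
sample = (
    '<html><body>'
    '<div class="" id="messageContent">\u062f\u0631</div>'
    '</body></html>'
)

tree = html.fromstring(sample)

# The expression from the question: a predicate cannot follow "/" directly.
error_message = None
try:
    tree.xpath('//div/[@id="messageContent"]/text()')
except etree.XPathEvalError as exc:
    error_message = str(exc)
print(error_message)

# Corrected expression: the predicate belongs to the div node test itself.
texts = tree.xpath('//div[@id="messageContent"]/text()')
print(texts)
```

The result is a list of text nodes, so the first match is texts[0].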

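【Comments】:

  • Yes, thanks... I got confused... Suppose I had written: print tree.xpath(...) .. the error is lxml.etree.XPathEvalError: Invalid expression
  • Sometimes we overlook the most obvious things :). If this helped, please consider upvoting and marking it as the answer.
  • +1, please see my update
  • I can't get \xd8\xaf\xd8\xb1
  • That is actually a different problem. The issue is that the text in out can't be parsed as XML. Print the out variable and try to figure out why.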
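
On the \xd8\xaf\xd8\xb1 point raised in the comments: those four bytes look like the UTF-8 encoding of two characters, so the text returned by xpath and the byte string the asker expects may be the same data in different representations. The sketch below is written for Python 3 and assumes the page is UTF-8 encoded, which the question does not state.

```python
# The byte sequence shown in the question.
raw = b'\xd8\xaf\xd8\xb1'

# Decoding as UTF-8 (an assumption about the page encoding) gives two code points.
text = raw.decode('utf-8')
code_points = [hex(ord(ch)) for ch in text]
print(len(text), code_points)

# Encoding the text again recovers the original bytes.
round_trip = text.encode('utf-8')
print(round_trip == raw)
```

Under that assumption, calling .encode('utf-8') on a text node returned by tree.xpath(...) would give the byte form shown in the question.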
【Solution 2】:

You probably meant tree.xpath(...)

【Comments】:
