【Question title】: Not able to parse html using lxml Xpath parser
【Posted】: 2012-07-12 14:53:28
【Question】:

I am trying to parse the reviews from this page: http://www.amazon.co.uk/product-reviews/B00143ZBHY

using the following approach:

Code

html  # a variable containing the exact HTML of the page above
from lxml import etree
tree = etree.HTML(html)
r = tree.xpath(".//*[@id='productReviews']/tbody/tr/td[1]/div[9]/text()[4]")
print len(r)
print r[0].tag

Output

0
Traceback (most recent call last):
  File "c.py", line 37, in <module>
    print r[0].tag
IndexError: list index out of range

P.S.: With the same XPath in Firefox's XPath Checker add-on I can do this easily, but here I get no results. Please help!

【Comments on the question】:

  • I don't know why Chrome shows tbody in the XPath :(

Tags: python xpath lxml


【Solution 1】:

Try removing /tbody from the XPath: there is no <tbody> in #productReviews.

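# Python 2, matching the question's code
import urllib2
from lxml import etree

html = urllib2.urlopen("http://www.amazon.co.uk/product-reviews/B00143ZBHY").read()
tree = etree.HTML(html)
# Same XPath as in the question, minus /tbody
r = tree.xpath(".//*[@id='productReviews']/tr/td[1]/div[9]/text()[4]")
print r[0]  # text() results are strings, so print the value directly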

Output:

bought this as replacement for the original cover which came with my greenhouse and which ripped in the wind.  so far this seems a good replacement although for some reason it seems slightly too small for my greenhouse so that i cant zip both sides of the front at the same time.  seems sturdier and thicker than the cover i had before so hoping it lasts a bit longer!
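The effect can be reproduced offline. The sketch below is written for Python 3 (unlike the Python 2 code above) and uses a made-up miniature table, `SAMPLE_HTML`, as a stand-in for the real page; the variable names are illustrative only. It assumes, as the answer states, that the page source has no explicit <tbody>. Browser tools typically show one because browsers add it when building the DOM, whereas lxml keeps the tree as written.

```python
from lxml import etree

# Hypothetical miniature of the reviews table (not the real page);
# like the source described in the answer, it has no explicit <tbody>.
SAMPLE_HTML = """
<html><body>
<table id="productReviews">
  <tr><td><div>header</div><div>review text</div></td></tr>
</table>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)
table = tree.xpath("//*[@id='productReviews']")[0]

# Inspect what lxml actually parsed: <tr> is a direct child of <table>.
child_tags = [child.tag for child in table]

# Browser-style path containing /tbody: matches nothing in this tree.
with_tbody = tree.xpath("//*[@id='productReviews']/tbody/tr/td[1]/div[2]/text()")

# Same path without /tbody: matches the text node.
without_tbody = tree.xpath("//*[@id='productReviews']/tr/td[1]/div[2]/text()")

# '//tr' matches rows at any depth, so it works with or without <tbody>.
either_way = tree.xpath("//*[@id='productReviews']//tr/td[1]/div[2]/text()")

# text() results are string objects without a .tag attribute, and the
# result list may be empty, so check it before indexing.
first = without_tbody[0] if without_tbody else None

print(child_tags, with_tbody, without_tbody, either_way, first)
```

Listing the child tags of the matched element is a quick way to see whether a path copied from a browser matches the tree lxml built. The `//tr` variant is one option when the same XPath has to work on sources both with and without <tbody>.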

【Comments】:

  • I can only accept an answer 15 minutes after posting the question; I'll accept it in 3 minutes.
  • @dragosrsupercool This is not a silly mistake, read here: stackoverflow.com/a/5586627/1167879