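【Posted】: 2015-07-01 06:36:30
【Question】:
I'm building a web crawler for a website's comments section, but I've run into a problem: I can't seem to find the text node that holds a comment's content. This is what the page's elements look like: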
<div class="comments"> // this is the whole comments section
<div class="comment"> // this is where the p is located
<div class="comment-top">
<div class="comment-nr">208. PROTAS</div>
<div class="comment-info">
<div class="comment-time">2015-06-30 13:00</div>
<div class="comment-ip">IP: 178.250.32.165</div>
<div class="comment-vert1">
<a href="javascript:comr(24470645,'p')">
<img src="http://img.lrytas.lt/css2/img/com-good.jpg" alt="">
</a> <span id="cy_24470645"> </span>
</div>
<div class="comment-vert2">
<a href="javascript:comr(24470645,'m')">
<img src="http://img.lrytas.lt/css2/img/com-bad.jpg" alt="">
</a> <span id="cn_24470645"> </span>
</div>
</div>
</div>
<p class="text-13 no-intend">Test text</p> // I need to get this comments content
</div>
I have tried many XPath expressions, such as:
*/div[contains(@class, "comment")]/p/text()
/p[contains(@class, "text-13 no-intend")]/text()
etc.
But none of them seem to find it.
Any help would be greatly appreciated.
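A note on the two attempts, as a possible starting point rather than a confirmed fix: `/p[...]` is an absolute path, so it only matches a `p` that is the document's root element, and `*/div[...]` only looks exactly two levels below the context node. A descendant search (`//`) avoids both problems; in a full XPath 1.0 engine that would be something like `//div[@class="comment"]/p/text()`. The sketch below checks the same idea with Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath (no `contains()` or `text()`), on a reduced, well-formed copy of the markup. It assumes the comments are actually present in the HTML the crawler receives.

```python
import xml.etree.ElementTree as ET

# Reduced, well-formed copy of the markup from the question.
# Assumption: the real page parses into an equivalent tree.
SNIPPET = """
<div class="comments">
  <div class="comment">
    <div class="comment-top">
      <div class="comment-nr">208. PROTAS</div>
      <div class="comment-info">
        <div class="comment-time">2015-06-30 13:00</div>
      </div>
    </div>
    <p class="text-13 no-intend">Test text</p>
  </div>
</div>
"""

root = ET.fromstring(SNIPPET)

# ".//" searches every descendant of the context node, whereas a leading
# "/" or "*/" only matches at one fixed depth.
by_parent = [p.text for p in root.findall(".//div[@class='comment']/p")]

# Alternative: select the paragraph by its exact class attribute value.
by_class = [p.text for p in root.findall(".//p[@class='text-13 no-intend']")]

print(by_parent)
print(by_class)
```

The exact match `@class='comment'` is deliberate: `contains(@class, "comment")` would also match `comments`, `comment-top`, `comment-info`, and the other wrapper divs.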
【Discussion】:
Tags: xpath web-crawler