[Posted]: 2017-10-05 00:07:35
[Question]:
I have scraped a page with the following HTML content:
<div class="td-ss-main-content">
<div class="td-page-header">...</div>
<div class="td_module_16 td_module_wrap td-animation-stack">...</div>
<div class="td_module_16 td_module_wrap td-animation-stack td_module_no_thumb">...</div>
<div class="page-nav td-pb-padding-side">
<span class="current">1</span>
<a href="http://www.arunachaltimes.in/2017/05/06/page/2/" class="page" title="2">2</a>
<a href="http://www.arunachaltimes.in/2017/05/06/page/3/" class="page" title="3">3</a>
<a href="http://www.arunachaltimes.in/2017/05/06/page/2/"><i class="td-icon-menu-right"></i></a>
<span class="pages">Page 1 of 3</span>
</div>
</div>
Now I want to get the next-page link, if one exists. It is the href value of the .page-nav > a element that contains an i tag.
I can do it like this:
response.css("div.page-nav > a")[2].css("::attr(href)").extract_first()
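But this won't work if I am on page 2, because the arrow link is not always the third a element. So it would be better to take the href of whichever a tag has an i tag as a child. How can I do that?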
Update (page 2)
<div class="page-nav td-pb-padding-side">
<a href="http://www.arunachaltimes.in/2017/05/06/"><i class="td-icon-menu-left"></i></a>
<a href="http://www.arunachaltimes.in/2017/05/06/" class="page" title="1">1</a>
<span class="current">2</span>
<a href="http://www.arunachaltimes.in/2017/05/06/page/3/" class="page" title="3">3</a>
<a href="http://www.arunachaltimes.in/2017/05/06/page/3/"><i class="td-icon-menu-right"></i></a>
<span class="pages">Page 2 of 3</span>
</div>
Update (page 3, the last page)
<div class="page-nav td-pb-padding-side">
<a href="http://www.arunachaltimes.in/2017/05/06/page/2/"><i class="td-icon-menu-left"></i></a>
<a href="http://www.arunachaltimes.in/2017/05/06/" class="page" title="1">1</a>
<a href="http://www.arunachaltimes.in/2017/05/06/page/2/" class="page" title="2">2</a>
<span class="current">3</span>
<span class="pages">Page 3 of 3</span>
</div>
[Comments]: