【发布时间】:2011-12-13 17:36:54
【问题描述】:
<div id="descriptionmodule" class="module toggle-wrap">
<div class="mod-header">
<h3 class="toggle-title">Description</h3>
</div>
<div id="issue-description" class="mod-content">
<p>qqqqqqqqqqqqq,<br/>
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq<br/>
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq.</p>
<p>qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq</p>
<p>qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq.</p>
<ul class="alternate" type="square">
<li>qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq</li>
</ul>
我只想要 Q 的。我试过这个
doc=lh.fromstring(resp.read())
for id in doc.cssselect('div.mod-content' ):
print id.text_content()
这给了我 q,但它也给了我页面上的其他详细信息以及类 mod-content。 我如何专门得到q。
我正在使用 lxml。
<div id="peoplemodule" class="module toggle-wrap">
<div class="mod-header">
<h3 class="toggle-title">People</h3>
</div>
<div class="mod-content">
<ul class="item-details" id="peopledetails">
<li class="people-details">
<dl>
<dt>Assignee:</dt>
<dd id="Assign-Val">
<a class="user-hover" rel="605794069" id="issue_summary_assignee_605794069" href="--------------"> AAAAAAAAAAAAA a>
</dd>
</dl>
<dl>
<dt>Reporter:</dt>
<dd id="Report-Val">
<a class="user-hover" rel="700843051" id="issue_summary_reporter_700843051" href="-------------------------">BBBBBBBBBBBBBB</a>
</dd>
</dl>
<dl><dt> </dt><dd> </dd></dl>
<dl>
<dt title="Multiple Assignees">Multiple Assignees:</dt>
<dd id="customfield_10020-val"> <div class="shorten" id="customfield_10020-field">
<span class="tinylink"> <a class="user-hover" rel="604810609" id="multiuser_cf_604810609" href------------------">FFFFFFFFFFFFFF</a></span>, <span class="tinylink"> <a class="user-hover" rel="600548483" id="multiuser_cf_600548483" href="------------------------------------">EEEEEEEEEEEEEEEEE</a></span> </div>
</dd>
</dl>
</li>
</ul>
<div id="watchers-val">
<a href="----------------------------------------" id="watching-toggle" rel="858270" title="Start watching this story"><span class="icon icon-watch-off"></span><span class="action-text">Watch</span></a>
(<span id="watcher-data">1</span>)
</div>
</div>
</div>
【问题讨论】:
-
什么“其他细节”?您共享的 sn-p 中只有 q。而且,您的答案很大程度上取决于特定网站的来源。
-
我忘了说,这个sn-p是网页的一小部分,mod-content类也在其他地方使用,因此在打印时,它也会打印其他值。
-
正如我所说,这取决于您感兴趣的网站和内容。您需要为内容提供足够的特异性。例如,如果这是您想要的唯一 div,您可以通过其
id进行选择,因为它应该是唯一的。
标签: python css-selectors lxml