How to scrape the name of a class from a web page?

【Question title】: How to scrape the name of a class from a web page?
【Posted】: 2013-07-15 11:17:09
【Question】:

This is the HTML of the site I want to scrape:

<div id="quranOutput">
  <a class="key" name="1:1"></a>
    <div class="verse ayahBox1" id="verse_1">

This is the XPath I am using in Django Dynamic Scraper, but it does not work:

//div[@class="ayah language_6 text"]/a/@name

Can anyone help me find the correct way to retrieve the name, i.e. the value 1:1 from name="1:1"?

【Comments】:

Tags: python django dynamic scraper

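【Solution 1】:

Use this XPath, which selects the name attribute of the a element with class "key" directly under the div with id "quranOutput":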

//div[@id="quranOutput"]/a[@class="key"]/@name

>>> import lxml.html
>>> 
>>> root = lxml.html.fromstring('''
... <html>
...     <body>
...         <div id="quranOutput">
...             <a class="key" name="1:1"></a>
...             <div class="verse ayahBox1" id="verse_1"></div>
...         </div>
...     </body>
... </html>''')
>>> 
>>> print(root.xpath('//div[@id="quranOutput"]/a[@class="key"]/@name'))
['1:1']
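
To see why the XPath from the question returns nothing, it can help to run both expressions side by side. The sketch below judges only by the HTML fragment shown in the question (the real page may contain more markup), and the third expression, named `tolerant` here, is an illustrative variant that is not part of the original answer: it matches the anchor even if its class attribute later gains extra class names.

```python
import lxml.html

# The fragment from the question, with the open tags closed so it parses cleanly.
HTML = '''
<html>
    <body>
        <div id="quranOutput">
            <a class="key" name="1:1"></a>
            <div class="verse ayahBox1" id="verse_1"></div>
        </div>
    </body>
</html>'''

root = lxml.html.fromstring(HTML)

# XPath from the question: no div in this fragment has the class value
# "ayah language_6 text", so there is no parent for the <a> to hang from.
original = root.xpath('//div[@class="ayah language_6 text"]/a/@name')

# XPath from the answer: the <a> is a direct child of div#quranOutput.
fixed = root.xpath('//div[@id="quranOutput"]/a[@class="key"]/@name')

# Illustrative variant (an assumption, not from the answer): match "key" as one
# token of a space-separated class list instead of the exact attribute value.
tolerant = root.xpath(
    '//a[contains(concat(" ", normalize-space(@class), " "), " key ")]/@name'
)

print(original)
print(fixed)
print(tolerant)
```

Note that @class="..." compares the whole attribute string, so it only matches when the class list is exactly that value, in that order. If the scraper should return every verse key on a page, the same expression yields one string per matching a element.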

【Comments】: