Retrieving the name of a class attribute with lxml: answers

[Question title]: Retrieving the name of a class attribute with lxml
[Posted]: 2016-05-02 06:43:45
[Question]:

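I'm working on a Python project that scrapes a page with lxml, and I'm having trouble retrieving the name of a span's class attribute. The HTML snippet is as follows: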

<tr class="nogrid">
  <td class="date">12th January 2016</td> 
  <td class="time">11:22pm</td> 
  <td class="category">Clothing</td>   
  <td class="product">
    <span class="brand">carlos santos</span>
  </td> 
  <td class="size">10</td> 
  <td class="name">polo</td> 
</tr>
....

How can I retrieve the value of the class attribute of the span below:

<span class="brand">carlos santos</span>

[Comments]:

Tags: python html lxml

[Solution 1]:

from bs4 import BeautifulSoup

# Named "markup" rather than "lxml" to avoid confusion with the lxml package
markup = '''<tr class="nogrid">
          <td class="date">12th January 2016</td>
          <td class="time">11:22pm</td>
          <td class="category">Clothing</td>
          <td class="product">
            <span class="brand">carlos santos</span>
          </td>
          <td class="size">10</td>
          <td class="name">polo</td>
          </tr>'''
soup = BeautifulSoup(markup, 'lxml')
# class is a multi-valued attribute, so BeautifulSoup returns a list of class names
result = soup.find('span')['class']  # result == ['brand']
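
BeautifulSoup treats class as a multi-valued attribute, so the lookup yields a list of class names rather than a single string. The following is a minimal sketch of handling that list. The second class name ("featured") is made up for illustration and is not in the question's markup, and the built-in 'html.parser' is used here only to keep the sketch free of a parser dependency.

```python
from bs4 import BeautifulSoup

# "featured" is a hypothetical extra class, added only to show the list behaviour
markup = '<td class="product"><span class="brand featured">carlos santos</span></td>'
soup = BeautifulSoup(markup, 'html.parser')

span = soup.find('span')
# .get() with a default avoids a KeyError when the span has no class attribute
classes = span.get('class', [])
# Join the list back together if a single string is wanted
class_string = ' '.join(classes)

print(classes)
print(class_string)
```

If only the first class name matters, indexing the list (classes[0]) is enough, as long as the list is checked for emptiness first.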

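[Comments]:

If I were using BeautifulSoup, I would accept this as my answer. Thanks, Alexander.

[Solution 2]:

You can use the following XPath to get the class attribute of the span element that is a direct child of the td with class product: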

//td[@class="product"]/span/@class

Working demo example:

from lxml import html
raw = '''<tr class="nogrid">
<td class="date">12th January 2016</td> 
<td class="time">11:22pm</td> 
<td class="category">Clothing</td>   
<td class="product">
<span class="brand">carlos santos</span>
</td> 
<td class="size">10</td> 
<td class="name">polo</td> 
</tr>'''

root = html.fromstring(raw)
# xpath() returns a list of matching attribute values; take the first one
span_class = root.xpath('//td[@class="product"]/span/@class')[0]
print(span_class)

Output:

brand
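
Indexing the XPath result with [0] raises an IndexError when nothing matches. As a variation, here is a minimal sketch that selects the span element itself and reads the attribute with .get(), returning None when no span is found. The helper name span_class_under and the trimmed-down row wrapped in a table element are assumptions made for this example, not part of the original answer.

```python
from lxml import html

# Shortened version of the question's row, wrapped in <table> so the fragment is valid HTML
raw = '''<table>
<tr class="nogrid">
  <td class="product">
    <span class="brand">carlos santos</span>
  </td>
  <td class="size">10</td>
</tr>
</table>'''

root = html.fromstring(raw)


def span_class_under(tree, td_class):
    """Return the class attribute of the first span directly under the
    td whose class equals td_class, or None if there is no such span."""
    # $cls is an XPath variable, filled in from the keyword argument
    spans = tree.xpath('//td[@class=$cls]/span', cls=td_class)
    if not spans:
        return None
    return spans[0].get('class')


brand_class = span_class_under(root, 'product')
missing = span_class_under(root, 'size')  # this td contains no span
print(brand_class)
print(missing)
```

Note that @class="product" compares the whole attribute string, so a td with several classes (for example class="product wide") would not match this expression.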

[Comments]: