Retrieve content field from html span

【Question title】: Retrieve content field from html span
【Posted】: 2019-03-05 00:24:40
【Question description】:

I have the following html code in an object:

<span itemprop="price" content="187">187,00&nbsp;€</span>

My idea is to get the content (the price) of the span object. To do that, I am doing the following:

import requests
from lxml import html

tree = html.fromstring(res.content)
prices = tree.xpath('//span[@class="price"]/text()')
print(float(prices[0].split()[0].replace(',','.')))

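Here, res.content contains the span object shown above. As you can see, I get the price from 187,00&nbsp;€ (after some modifications), whereas it would be easier to read it from the "content" attribute of the span. I have tried using: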

tree.xpath('//span[@class="price"]/content()')

But it doesn't work. Is there a way to retrieve this data? I am open to using any other library.

【Comments】:

Tags: python html web-scraping

【Solution 1】:

You can use the BeautifulSoup library for html parsing:

from bs4 import BeautifulSoup as soup
d = soup('<span itemprop="price" content="187">187,00&nbsp;€</span>', 'html.parser')
content = d.find('span')['content']

Output:

'187'

For a more specific lookup, you can provide the itemprop value:

content = d.find('span', {'itemprop':'price'})['content']

To get the content between the tags, use soup.text, that is, the .text attribute of the matched tag:

content = d.find('span', {'itemprop':'price'}).text

Output:

'187,00\xa0€'
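
The two values above differ in how much cleanup they need before they can be used as numbers. The following is a minimal sketch using only the standard library; the helper name parse_price_text is hypothetical, and it assumes a decimal comma with no thousands separator, as in the sample markup:

```python
from decimal import Decimal


def parse_price_text(text):
    """Convert a display string such as '187,00 €' to a Decimal.

    Hypothetical helper: assumes a decimal comma and no thousands separator.
    """
    # str.split() with no arguments splits on any Unicode whitespace,
    # including the non-breaking space that &nbsp; decodes to.
    number = text.split()[0]
    return Decimal(number.replace(',', '.'))


attr_value = '187'            # value of the content attribute
text_value = '187,00\xa0€'    # visible text between the tags

# The attribute value converts directly; the visible text needs cleanup first.
print(Decimal(attr_value))
print(parse_price_text(text_value))
```

Reading the content attribute avoids locale-specific parsing altogether, which is why it is usually the simpler choice when the page provides it.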

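【Comments】:

Yes. That's what I was looking for. I find the code using BeautifulSoup better than the one using only requests and html. One more question: what if I now want to get 187,00&nbsp;€? What should I put instead of ['content']? I mean, to get the actual content of the div/span.
@luis.galdo You can use soup.text. Please see my recent edit.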

【Solution 2】:

You can try

prices = tree.xpath('//span[@class="price"]')
for price in prices:
    print(price.get("content"))
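
As a variation, XPath can select the attribute directly: attributes are addressed with @name, which is why /content() in the question does not work. The sketch below assumes the sample span from the question (so it matches on itemprop rather than class) and parses an inline string in place of res.content:

```python
from lxml import html

# Sample markup from the question; in real use this would be res.content.
snippet = '<span itemprop="price" content="187">187,00&nbsp;€</span>'
tree = html.fromstring(snippet)

# "/@content" selects the attribute itself, so xpath() returns a list of strings.
values = tree.xpath('//span[@itemprop="price"]/@content')

# Guard against pages where no matching span exists.
price = float(values[0]) if values else None
print(price)
```

If the real page marks the span with class="price" instead, the predicate can be changed to [@class="price"] while keeping the /@content step.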

【Comments】:

This also works, but I decided to mark the other answer as correct because it fits my case better!