Beautiful Soup: extracting text between span tags (answers)

【Question title】: Beautiful soup extract text between span tags
【Posted】: 2018-10-12 16:13:59
【Question description】:

<span id="priceblock_dealprice" class="a-size-medium a-color-price"><span class="currencyINR">&nbsp;&nbsp;</span> 33,990.00 </span>

I need to extract the number 33,990.00 from the HTML above.

【Question discussion】:

Tags: python beautifulsoup

【Solution 1】:

Using Beautiful Soup:

from bs4 import BeautifulSoup as bs

content = '''<span id="priceblock_dealprice" class="a-size-medium a-color-price"><span class="currencyINR">&nbsp;&nbsp;</span> 33,990.00 </span>'''

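# The 'html5lib' parser is a separate package (pip install html5lib)
soup = bs(content, 'html5lib')
# soup.text is all text in the document; strip() also removes the non-breaking spaces
print(soup.text.strip())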
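
Note that `soup.text` returns all the text in the parsed document, so the snippet above works only because the input contains nothing but the price span. On a fuller page, one option is to look the span up by its id first. The following is a minimal sketch under that assumption; the surrounding `<div>` and `<h1>` are made up for illustration, and the built-in `html.parser` is used here only to avoid an extra dependency:

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: the <div> and <h1> are invented for illustration,
# only the price span comes from the question.
html = '''
<div>
  <h1>Some product</h1>
  <span id="priceblock_dealprice" class="a-size-medium a-color-price"><span class="currencyINR">&nbsp;&nbsp;</span> 33,990.00 </span>
</div>
'''

soup = BeautifulSoup(html, "html.parser")

# Find the span by its id so that unrelated text on the page is ignored.
price_span = soup.find(id="priceblock_dealprice")

# get_text(strip=True) strips every text fragment and drops the empty ones,
# which removes the non-breaking spaces from the inner currency span.
price_text = price_span.get_text(strip=True) if price_span is not None else None
print(price_text)
```

The `None` check matters on real pages, where the element may be missing and `find` returns `None`.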

【Discussion】:

【Solution 2】:

Why use selenium? It is unnecessary here. Use selenium only if the page is rendered by JavaScript. Otherwise use the following:

from bs4 import BeautifulSoup

html = '<span id="priceblock_dealprice" class="a-size-medium a-color-price"><span class="currencyINR">&nbsp;&nbsp;</span> 33,990.00 </span>'
soup = BeautifulSoup(html, 'lxml')
# Select the outer span by its CSS class, then strip the surrounding whitespace
text = soup.select_one('span.a-color-price').text.strip()
print(text)

Output:

33,990.00
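
The extracted value is still a string. If it is needed as a number, a small helper along these lines could convert it. This is only a sketch: `parse_price` is a hypothetical name, and it assumes the comma is a thousands separator and the dot is the decimal point, as in the example price:

```python
from decimal import Decimal


def parse_price(text):
    """Convert a price string such as '33,990.00' to a Decimal.

    Assumes ',' is a thousands separator and '.' is the decimal point.
    """
    # Replace non-breaking spaces, trim whitespace, then drop the commas.
    cleaned = text.replace("\xa0", " ").strip().replace(",", "")
    return Decimal(cleaned)


price = parse_price(" 33,990.00 ")
print(price)  # 33990.00
```

`Decimal` is used instead of `float` so that the two decimal places of the price are kept exactly.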

【Discussion】:

【Solution 3】:

This is a good job for selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Firefox()

# URL must hold the address of the page to scrape
browser.get(URL)

delay = 30  # seconds
WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'priceblock_dealprice')))
print("Page is ready!")

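# find_element_by_id is no longer available in recent Selenium 4 releases; use find_element with By.ID
text = browser.find_element(By.ID, "priceblock_dealprice").text
browser.quit()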

【Discussion】:

Isn't using selenium like killing a fly with a shotgun? ;)