【Question title】: Beautiful Soup: extract text between span tags
【Posted】: 2018-10-12 16:13:59
【Question description】:
<span id="priceblock_dealprice" class="a-size-medium a-color-price"><span class="currencyINR">&nbsp;&nbsp;</span> 33,990.00 </span>

I need to extract the number 33,990.00 from the HTML above.

【Comments】:

    Tags: python beautifulsoup


    【Solution 1】:

    Using Beautiful Soup:

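    from bs4 import BeautifulSoup as bs
    
    content = '''<span id="priceblock_dealprice" class="a-size-medium a-color-price"><span class="currencyINR">&nbsp;&nbsp;</span> 33,990.00 </span>'''
    
    # The 'html5lib' parser must be installed separately (pip install html5lib)
    soup = bs(content, 'html5lib')
    # The fragment contains only the price span, so the document text is the price
    print(soup.text.strip())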
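
On a full page, `soup.text` would return the text of the whole document, not just the price. A minimal sketch of targeting the span by its `id` instead, assuming the built-in `html.parser` backend; the extra `<div>` and the helper name `extract_price_text` are hypothetical and only there for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: the span from the question plus an unrelated element.
html = '''
<div id="title">Some product</div>
<span id="priceblock_dealprice" class="a-size-medium a-color-price"><span class="currencyINR">&nbsp;&nbsp;</span> 33,990.00 </span>
'''

def extract_price_text(markup):
    """Return the stripped text of the price span, or None if it is absent."""
    soup = BeautifulSoup(markup, "html.parser")
    span = soup.find("span", id="priceblock_dealprice")
    if span is None:
        return None
    # strip=True strips every text fragment and drops the ones left empty,
    # so the non-breaking spaces in the inner currency span disappear.
    return span.get_text(strip=True)

print(extract_price_text(html))
```

Returning `None` when the element is missing lets the caller decide how to handle a page whose layout has changed.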
    

    【Discussion】:

      【Solution 2】:

      Why use selenium? It is unnecessary here. Use selenium only if the page is rendered by JavaScript. Otherwise, use the following:

      from bs4 import BeautifulSoup
      html = '<span id="priceblock_dealprice" class="a-size-medium a-color-price"><span class="currencyINR">&nbsp;&nbsp;</span> 33,990.00 </span>'
      soup = BeautifulSoup(html, 'lxml')
      # select_one returns the first element matching the CSS selector
      text = soup.select_one('span.a-color-price').text.strip()
      print(text)
      

      Output:

      33,990.00
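
The extracted value is still a string. If a numeric value is needed, one possible sketch is to drop the separators and build a `Decimal`. This assumes the comma is a thousands separator and the dot is the decimal point, as in the sample above; `parse_price` is a hypothetical helper name:

```python
from decimal import Decimal

def parse_price(text):
    """Convert a price string such as '33,990.00' to a Decimal.

    Assumes ',' separates thousands and '.' is the decimal point.
    """
    # Remove non-breaking spaces, thousands separators, and outer whitespace.
    cleaned = text.replace("\xa0", "").replace(",", "").strip()
    return Decimal(cleaned)

price = parse_price(" 33,990.00 ")
print(price)  # 33990.00
```

`Decimal` keeps the two decimal places exactly, which is usually preferable to `float` for money.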
      

      【Discussion】:

        【Solution 3】:

        This is a good job for selenium:

        from selenium import webdriver
        from selenium.webdriver.common.by import By
        from selenium.webdriver.support.ui import WebDriverWait
        from selenium.webdriver.support import expected_conditions as EC
        
        browser = webdriver.Firefox()
        
        # URL must be set to the address of the product page (not given here)
        browser.get(URL)
        
        delay = 30  # seconds
        # Wait until the price element is present in the DOM
        WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'priceblock_dealprice')))
        print("Page is ready!")
        
        # find_element_by_id was removed in Selenium 4.3; use find_element(By.ID, ...)
        text = browser.find_element(By.ID, "priceblock_dealprice").text
        

        【Discussion】:

        • Isn't using selenium like killing a fly with a shotgun? ;)