【Question title】: How can I get the second span using BeautifulSoup in Python?
【Posted】: 2020-01-19 06:27:26
【Question description】:

I'm trying to get the value of the second span in this div, and in other divs like it (shown below):

<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 1</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>TRYING TO GET THIS</span>
</div>

I've looked at similar Stack Overflow posts, but I still can't figure out how to solve this. Here is my current code:

time = soup.find_all('div', {'class': 'C(#959595) Fz(11px) D(ib) Mb(6px)'})
for i in time:
    print(i.text)  # this prints VALUE 1 x amount of times (there are multiple divs)

I've tried i.span, i.contents, i.children, and so on. Any help is greatly appreciated, thanks!

【Question comments】:

  • Have you consulted the BeautifulSoup documentation?

Tags: python web-scraping beautifulsoup html-parsing


【Solution 1】:

Try this:

from io import StringIO
from bs4 import BeautifulSoup as bs

data = """<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 1</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>TRYING TO GET THIS</span>
</div>
<div class="another class">
    <span>VALUE 1</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>TRYING TO GET THIS</span>
</div>"""

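# Pass an explicit parser to avoid bs4's "no parser specified" warning
soup = bs(StringIO(data), 'html.parser')
# Direct <span> children of divs whose class attribute matches exactly
spans = soup.select('div[class="C(#959595) Fz(11px) D(ib) Mb(6px)"] > span')
print(spans[1].text)  # second span of the first matching div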
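
If there are several matching divs on the page, indexing `spans[1]` only gives the second span of the first one. A minimal sketch of one way to pick the second span of every matching div, assuming a bs4 version (4.7 or later) where `select` supports the `:nth-of-type` pseudo-class; the sample HTML and the `SECOND A` / `SECOND B` values are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two divs with the same class string as in the question
html = """<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 1</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>SECOND A</span>
</div>
<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 2</span>
    <i aria-hidden="true" class="Mx(4px)">•</i>
    <span>SECOND B</span>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# span:nth-of-type(2) matches a span that is the second <span> among its
# siblings, so the <i> separator in between does not affect the count.
selector = 'div[class="C(#959595) Fz(11px) D(ib) Mb(6px)"] > span:nth-of-type(2)'
second_spans = [span.get_text(strip=True) for span in soup.select(selector)]
print(second_spans)
```

The attribute selector `[class="..."]` compares the whole attribute string, which sidesteps having to escape the parentheses and `#` that a `.class` selector would need.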

【Comments】:

    【Solution 2】:

    You basically have it already; you just need to get the second span within each div (using find_next):

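    from bs4 import BeautifulSoup

    # HTML is assumed to hold the page source as a string
    soup = BeautifulSoup(HTML, 'html.parser')
    divs = soup.find_all('div', {'class': 'C(#959595) Fz(11px) D(ib) Mb(6px)'})
    for div in divs:
        # We want the second span in the div. Note that find_next searches
        # forward through the whole document, not only inside this div.
        span = div.find_next('span').find_next('span')
        print(span.string)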
    

    【Comments】:

    • Is there something cleaner, like "doc.find_last('span')"?
    • span = div.find_all('span').pop()
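
One thing to watch with the chained `find_next` approach: `find_next` walks forward through the whole document, so if a div happens to contain only one span, the second call lands on a span from a later div. A rough sketch of a scoped alternative using `div.find_all('span')` with a length check; the `row` class, the sample HTML, and the helper name `second_span_text` are assumptions for illustration only:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the first div has a single span, the second has two
html = """<div class="row"><span>ONLY ONE</span></div>
<div class="row"><span>VALUE 1</span><i>•</i><span>TARGET</span></div>"""

soup = BeautifulSoup(html, "html.parser")


def second_span_text(div):
    """Return the text of the second <span> inside div, or None if absent."""
    spans = div.find_all("span")  # only searches inside this div
    return spans[1].get_text(strip=True) if len(spans) > 1 else None


results = [second_span_text(div) for div in soup.find_all("div", class_="row")]
print(results)

# For comparison: chained find_next leaves the first div and picks up
# the first span of the following div.
first_div = soup.find("div", class_="row")
leaked = first_div.find_next("span").find_next("span")
print(leaked.get_text())
```

Using `spans[-1]` (or `.pop()` as suggested in the comment above) works the same way when "last span" is what is wanted rather than strictly "second span".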
    【Solution 3】:

    There are several ways to get the value you want.

    from simplified_scrapy.simplified_doc import SimplifiedDoc
    html='''
    <div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
        <span>VALUE 1</span>
        <i aria-hidden="true" class="Mx(4px)">•</i>
        <span>TRYING TO GET THIS</span>
    </div>
    '''
    doc = SimplifiedDoc(html)
    divs = doc.getElementsByClass('C(#959595) Fz(11px) D(ib) Mb(6px)')
    for div in divs:
      value = div.getElementByTag('span',start='</span>') # Use start to skip the first
      print (value)
      value = div.getElementByTag('span',before='<span>',end=len(div.html)) # Locate the last
      print (value)
      value = div.i.next # Use <i> to locate
      print (value)
      value = div.spans[-1]
      print (value)
      print (value.text)
    

    Result:

    {'tag': 'span', 'html': 'TRYING TO GET THIS'}
    {'tag': 'span', 'html': 'TRYING TO GET THIS'}
    {'tag': 'span', 'html': 'TRYING TO GET THIS'}
    {'tag': 'span', 'html': 'TRYING TO GET THIS'}
    TRYING TO GET THIS
    

    【Comments】:

      【Solution 4】:
      div= soup.find_all('div',class_='C(#959595) Fz(11px) D(ib) Mb(6px)')
      [x.get_text() for x in div[0].find_all('span')]
      
      # Output:
      
      Out[17]:
      ['VALUE 1', 'TRYING TO GET THIS']
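
The snippet above only looks at `div[0]`. A small sketch of how the same idea might be extended to every matching div, keeping just the last span text of each; the sample HTML and the `SECOND A` / `SECOND B` values are invented for illustration. Note that passing a multi-class string to `class_` matches the exact attribute string, so a div with the same classes in a different order would not be found this way.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two target divs plus one unrelated div
html = """<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 1</span><i class="Mx(4px)">•</i><span>SECOND A</span>
</div>
<div class="another class">
    <span>IGNORED</span>
</div>
<div class="C(#959595) Fz(11px) D(ib) Mb(6px)">
    <span>VALUE 2</span><i class="Mx(4px)">•</i><span>SECOND B</span>
</div>"""

soup = BeautifulSoup(html, "html.parser")
target_class = "C(#959595) Fz(11px) D(ib) Mb(6px)"

divs = soup.find_all("div", class_=target_class)

# One list of span texts per matching div
texts_per_div = [
    [span.get_text(strip=True) for span in div.find_all("span")]
    for div in divs
]
print(texts_per_div)

# Keep only the last span text of each div, skipping divs without spans
last_values = [texts[-1] for texts in texts_per_div if texts]
print(last_values)
```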
      

      【Comments】:
