【Question title】: The Pythonic way to parse a small HTML snippet with BeautifulSoup?
【Posted】: 2018-08-10 13:28:37
【Question】:

What is the most Pythonic way to parse the following HTML with BeautifulSoup?

<html>

<body>
  <div class="bet_group">
    <div class="bet-title bet-title_justify"><span class="bet-title__star"></span> Total
      <!-- -->
    </div>
    <div class="bets betCols2">
      <div class=""><span class="bet_type" data-type="9">Total Over 4.5</span> <span class="koeff" data-coef="3.38"><i>3.38</i></span></div>
      <div class=""><span class="bet_type" data-type="10">Total Under 4.5</span> <span class="koeff" data-coef="1.34"><i>1.34</i></span></div>
      <div class=""><span class="bet_type" data-type="9">Total Over 5.5</span> <span class="koeff" data-coef="12.5"><i>12.5</i></span></div>
      <div class=""><span class="bet_type" data-type="10">Total Under 5</span> <span class="koeff" data-coef="1.04"><i>1.04</i></span></div>
      <div class="bets__empty-cell"> </div>
      <div class=""><span class="bet_type" data-type="10">Total Under 5.5</span> <span class="koeff" data-coef="1.02"><i>1.02</i></span></div>
    </div>
  </div>
</body>

</html>

I am trying to get this output:

Title: Total

Total Over 4.5: 3.38, Total Under 4.5: 1.34

Total Over 5.5: 12.5, Total Under 5.5: 1.02

I tried the following code, but it does not quite get there.

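from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')  # html holds the markup shown above

# Print every bet name
infos = soup.find_all('span', class_='bet_type')
for info in infos:
    print(info.get_text())

# Print every coefficient (names and odds are not paired up yet)
odds = soup.find_all('span', class_='koeff')
for odd in odds:
    print(odd.get_text())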

【Question comments】:

    Tags: python beautifulsoup


    【Solution 1】:

    Try:

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, 'lxml')
    output = ""
    for i in soup.find("div", class_="bet_group").text.splitlines():
        if i.strip():
            output += i.strip()+"\n"
    print(output)
    

    Output:

    Total
    Total Over 4.5 3.38
    Total Under 4.5 1.34
    Total Over 5.5 12.5
    Total Under 5 1.04
    Total Under 5.5 1.02
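
The loop in this answer relies on the line breaks that happen to be in the source HTML: each row comes out on its own line only because each row sits on its own line in the markup. As a hedged sketch of a variant that does not depend on that, the example below reads each row `div` separately with `get_text(" ", strip=True)`. The minified `html` string, the `html.parser` backend (chosen here only to avoid the `lxml` dependency), and the variable names are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Minified copy of part of the sample markup: no newlines between rows.
html = (
    '<div class="bet_group">'
    '<div class="bet-title bet-title_justify">'
    '<span class="bet-title__star"></span> Total<!-- --></div>'
    '<div class="bets betCols2">'
    '<div class=""><span class="bet_type" data-type="9">Total Over 4.5</span> '
    '<span class="koeff" data-coef="3.38"><i>3.38</i></span></div>'
    '<div class="bets__empty-cell"> </div>'
    '<div class=""><span class="bet_type" data-type="10">Total Under 4.5</span> '
    '<span class="koeff" data-coef="1.34"><i>1.34</i></span></div>'
    '</div></div>'
)

soup = BeautifulSoup(html, "html.parser")
group = soup.find("div", class_="bet_group")

# The title div holds a star span, the word "Total" and an HTML comment;
# strip=True drops the surrounding whitespace.
title = group.find("div", class_="bet-title").get_text(strip=True)

# Only the direct children of the "bets" div are rows.
rows = group.find("div", class_="bets").find_all("div", recursive=False)

# Join the strings of each row with a single space and skip empty cells.
lines = [title]
for row in rows:
    text = row.get_text(" ", strip=True)
    if text:
        lines.append(text)

print("\n".join(lines))
```

Because every row is read on its own, the result is the same whether or not the page is minified.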
    

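    【Comments】:

    • Rkesh, thank you very much! This is exactly what I wanted!
    • Usually when I use soup.find() the way you did, I only get the first matching element. How does your code find all of them?
    • For your sample HTML you can call find on the parent div. Whether to use find or find_all depends on the structure of the HTML you are scraping :)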
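
The point raised in these comments can be illustrated with a small sketch: `find()` does return a single tag, but the text of that tag includes the text of all its descendants, so calling it on the parent `div` is enough here. The trimmed-down `html` string, the `html.parser` backend, and the variable names below are assumptions made for the example, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """
<div class="bet_group">
  <div class="bets">
    <div><span class="bet_type">Total Over 4.5</span> <span class="koeff"><i>3.38</i></span></div>
    <div><span class="bet_type">Total Under 4.5</span> <span class="koeff"><i>1.34</i></span></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag.
first_text = soup.find("span", class_="bet_type").get_text()

# find_all() returns every matching tag.
all_texts = [s.get_text() for s in soup.find_all("span", class_="bet_type")]

# find() on the parent div is still one tag, but its text covers
# every nested span, so all rows show up after splitting on newlines.
group = soup.find("div", class_="bet_group")
group_lines = [ln.strip() for ln in group.get_text().splitlines() if ln.strip()]

print(first_text)
print(all_texts)
print(group_lines)
```
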
    【Solution 2】:

    This may help:

        from bs4 import BeautifulSoup

        st = """
            <html>
    
    <body>
      <div class="bet_group">
        <div class="bet-title bet-title_justify"><span class="bet-title__star"></span> Total
          <!-- -->
        </div>
        <div class="bets betCols2">
          <div class=""><span class="bet_type" data-type="9">Total Over 4.5</span> <span class="koeff" data-coef="3.38"><i>3.38</i></span></div>
          <div class=""><span class="bet_type" data-type="10">Total Under 4.5</span> <span class="koeff" data-coef="1.34"><i>1.34</i></span></div>
          <div class=""><span class="bet_type" data-type="9">Total Over 5.5</span> <span class="koeff" data-coef="12.5"><i>12.5</i></span></div>
          <div class=""><span class="bet_type" data-type="10">Total Under 5</span> <span class="koeff" data-coef="1.04"><i>1.04</i></span></div>
          <div class="bets__empty-cell"> </div>
          <div class=""><span class="bet_type" data-type="10">Total Under 5.5</span> <span class="koeff" data-coef="1.02"><i>1.02</i></span></div>
        </div>
      </div>
    </body>
    
    </html>
        """
        soup = BeautifulSoup(st, 'lxml')
        title = soup.find('div', attrs={'class': 'bet-title'}).get_text().strip()
        print(title)
        for spn in soup.find_all('span', attrs={'class': 'bet_type'}):
            bet_text = spn.get_text()
            print(bet_text)
    
    
        # Output as: Total
        #            Total Over 4.5
        #            Total Under 4.5
        #            Total Over 5.5
        #            Total Under 5
        #            Total Under 5.5
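
The snippet above prints the bet names but not the odds. One possible way to get closer to the layout asked for in the question is sketched below: it reads each row's name together with its `data-coef` attribute and groups rows that share the same trailing number. Grouping by the last word of the label, the `html.parser` backend, and names such as `by_line` are assumptions made for illustration. A bet with no counterpart (here "Total Under 5") ends up on a line of its own.

```python
from bs4 import BeautifulSoup

html = """
<div class="bet_group">
  <div class="bet-title bet-title_justify"><span class="bet-title__star"></span> Total
    <!-- -->
  </div>
  <div class="bets betCols2">
    <div class=""><span class="bet_type" data-type="9">Total Over 4.5</span> <span class="koeff" data-coef="3.38"><i>3.38</i></span></div>
    <div class=""><span class="bet_type" data-type="10">Total Under 4.5</span> <span class="koeff" data-coef="1.34"><i>1.34</i></span></div>
    <div class=""><span class="bet_type" data-type="9">Total Over 5.5</span> <span class="koeff" data-coef="12.5"><i>12.5</i></span></div>
    <div class=""><span class="bet_type" data-type="10">Total Under 5</span> <span class="koeff" data-coef="1.04"><i>1.04</i></span></div>
    <div class="bets__empty-cell"> </div>
    <div class=""><span class="bet_type" data-type="10">Total Under 5.5</span> <span class="koeff" data-coef="1.02"><i>1.02</i></span></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
group = soup.find("div", class_="bet_group")
title = group.find("div", class_="bet-title").get_text(strip=True)

# Map each trailing number ("4.5", "5.5", ...) to its "name: odds" entries.
# Plain dicts keep insertion order, so groups appear in page order.
by_line = {}
for row in group.find("div", class_="bets").find_all("div", recursive=False):
    name = row.find("span", class_="bet_type")
    odds = row.find("span", class_="koeff")
    if name is None or odds is None:
        continue  # placeholder cell such as bets__empty-cell
    label = name.get_text(strip=True)
    key = label.rsplit(" ", 1)[-1]
    by_line.setdefault(key, []).append(f"{label}: {odds['data-coef']}")

lines = [", ".join(entries) for entries in by_line.values()]

print(f"Title: {title}")
for line in lines:
    print(line)
```

Reading the odds from `data-coef` rather than from the nested `<i>` text is a design choice; for this sample both hold the same value.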
    

    【Comments】:
