The pythonic way to parse a small html code with beautifulsoup? (Answers)

【Question title】: The pythonic way to parse a small html code with beautifulsoup?
【Posted】: 2018-08-10 13:28:37
【Question description】:

What is the most Pythonic way to parse the following HTML with BeautifulSoup?

<html>

<body>
  <div class="bet_group">
    <div class="bet-title bet-title_justify"><span class="bet-title__star"></span> Total
      <!-- -->
    </div>
    <div class="bets betCols2">
      <div class=""><span class="bet_type" data-type="9">Total Over 4.5</span> <span class="koeff" data-coef="3.38"><i>3.38</i></span></div>
      <div class=""><span class="bet_type" data-type="10">Total Under 4.5</span> <span class="koeff" data-coef="1.34"><i>1.34</i></span></div>
      <div class=""><span class="bet_type" data-type="9">Total Over 5.5</span> <span class="koeff" data-coef="12.5"><i>12.5</i></span></div>
      <div class=""><span class="bet_type" data-type="10">Total Under 5</span> <span class="koeff" data-coef="1.04"><i>1.04</i></span></div>
      <div class="bets__empty-cell"> </div>
      <div class=""><span class="bet_type" data-type="10">Total Under 5.5</span> <span class="koeff" data-coef="1.02"><i>1.02</i></span></div>
    </div>
  </div>
</body>

</html>

I am trying to get this output:

Title: Total

Total Over 4.5: 3.38, Total Under 4.5: 1.34

Total Over 5.5: 12.5, Total Under 5.5: 1.02

I tried the following code, but it does not quite get there.

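from bs4 import BeautifulSoup

# html holds the markup shown above
soup = BeautifulSoup(html, 'lxml')

# Bet labels, e.g. "Total Over 4.5"
infos = soup.find_all('span', class_='bet_type')
for info in infos:
    print(info.get_text())
# Coefficients, e.g. "3.38"
odds = soup.find_all('span', class_='koeff')
for odd in odds:
    print(odd.get_text())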

【Question discussion】:

Tags: python beautifulsoup

【Solution 1】:

Try:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
output = ""
for i in soup.find("div", class_="bet_group").text.splitlines():
    if i.strip():
        output += i.strip()+"\n"
print(output)

Output:

Total
Total Over 4.5 3.38
Total Under 4.5 1.34
Total Over 5.5 12.5
Total Under 5 1.04
Total Under 5.5 1.02
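
The snippet above depends on the line breaks that happen to be present in the source markup. Below is a minimal sketch of a variant that does not, under some assumptions that are not part of the original answer: it uses the built-in `html.parser` instead of `lxml`, it works on a shortened copy of the sample HTML written on a single line, and the names `sample_html` and `lines` are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup, deliberately without newlines
# so that splitlines() would have nothing to split on.
sample_html = (
    '<div class="bet_group">'
    '<div class="bet-title bet-title_justify"><span class="bet-title__star"></span> Total</div>'
    '<div class="bets betCols2">'
    '<div class=""><span class="bet_type" data-type="9">Total Over 4.5</span> '
    '<span class="koeff" data-coef="3.38"><i>3.38</i></span></div>'
    '<div class="bets__empty-cell"> </div>'
    '<div class=""><span class="bet_type" data-type="10">Total Under 4.5</span> '
    '<span class="koeff" data-coef="1.34"><i>1.34</i></span></div>'
    '</div></div>'
)

soup = BeautifulSoup(sample_html, "html.parser")
group = soup.find("div", class_="bet_group")

# One line for the title, then one line per non-empty bet row.
lines = [group.find("div", class_="bet-title").get_text(strip=True)]
for row in group.find("div", class_="bets").find_all("div", recursive=False):
    text = row.get_text(" ", strip=True)
    if text:  # skips the "bets__empty-cell" placeholder
        lines.append(text)

print("\n".join(lines))
```

Walking the row divs directly ties each output line to one row element, so the result should not change if the page is minified or re-indented.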

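【Discussion】:

Rkesh, thank you very much! This is exactly what I wanted!
Usually when I use soup.find() the way you did, I only get the first matching element. How does your code find all of them?
For your example HTML, you can call find on the parent div. Whether to use find or find_all depends on the structure of the HTML you want to scrape :)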
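
The point made in this exchange can be checked directly: find() does return only the first match, but .text on that single element concatenates the text of all of its descendants. A small sketch, assuming the built-in `html.parser` and a made-up two-span fragment (the variable names are illustrative, not from the answer):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one parent div with two matching spans.
fragment = (
    '<div class="bet_group">'
    '<span class="bet_type">Total Over 4.5</span>'
    '<span class="bet_type">Total Under 4.5</span>'
    '</div>'
)
soup = BeautifulSoup(fragment, "html.parser")

first_span = soup.find("span", class_="bet_type")     # first match only
all_spans = soup.find_all("span", class_="bet_type")  # every match
parent = soup.find("div", class_="bet_group")         # the single parent div

print(first_span.text)              # text of the first span only
print([s.text for s in all_spans])  # one entry per matching span
print(parent.text)                  # text of every descendant of the parent
```

So Solution 1 finds a single element, the parent div, and reads all the nested text from it in one go.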

【Solution 2】:

This may help you:

    from bs4 import BeautifulSoup

    st = """
        <html>

<body>
  <div class="bet_group">
    <div class="bet-title bet-title_justify"><span class="bet-title__star"></span> Total
      <!-- -->
    </div>
    <div class="bets betCols2">
      <div class=""><span class="bet_type" data-type="9">Total Over 4.5</span> <span class="koeff" data-coef="3.38"><i>3.38</i></span></div>
      <div class=""><span class="bet_type" data-type="10">Total Under 4.5</span> <span class="koeff" data-coef="1.34"><i>1.34</i></span></div>
      <div class=""><span class="bet_type" data-type="9">Total Over 5.5</span> <span class="koeff" data-coef="12.5"><i>12.5</i></span></div>
      <div class=""><span class="bet_type" data-type="10">Total Under 5</span> <span class="koeff" data-coef="1.04"><i>1.04</i></span></div>
      <div class="bets__empty-cell"> </div>
      <div class=""><span class="bet_type" data-type="10">Total Under 5.5</span> <span class="koeff" data-coef="1.02"><i>1.02</i></span></div>
    </div>
  </div>
</body>

</html>
    """
    soup = BeautifulSoup(st, 'lxml')
    title = soup.find('div', attrs={'class': 'bet-title'}).get_text().strip()
    print(title)
    for spn in soup.find_all('span', attrs={'class': 'bet_type'}):
        bet_text = spn.get_text()
        print(bet_text)


    # Output as: Total
    #            Total Over 4.5
    #            Total Under 4.5
    #            Total Over 5.5
    #            Total Under 5
    #            Total Under 5.5
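
Neither answer prints the "label: coefficient" pairs the question asks for. The following is a rough sketch of one way to get close to that format, based on several assumptions that are not in the original thread: Over and Under entries are paired by the threshold at the end of the label, thresholds with only one side (such as "Total Under 5") are dropped, the coefficient is read from the `data-coef` attribute, and `html.parser` is used instead of `lxml`. The names `sample_html`, `by_threshold` and `output_lines` are hypothetical.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

# Copy of the question's markup without the HTML comment and page wrapper.
sample_html = """
<div class="bet_group">
  <div class="bet-title bet-title_justify"><span class="bet-title__star"></span> Total</div>
  <div class="bets betCols2">
    <div class=""><span class="bet_type" data-type="9">Total Over 4.5</span> <span class="koeff" data-coef="3.38"><i>3.38</i></span></div>
    <div class=""><span class="bet_type" data-type="10">Total Under 4.5</span> <span class="koeff" data-coef="1.34"><i>1.34</i></span></div>
    <div class=""><span class="bet_type" data-type="9">Total Over 5.5</span> <span class="koeff" data-coef="12.5"><i>12.5</i></span></div>
    <div class=""><span class="bet_type" data-type="10">Total Under 5</span> <span class="koeff" data-coef="1.04"><i>1.04</i></span></div>
    <div class="bets__empty-cell"> </div>
    <div class=""><span class="bet_type" data-type="10">Total Under 5.5</span> <span class="koeff" data-coef="1.02"><i>1.02</i></span></div>
  </div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
title = soup.find("div", class_="bet-title").get_text(strip=True)

# Group "label: coefficient" strings by the threshold at the end of the label.
by_threshold = defaultdict(list)
for bet in soup.find_all("span", class_="bet_type"):
    koeff = bet.find_next_sibling("span", class_="koeff")
    if koeff is None:
        continue  # label without a coefficient next to it
    label = bet.get_text(strip=True)
    threshold = label.rsplit(" ", 1)[-1]  # e.g. "4.5"
    by_threshold[threshold].append(f"{label}: {koeff['data-coef']}")

output_lines = [f"Title: {title}"]
# Assumption: only thresholds with both an Over and an Under entry are shown.
output_lines += [", ".join(pair) for pair in by_threshold.values() if len(pair) == 2]

print("\n".join(output_lines))
```

Pairing by threshold is only one reading of the requested output. If the pairs should instead follow the two-column page layout, the grouping step would need to change.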

【Discussion】: