Scraping HTML text using Python: answers

【Question title】: Scraping HTML text using Python
【Posted】: 2020-07-22 14:17:52
【Question description】:

How can I get only the name Roger Federer from the HTML below?

<div class="profile-heading--desktop"><h1><span class="profile-heading__rank">#1 </span>Roger Federer</h1><div class="profile-subheading">Athlete, Tennis</div></div>

I am using this code:

name = soup.find(class_ = 'profile-heading__rank').get_text()

and I am getting #1 instead.

【Comments on the question】:

If the code you are using is Python, it is worth adding that (and the relevant version) as a tag.

Tags: html css python-3.x beautifulsoup

【Solution 1】:

Use .next_sibling on the rank <span> to get the text node that follows it inside the <h1>:

from bs4 import BeautifulSoup

html = """
<div class="profile-heading--desktop">
    <h1>
        <span class="profile-heading__rank">#1 </span>
        Roger Federer
    </h1>
    <div class="profile-subheading">
        Athlete, Tennis
    </div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
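# next_sibling is the text node after the <span>; strip() drops the
# newlines and indentation that surround the name in this markup
name = soup.find(class_='profile-heading__rank').next_sibling.strip()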

print(name)  # -->  Roger Federer
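
The one-liner above raises an AttributeError if the rank <span> is missing, because find() returns None in that case. A minimal sketch of a more defensive version follows; the helper name name_after_rank is hypothetical and not part of the original answer, and it assumes the name is always the text node directly after the <span>.

```python
from bs4 import BeautifulSoup, NavigableString


def name_after_rank(html):
    """Return the text that directly follows the rank <span>, or None.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, 'html.parser')
    rank = soup.find(class_='profile-heading__rank')
    if rank is None:
        # No rank <span> in the markup
        return None
    sibling = rank.next_sibling
    # next_sibling can be None (nothing follows) or a Tag (another element)
    if isinstance(sibling, NavigableString):
        return sibling.strip() or None
    return None


html = '<h1><span class="profile-heading__rank">#1 </span>Roger Federer</h1>'
print(name_after_rank(html))
```

Returning None rather than raising lets the caller decide how to handle profiles that lack a rank.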

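【Discussion】:

【Solution 2】:

Another approach is to find the h1 and then call .find(text=True, recursive=False), which returns the first text node that is a direct child of the h1: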

from bs4 import BeautifulSoup

html = '<div class="profile-heading--desktop"><h1><span class="profile-heading__rank">#1 </span>Roger Federer</h1><div class="profile-subheading">Athlete, Tennis</div></div>'
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('h1').find(text=True, recursive=False))

Output:

Roger Federer
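
A possible variant of the same idea, shown as a sketch rather than as the answer's own method: since Beautiful Soup 4.4.0 the text argument is also available under the name string, and find_all can collect every direct text child of the h1 instead of only the first. The variable names below are illustrative.

```python
from bs4 import BeautifulSoup

html = ('<div class="profile-heading--desktop"><h1>'
        '<span class="profile-heading__rank">#1 </span>Roger Federer</h1>'
        '<div class="profile-subheading">Athlete, Tennis</div></div>')

soup = BeautifulSoup(html, 'html.parser')
h1 = soup.find('h1')

# string=True matches text nodes only; recursive=False limits the search
# to direct children, so the text inside the <span> is skipped
direct_text = h1.find_all(string=True, recursive=False)

# Join the pieces and trim surrounding whitespace
name = ''.join(direct_text).strip()
print(name)
```

Joining all direct text nodes also copes with markup where whitespace-only text precedes the <span>, a case in which the first direct text node would be blank.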

【Discussion】: