Using BS4 - how to get text only, and not tags?

【Question title】: Using BS4 - how to get text only, and not tags?
【Posted】: 2026-02-19 12:50:01
【Question】:

I am trying to scrape pages about medicines and market assets for some companies on https://www.formularylookup.com/.

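The code below gives me the data I need, such as the number of plans, which pharmacies cover the drug, and the status in %. Here is an example of my output, where the desired output is just "1330 plans":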

Number of plans:

<td class="plan-count" role="gridcell">1330 plans</td>

I tried using .text after each tag.find, but it did not work.

Here is my code for this specific part. There is a lot more above it, but that includes login information I cannot share.

total = []

soup = BeautifulSoup(html, "lxml")

for tag in soup.find_all("tbody", {"role":"rowgroup"}):
    #name = tag.find("td", {"class":"payer-name"}) #gives me whole tag
    name = tag.find("tr", {"role":"row"}).find("td").get("payer-name") #gives me None output
    plan = tag.find("td", {"class":"plan-count"})  #gives me whole tag
    stat = tag.find("td", {"class":"icon-status"}) #gives me whole tag

    data = {"Payer": name, "Number of plans": plan, "Status": stat}

    total.append(data)

df = pd.DataFrame(total)
print(df)

Here is a snippet of the page taken with the browser's Inspect tool.

<tbody role="rowgroup">
    <tr data-uid="a5795205-1518-4a74-b039-abcd1b35b409" role="row">
        <td class="payer-name" role="gridcell">CVS Caremark RX</td>
        <td class="plan-count" role="gridcell">1330 plans</td>
        <td role="gridcell" class="icon-status icon-status-not-covered">98% Not Covered</td>
     </tr>

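Edit: After digging through SO some more, I see that a solution might be to use BS4's contents feature. I will report back if it works. - This did not work: "AttributeError: 'NoneType' object has no attribute 'contents'"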

【Comments】:

Maybe I can use the contents feature?
Didn't work.

Tags: python-3.x web-scraping beautifulsoup

【Solution 1】:

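I figured it out. Apparently there are other tags that start with tbody rowgroup, and for those my find calls come back as None, so it is impossible to get .text from them before my code reaches the part I actually want.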

I just needed to change this line:

for tag in soup.find_all("tbody", {"role":"rowgroup"}):
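One way to handle the None results described above is to check each find() result before reading its text. The following is only a minimal sketch, not necessarily the change the poster made: the sample HTML is hypothetical (an empty row group placed in front of the snippet from the question), and "html.parser" is used instead of "lxml" just to avoid an extra dependency. Note also that Tag.get("payer-name") looks up an HTML attribute named payer-name, which would explain the None in the question; the cell's text comes from .text or get_text().

```python
from bs4 import BeautifulSoup

# Hypothetical sample: an empty row group, then the row group from the question.
html = """
<table>
  <tbody role="rowgroup"></tbody>
  <tbody role="rowgroup">
    <tr data-uid="a5795205-1518-4a74-b039-abcd1b35b409" role="row">
      <td class="payer-name" role="gridcell">CVS Caremark RX</td>
      <td class="plan-count" role="gridcell">1330 plans</td>
      <td role="gridcell" class="icon-status icon-status-not-covered">98% Not Covered</td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

total = []
for tag in soup.find_all("tbody", {"role": "rowgroup"}):
    name = tag.find("td", {"class": "payer-name"})
    plan = tag.find("td", {"class": "plan-count"})
    stat = tag.find("td", {"class": "icon-status"})

    # find() returns None when nothing matches, so skip incomplete groups
    # instead of calling .text on None.
    if name is None or plan is None or stat is None:
        continue

    total.append({
        "Payer": name.get_text(strip=True),
        "Number of plans": plan.get_text(strip=True),
        "Status": stat.get_text(strip=True),
    })

print(total)
```

The resulting list of dicts can be passed to pd.DataFrame(total) exactly as in the question.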

【Comments】:

I still seem to get 1 empty row, the first one. I need to figure out how to get rid of it; until then .text will not work.
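For the leftover empty first row, one possible approach is to loop over the tr rows rather than the tbody groups and drop any row that lacks the expected cells. This is a sketch under assumptions: the real page structure is not shown in full, so the sample HTML below (including the second payer row) is made up for illustration, and the CSS selectors are based only on the snippet in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a row group with an empty row, then two data rows.
html = """
<table>
  <tbody role="rowgroup">
    <tr role="row"></tr>
  </tbody>
  <tbody role="rowgroup">
    <tr role="row">
      <td class="payer-name" role="gridcell">CVS Caremark RX</td>
      <td class="plan-count" role="gridcell">1330 plans</td>
      <td role="gridcell" class="icon-status icon-status-not-covered">98% Not Covered</td>
    </tr>
    <tr role="row">
      <td class="payer-name" role="gridcell">Example Payer</td>
      <td class="plan-count" role="gridcell">12 plans</td>
      <td role="gridcell" class="icon-status icon-status-covered">75% Covered</td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
# Select every row inside a row group, so each payer gets its own record.
for tr in soup.select('tbody[role="rowgroup"] > tr[role="row"]'):
    cells = {
        "Payer": tr.select_one("td.payer-name"),
        "Number of plans": tr.select_one("td.plan-count"),
        "Status": tr.select_one("td.icon-status"),
    }
    # select_one() returns None for a missing cell; skip such rows entirely.
    if any(cell is None for cell in cells.values()):
        continue
    rows.append({key: cell.get_text(strip=True) for key, cell in cells.items()})

print(rows)
```

Iterating over rows also picks up every row in a group, whereas tag.find() on a tbody only returns the first match.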