Python BeautifulSoup: extracting the number of occurrences between tags

【Question title】: Python BeautifulSoup: extract number of appearances between tags
【Posted】: 2017-03-13 04:30:38
【Question】:

I want to count how many times "file it" appears between tags in a web page. Here is my code.

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://www.crummy.com/software/BeautifulSoup/")
bsObj = BeautifulSoup(html, "html.parser")

nameList = bsObj.findAll(text="file it")
print(len(nameList))

For "file it" or "Download" it works and returns 1. For "Hall of Fame" it works and returns 2.

But for "the discussion group" the result should be 2, and instead I get 0.

Why do I get 0 for "the discussion group" and for "get the source code"?

【Comments】:

If you look at the page source, the phrase contains a newline rather than a space: "the discussion\ngroup".
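
One way to check this is to print the text node with repr(), which shows the newline as an escape sequence. This is only a minimal sketch: the HTML fragment below is a made-up stand-in for the real page, not a copy of it.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the page source: the link text wraps onto a new line.
html = '<p>Ask on <a href="#">the discussion\ngroup</a>.</p>'

soup = BeautifulSoup(html, "html.parser")
link_text = str(soup.a.string)  # the single text node inside the <a> tag

# repr() shows escape sequences, so the embedded newline becomes visible.
print(repr(link_text))

# An exact comparison against the single-space version therefore fails.
print(link_text == "the discussion group")
```

Because an exact string filter compares the whole text node, a single unexpected newline is enough to make the count drop to 0.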

Tags: python tags beautifulsoup extract

【Solution 1】:

import re
# bsObj is the BeautifulSoup object from the question; \s+ also matches the newline
nameList = bsObj.findAll(text=re.compile(r"the\s+discussion\s+group"))

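Using \s+ in the regular expression matches one or more whitespace characters, including \n, so the phrase is still found when it wraps onto a new line in the page source.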
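
To illustrate the difference, here is a small self-contained sketch that compares an exact string filter with the regex filter. The HTML is a hypothetical fragment written for this example, and it uses find_all with the string argument, the newer spelling of findAll and text in Beautiful Soup 4.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical fragment: the phrase is split by a newline, as in the page source.
html = (
    '<p>Send questions to <a href="#">the discussion\ngroup</a>. '
    'If you find a bug, <a href="#">file it</a>.</p>'
)

soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the whole text node, so the newline breaks the match.
exact = soup.find_all(string="the discussion group")

# A compiled regex is searched inside each text node; \s+ absorbs the newline.
flexible = soup.find_all(string=re.compile(r"the\s+discussion\s+group"))

print(len(exact))     # 0
print(len(flexible))  # 1
```

Note that this counts matching text nodes, not individual occurrences within a node.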
