How to remove certain information from a web-scraping function (Beautiful Soup): answer

【Question title】: How to remove certain information from a web-scraping function (Beautiful Soup)
【Posted】: 2020-03-05 20:35:51
【Question description】:

I am using BeautifulSoup to scrape this page: https://lawyers.justia.com/lawyer/michael-paul-ehline-85006

I don't want the list of sponsors to appear in my output.

My code:

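import requests
from bs4 import BeautifulSoup

url = "https://lawyers.justia.com/lawyer/michael-paul-ehline-85006"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

for o in soup.find_all('div', attrs={"class": "block-wrapper"}):
    for de in o.find_all("li"):
        # get_text() strips the HTML tags (in place of the remove_tags helper)
        print(de.get_text(strip=True))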

Output in Python: (output image)

【Question discussion】:

Tags: python web beautifulsoup screen-scraping

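【Solution 1】:

You can remove specific elements from the parsed HTML page. Once you have located the element you want to get rid of, for example with find_all('div', attrs={"class": "primary-sidebar-wrapper"}), you can do the following (the snippet uses the block-wrapper class from the question; use whichever class actually wraps the sponsor list on the page):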

# find_all returns a list; replace the first matching element with an empty string
tag = soup.find_all('div', attrs={"class": "block-wrapper"})
tag[0].replace_with("")

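This modifies the parse tree held in the soup variable itself, so the removed element no longer appears in later searches or output from soup.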
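As a minimal sketch of this approach, the example below parses a small inline HTML string instead of fetching the live page. The markup and the `sponsored` class are hypothetical stand-ins, since the real page's structure is not shown here. It uses `find`, which returns the first match (like `find_all(...)[0]`) but returns `None` instead of raising `IndexError` when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the Justia page; the real markup and
# the "sponsored" class name are assumptions for illustration only.
HTML = """
<div class="block-wrapper">
  <ul><li>Practice area: Personal injury</li><li>Languages: English</li></ul>
</div>
<div class="block-wrapper sponsored">
  <ul><li>Sponsor A</li><li>Sponsor B</li></ul>
</div>
"""


def scrape_items(html):
    soup = BeautifulSoup(html, "html.parser")

    # Locate the unwanted block and swap it out of the tree; this mutates soup.
    sponsor = soup.find("div", attrs={"class": "sponsored"})
    if sponsor is not None:
        sponsor.replace_with("")

    # The same loop as in the question; the removed <li> elements are gone.
    items = []
    for block in soup.find_all("div", attrs={"class": "block-wrapper"}):
        for li in block.find_all("li"):
            items.append(li.get_text(strip=True))
    return items


print(scrape_items(HTML))
```

Because the removal happens before the loop, the sponsor entries never reach the output, even though the sponsor `div` also carries the `block-wrapper` class.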

【Discussion】: