How to replace a tag with a space in Beautiful Soup

【Question title】: How to replace a tag with space Beautiful Soup
【Posted】: 2013-10-06 00:49:53
【Question】:

Suppose I have

text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""

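I want to replace the opening <a href ...> tag and the closing </a> tag each with a single space (" "). By the way, the object is a BeautifulSoup.BeautifulSoup instance, so the normal string .replace won't work.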

I want the text to be just

""" Hello There """

Note the spaces before and after "Hello There".

Tags: python html html-parsing beautifulsoup

【Solution 1】:

You can use replace_with() (replaceWith() is the older alias for the same method):

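from bs4 import BeautifulSoup

# No parser is named here, so bs4 picks the best one installed
soup = BeautifulSoup("""
<html>
 <body>
  <a href = 'http://www.crummy.com/software'>Hello There</a>
 </body>
</html>
""")

# Replace every <a> tag with its text, padded by one space on each side
for a in soup.find_all('a'):
    a.replace_with(" %s " % a.string)

print(soup)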

This prints:

<html><body>
 Hello There 
</body></html>
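
One thing to watch with the loop above: `.string` is `None` when the tag has more than one child (for example a link containing a `<b>` element), so the replacement text would become " None ". A minimal sketch that uses `get_text()` instead is below. The helper name `pad_links` and the explicit "html.parser" argument are my own assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup


def pad_links(html):
    """Replace each <a> tag with its full text, padded by one space per side."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a"):
        # get_text() joins the text of all descendants, so nested tags are fine
        a.replace_with(" %s " % a.get_text())
    return str(soup)


# A link with nested markup, where a.string would be None
print(repr(pad_links("<a href='#'>Hello <b>There</b></a>")))
```

Note that this drops any markup nested inside the link and keeps only its text.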

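【Solution 2】:

Use .replace_with() together with the .text attribute. Only a trailing space is appended here, because the sample string already starts with a space: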

>>> from bs4 import BeautifulSoup as BS
>>> text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""
>>> soup = BS(text, "html.parser")
>>> mytag = soup.find('a')
>>> mytag.replace_with(mytag.text + ' ')
<a href="http://www.crummy.com/software">Hello There</a>
>>> print(soup)
 Hello There
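
If the goal is literally "a space where the opening tag was and a space where the closing tag was", another option is to insert a space on each side of the tag and then unwrap it. This is only a sketch: `space_out_tags` is a hypothetical helper name, and "html.parser" is assumed as the parser.

```python
from bs4 import BeautifulSoup


def space_out_tags(html, name="a"):
    """Put one space where each opening and closing `name` tag used to be."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(name):
        tag.insert_before(" ")  # stands in for the opening tag
        tag.insert_after(" ")   # stands in for the closing tag
        tag.unwrap()            # remove the tag itself, keep its children
    return str(soup)


text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""
print(repr(space_out_tags(text)))
```

Unlike replacing the tag with its text, this keeps any markup nested inside the tag. The sample string already begins with a space, so the result starts with two spaces.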

【Solution 3】:

>>> import re
>>> # General form: re.sub("<.*?>", " ", html), where html holds the markup as a str
>>> text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""
>>> notag = re.sub("<.*?>", " ", text)
>>> notag
'  Hello There '

See this answer: How to remove all html tags from downloaded page
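
The regex works for the sample, but `<.*?>` can misfire on real pages, for instance when an attribute value contains `>` or when a tag spans several lines (`.` does not match a newline by default). As a rough alternative that needs nothing outside the standard library, here is a sketch built on `html.parser.HTMLParser`. The class name `TagSpacer` and the function `tags_to_spaces` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class TagSpacer(HTMLParser):
    """Collect text and emit one space in place of every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_startendtag(self, tag, attrs):
        # Treat a self-closing tag such as <br/> as one tag, so one space
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def tags_to_spaces(html):
    parser = TagSpacer()
    parser.feed(html)
    parser.close()  # flush any text the parser is still buffering
    return "".join(parser.parts)


text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""
print(repr(tags_to_spaces(text)))
```

Comments and doctype declarations are dropped here rather than replaced with a space, which differs slightly from what the regex does.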
