【Question title】: How to replace a tag with a space in Beautiful Soup
【Posted】: 2013-10-06 00:49:53
【Question】:

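Suppose I have

text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""

I want to replace the opening <a href=...> tag and the closing </a> tag with a single space (" ") each. Note that the parsed object is a BeautifulSoup.BeautifulSoup instance, so the ordinary str.replace() does not work on it.

I want the text to be just

""" Hello There """

Note the spaces before and after "Hello There".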

【Comments】:

    Tags: python html html-parsing beautifulsoup


    【Answer 1】:

    You can use replace_with() (replaceWith() is its older camelCase alias):

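    from bs4 import BeautifulSoup

    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup("""
    <html>
     <body>
      <a href = 'http://www.crummy.com/software'>Hello There</a>
     </body>
    </html>
    """, "html.parser")

    # Replace each <a> element with its text, padded by one space on each side.
    for a in soup.find_all('a'):
        a.replace_with(" %s " % a.string)

    print(soup)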
    

    This prints (the exact whitespace around the html and body tags depends on the parser):

    <html><body>
     Hello There 
    </body></html>
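
If the `<a>` tag can contain nested markup, `a.string` is `None`, so a slightly more general variant can use `get_text()` instead. The sketch below wraps the loop from this answer in a helper; the name `replace_tags_with_text` and its `pad` parameter are made up for illustration, and `html.parser` is assumed as the parser.

```python
from bs4 import BeautifulSoup


def replace_tags_with_text(html, tag_name, pad=" "):
    """Replace every <tag_name> element with its text, padded on both sides.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(tag_name):
        # get_text() also works when the tag has nested children,
        # where tag.string would be None.
        tag.replace_with(pad + tag.get_text() + pad)
    return str(soup)


if __name__ == "__main__":
    html = "<p>Say<a href='http://www.crummy.com/software'>Hello <b>There</b></a>now</p>"
    print(replace_tags_with_text(html, "a"))
```

Returning a string rather than the soup keeps the helper easy to compare against the desired output; returning the `soup` object would work just as well if further edits are planned.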
    

    【Comments】:

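      【Answer 2】:

      Use .replace_with() together with the .text attribute. Only a trailing space is appended here because text already starts with one: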

      >>> from bs4 import BeautifulSoup as BS
      >>> text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""
      >>> soup = BS(text, 'html.parser')
      >>> mytag = soup.find('a')
      >>> mytag.replace_with(mytag.text + ' ')
      <a href="http://www.crummy.com/software">Hello There</a>
      >>> print(soup)
       Hello There 
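
An alternative that keeps any markup nested inside the tag is to insert the spaces as siblings and then `unwrap()` the tag, instead of replacing it with a flat string. This is only a sketch, assuming `html.parser` as the parser; `pad_and_unwrap` is a hypothetical name, not a Beautiful Soup function.

```python
from bs4 import BeautifulSoup


def pad_and_unwrap(html, tag_name):
    """Surround each <tag_name> with spaces, then drop the tag itself.

    Hypothetical helper; children of the tag (e.g. <b>) are preserved.
    """
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(tag_name):
        tag.insert_before(" ")  # space where the opening tag was
        tag.insert_after(" ")   # space where the closing tag was
        tag.unwrap()            # remove the tag, keep its contents
    return str(soup)


if __name__ == "__main__":
    print(pad_and_unwrap("x<a href='#'>Hello <b>There</b></a>y", "a"))
```

Unlike `replace_with(tag.text + ' ')`, this leaves the inner `<b>` element in the tree, which matters if only the link itself should disappear.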
      

      【Comments】:

        【Answer 3】:

        A regular expression can also replace every tag with a space:

         >>> import re
         >>> text = """ <a href = 'http://www.crummy.com/software'>Hello There</a>"""
         >>> notag = re.sub(r"<.*?>", " ", text)
         >>> notag
         '  Hello There '
        

        See this answer: How to remove all html tags from downloaded page
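
The pattern `<.*?>` assumes that no `>` appears inside a tag, which breaks on attribute values such as `title="a > b"`. As a hedged alternative that needs only the standard library, the sketch below uses `html.parser.HTMLParser` to emit one space for every start and end tag; the class name `TagsToSpaces` and the function `tags_to_spaces` are made up for this example.

```python
import re
from html.parser import HTMLParser


class TagsToSpaces(HTMLParser):
    """Collect text, emitting one space in place of each tag (illustrative)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def tags_to_spaces(html):
    """Return html with every opening and closing tag replaced by a space."""
    parser = TagsToSpaces()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


if __name__ == "__main__":
    tricky = '<a title="a > b">Hello There</a>'
    # The regex cuts the tag at the first '>' and leaks part of the attribute.
    print(repr(re.sub(r"<.*?>", " ", tricky)))
    # The parser treats the quoted attribute value as part of the tag.
    print(repr(tags_to_spaces(tricky)))
```

For the simple input in the question both approaches give the same string, so the regex is fine there; the parser-based version is mainly worth it when the markup is not under your control.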

        【Comments】:
