【Question Title】:How can I remove a div tag with all its children using Python and BeautifulSoup?
【Posted】:2019-12-10 02:29:42
【Question Description】:

Note: my text contains many divs, but I only want to remove this particular div, together with all of its children.

<div ng-if="comment.repliesCount&amp;&amp;showReplies" class="ng-scope">

    <div>
        <div>

        </div>
    </div>
</div>

【Comments】:

  • Try the clear or remove method on the elements in child.children: element.clear()
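
The suggestion above can be sketched with BeautifulSoup itself. This is a minimal example, assuming the `bs4` package is installed and using Python's built-in `html.parser` backend; the helper name `remove_replies_div`, its `keep_empty_tag` flag, and the sample markup are illustrative and not part of the original post. `clear()` empties a tag but leaves the tag in place, whereas `decompose()` removes the tag along with everything inside it, which is probably what the question is after.

```python
from bs4 import BeautifulSoup

# Illustrative markup: one div to keep and the ng-if div to remove.
html = '''
<div>
    <div>test value</div>
    <div ng-if="comment.repliesCount&amp;&amp;showReplies" class="ng-scope">
        <div>
            <div>noise</div>
        </div>
    </div>
</div>
'''


def remove_replies_div(markup, keep_empty_tag=False):
    """Hypothetical helper: drop the ng-if div (or only its children)."""
    soup = BeautifulSoup(markup, "html.parser")
    # The parser decodes &amp; to &, so match against the decoded value.
    target = soup.find("div", attrs={"ng-if": "comment.repliesCount&&showReplies"})
    if target is not None:
        if keep_empty_tag:
            target.clear()      # removes the children, keeps the empty div
        else:
            target.decompose()  # removes the div together with its children
    return str(soup)


cleaned = remove_replies_div(html)
print(cleaned)
```

If several divs share the same `ng-if` value, `find_all` with a loop over the matches would be the natural extension.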

Tags: python html text


【Solution 1】:
from simplified_scrapy.simplified_doc import SimplifiedDoc 
html='''
<div>
    <div>test value</div>
    <div ng-if="comment.repliesCount&amp;&amp;showReplies" class="ng-scope"> 
        <div>
            <div>
                noise
            </div>
        </div>
    </div>
</div>
'''
doc = SimplifiedDoc(html)
# The three calls below are alternatives; use whichever one fits your page.
# Use this if the ng-if value is unique, or if the target div is its first occurrence.
html = doc.removeElement('div',attr='ng-if',value='comment.repliesCount&amp;&amp;showReplies')
# Use this if the ng-scope class is unique, or if the target div is its first occurrence.
html = doc.removeElement('div',attr='class',value='ng-scope')
# If neither of the above works, try this one. 'test value' is a string used to locate the div to delete.
html = doc.removeElement('div',attr='class',value='ng-scope',start='test value')
print(html)

Result: <div><div>test value</div></div>

【Comments】:

    【Solution 2】:

    Here is an example:

    import re
    
    html = '<div ng-if="comment.repliesCount&amp;&amp;showReplies" class="ng-scope"><div><div>HI !</div></div></div>'
    
    def removehtml(html):
        # Strip every tag; the text between the tags is kept.
        tag_pattern = re.compile(r'<.*?>')
        return tag_pattern.sub('', html)
    
    print(removehtml(html))  # prints: HI !
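
The regex above strips only the tags themselves, so the text inside the div ("HI !") survives. To drop the element together with its children without a third-party library, one option is a small filter built on the standard library's `html.parser`. The following is a rough sketch, not the answer author's method: the class `SubtreeRemover`, the helper `remove_element`, and the sample markup with its `<p>keep me</p>` sibling are hypothetical. It assumes well-formed markup, removes only the first match, and does not re-emit comments or doctype declarations.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SubtreeRemover(HTMLParser):
    """Re-emit markup, skipping the first element for which matches() is true."""

    def __init__(self, matches):
        super().__init__(convert_charrefs=False)  # keep &amp; etc. untouched
        self.matches = matches   # callable(tag, attrs_dict) -> bool
        self.out = []
        self.depth = 0           # > 0 while inside the element being removed
        self.removed = False

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.removed and self.matches(tag, dict(attrs)):
            self.depth, self.removed = 1, True
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never open a nested level.
        if not self.depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.depth:
            self.out.append(f"&#{name};")


def remove_element(markup, matches):
    parser = SubtreeRemover(matches)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


html = ('<div><p>keep me</p>'
        '<div ng-if="comment.repliesCount&amp;&amp;showReplies" class="ng-scope">'
        '<div><div>HI !</div></div></div></div>')

result = remove_element(
    html, lambda tag, attrs: tag == "div" and attrs.get("class") == "ng-scope")
print(result)  # prints: <div><p>keep me</p></div>
```

For real-world pages, a proper parser such as BeautifulSoup or lxml is likely the more robust choice; this sketch only shows that tracking nesting depth is what distinguishes removing a subtree from stripping tags.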
    

    【Comments】:

    • @GillesQuenot So?
    • Check my answer, which uses XPath