【Question title】: How to remove element "<div> <div style" from the soup?
【Posted】: 2021-07-16 12:20:09
【Question】:

I have an HTML document from which I want to remove the element <div> <div style. I tried the code below, but it has no effect:

for d in soup.select('div > div.style'):
    d.extract()

Could you please explain how to do this? Here is the full script:

from bs4 import BeautifulSoup

texte = """
<div class="content mw-parser-output" id="bodyContent">
    <div>
        <div style="clear:both; background-image:linear-gradient(180deg, #E8E8E8, white); border-top: dashed 2px #AAAAAA; padding: 0.5em 0.5em 0.5em 0.5em; margin-top: 1em; direction: ltr;">
        This article is issued from <a class="external text" href="https://en.wiktionary.org/wiki/?title=Love&amp;oldid=60218267" title="Last edited on 2020-09-02">Wiktionary</a>. The text is licensed under <a class="external text" href="https://creativecommons.org/licenses/by-sa/4.0/">Creative Commons - Attribution - Sharealike</a>. Additional terms may apply for the media files.
        </div>
    </div>
</div>
"""

soup = BeautifulSoup(texte, 'html.parser')

for d in soup.select('div > div.style'):
    d.extract()
    
print(soup.prettify())

My expected result is:

<div class="content mw-parser-output" id="bodyContent">
</div>

【Comments on the question】:

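  • Do you want to extract only the text?
  • Do you want to remove the "div" element, or the "div" that has a style attribute?
  • @qaiser I have added the expected output. Please see my update.
  • @KokoJumbo I have added the expected output. Please see my update.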

Tags: python beautifulsoup


【Solution 1】:

Here:

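soup = BeautifulSoup(texte, 'html.parser')
# '[style]' matches a div that carries a style attribute ('.style' would match class="style").
# The parent of that div is the bare wrapper <div>, so extracting it removes both.
soup.select_one('div > div[style]').parent.extract()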

print(soup.prettify())
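
The reason the selector in the question matches nothing can be illustrated with a small sketch. In CSS, `.style` is a class selector, whereas `[style]` tests for the presence of a style attribute. The markup below is a hypothetical miniature document made up for this illustration, not the page from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature document used only for this illustration.
html = """
<div id="outer">
    <div>
        <div style="clear:both;">footer text</div>
    </div>
    <div class="style">a div whose class happens to be "style"</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# '.style' is a class selector: it only matches class="style".
by_class = soup.select("div > div.style")
# '[style]' is an attribute selector: it matches any div with a style attribute.
by_attr = soup.select("div > div[style]")

print([d.get_text(strip=True) for d in by_class])
print([d.get_text(strip=True) for d in by_attr])

# Remove the bare wrapper <div> around each styled div.
for d in by_attr:
    d.parent.extract()

print(soup.prettify())
```

After the loop, only the outer div and the div with class="style" should remain in the soup.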

【Comments】:

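  • I want to impose a stricter condition, namely that <div> <div style is a (not necessarily direct) child of the element <div class="mw-body" id="content">. Could you modify your code to handle this?
  • @LEAnhDung - take a look at my answer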
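
One possible way to express the stricter condition from the comment is a descendant combinator in the CSS selector. This is only a sketch: the `id="content"` container comes from the comment above, while the rest of the markup is invented for illustration and may not match the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical page layout: only the id="content" container is taken from the
# comment; everything else is made up for this example.
html = """
<div><div style="color:red;">outside, must stay</div></div>
<div class="mw-body" id="content">
    <div class="content mw-parser-output" id="bodyContent">
        <div>
            <div style="clear:both;">inside, must go</div>
        </div>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# 'div#content ' followed by a space (descendant combinator) allows any nesting
# depth; 'div > div[style]' then requires the styled div to be a direct child
# of some div below that container.
for styled in soup.select("div#content div > div[style]"):
    styled.parent.extract()

print(soup.prettify())
```

The styled div outside `div#content` is left untouched, because its wrapper is not a descendant of that container.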
【Solution 2】:

You can try the decompose method; see "Deleting a div with a particlular class using BeautifulSoup":

for div in soup.find_all("div", {'style':'clear:both; background-image:linear-gradient(180deg, #E8E8E8, white); border-top: dashed 2px #AAAAAA; padding: 0.5em 0.5em 0.5em 0.5em; margin-top: 1em; direction: ltr;'}):
    # Decompose the wrapper <div> as well; otherwise an empty <div> is left behind.
    div.parent.decompose()

print(soup)
 <div class="content mw-parser-output" id="bodyContent">

</div>
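
Since this answer uses decompose() where the others use extract(), a short comparison may help. Both remove the tag from the tree; the difference is that extract() hands the removed tag back, while decompose() destroys it and returns None. The snippet below uses a made-up two-paragraph document.

```python
from bs4 import BeautifulSoup

html = '<div id="a"><p>first</p><p>second</p></div>'  # hypothetical snippet
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("p")

removed = first.extract()    # detached from the tree, still a usable Tag
result = second.decompose()  # detached and destroyed; returns None

print(removed)  # the removed <p> tag is still available for reuse
print(result)
print(soup)
```

If the removed element is never needed again, either method should do; extract() is the one to pick when the element is to be inspected or reinserted elsewhere.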

【Comments】:

【Solution 3】:
    from bs4 import BeautifulSoup
    
    texte = """
    <div class="content mw-parser-output" id="bodyContent">
        <div>
           <div style="clear:both; background-image:linear-gradient(180deg, #E8E8E8, white); border-top: dashed 2px #AAAAAA; padding: 0.5em 0.5em 0.5em 0.5em; margin-top: 1em; direction: ltr;">
           This article is issued from <a class="external text" href="https://en.wiktionary.org/wiki/?title=Love&amp;oldid=60218267" title="Last edited on 2020-09-02">Wiktionary</a>. The text is licensed under <a class="external text" href="https://creativecommons.org/licenses/by-sa/4.0/">Creative Commons - Attribution - Sharealike</a>. Additional terms may apply for the media files.
           </div>
        </div>
    </div>
    """
    
    soup = BeautifulSoup(texte, 'html.parser')
    
    for d in soup.find_all('div', {"style": True}):
        d.find_parent("div").extract()
    
    print(soup.prettify())
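
Note that find_parent("div") returns the nearest enclosing div, whatever it is. If a styled div happened to be a direct child of the container that should be kept, that container itself would be removed. A more defensive variant, sketched below on hypothetical markup, only removes the parent when it is a bare <div> without attributes; treating "no attributes" as the mark of a disposable wrapper is an assumption of this sketch, not something stated in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one styled div sits in a bare wrapper, the other is a
# direct child of the container that should be kept.
html = """
<div id="bodyContent">
    <div><div style="clear:both;">wrapped</div></div>
    <div style="color:red;">direct child</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

for styled in soup.find_all("div", style=True):
    parent = styled.find_parent("div")
    # Assumption: a parent <div> with no attributes is a disposable wrapper.
    if parent is not None and not parent.attrs:
        parent.extract()
    else:
        # Otherwise remove only the styled div and keep its parent.
        styled.extract()

print(soup.prettify())
```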
    

【Comments】:
