【Question title】: Trying to replace Tag <em> with <a>
【Posted】: 2026-01-20 12:45:01
【Question description】:
import requests
import string
from bs4 import BeautifulSoup, Tag
[...]
def disease_spider(maxpages):
    i = 0
    while i <= maxpages:
        url = 'http://www.cdc.gov/DiseasesConditions/az/'+ alpha[i]+'.html'
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for l in soup.findAll('a', {'class':'noLinking'}):
            x =l.find("em")
            if x is not None:
                return x.em.replaceWith(Tag('a'))

        i += 1

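Some of the text on the website uses the <em> tag instead of the <a> tag, and I want to replace those with <a> tags. With this code I get this error:

AttributeError: 'NoneType' object has no attribute 'replaceWith'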

【Comments】:

    Tags: python tags beautifulsoup replacewith


    【Solution 1】:

    As far as I understand, you want to replace each em element with its text.

    In other words, an a element containing:

    <a class="noLinking" href="http://www.cdc.gov/hi-disease/index.html">
        including Hib Infection (<em>Haemophilus influenzae</em> Infection)   
    </a>
    

    should be replaced with:

    <a class="noLinking" href="http://www.cdc.gov/hi-disease/index.html">
        including Hib Infection (Haemophilus influenzae Infection) 
    </a>
    

    In that case, I would find all the em tags directly under the a tag and, for each em tag found, use replace_with() to replace it with its text:

    for em in soup.select('a.noLinking > em'):
        em.replace_with(em.text)
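
The error in the question can be reproduced without any network access. A minimal sketch, assuming the markup looks like the snippet quoted above and using the built-in "html.parser" (the variable names here are illustrative, not from the original post): `l.find("em")` already returns the em tag, so `x.em` searches for another em nested inside it, finds nothing, and yields None.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the answer; no request to the real site is made.
html = """
<a class="noLinking" href="http://www.cdc.gov/hi-disease/index.html">
    including Hib Infection (<em>Haemophilus influenzae</em> Infection)
</a>
"""

soup = BeautifulSoup(html, "html.parser")

link = soup.find("a", {"class": "noLinking"})
x = link.find("em")   # x is already the <em> tag
nested = x.em         # looks for an <em> inside that <em>; there is none
print(nested)         # None, so calling .replaceWith on it raises AttributeError

# Replace each <em> that is a direct child of a.noLinking with its own text.
for em in soup.select("a.noLinking > em"):
    em.replace_with(em.text)

remaining_em = soup.find_all("em")
link_text = " ".join(link.get_text().split())  # collapse whitespace for display
print(link_text)
```

Calling `em.unwrap()` inside the loop is another Beautiful Soup option that removes the tag while keeping its contents in place.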
    

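    As a side note, the replacement may not be needed at all, because the .text of the a tag gives you the full text of the node, including its children: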

    In [1]: from bs4 import BeautifulSoup
    
    In [2]: data = """
       ...:     <a class="noLinking" href="http://www.cdc.gov/hi-disease/index.html">
       ...:         including Hib Infection (<em>Haemophilus influenzae</em> Infection)   
       ...:     </a>
       ...: """
    
    In [3]: soup = BeautifulSoup(data, "html.parser")
    
    In [4]: print(soup.a.text)
    
            including Hib Infection (Haemophilus influenzae Infection)   
    

    【Discussion】:

    • Would this be able to find all the em tags inside list (li) tags?
    • @ks4929 Yes. For example, replace a.noLinking > em with li a.noLinking > em
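
To illustrate the narrower selector from the last comment, here is a small sketch on made-up markup (the link text and href values are hypothetical). Only the em inside a list item should be replaced; the one inside the paragraph is left alone.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one matching link inside <li>, one outside any list.
html = """
<ul>
  <li><a class="noLinking" href="#one">Alpha (<em>first</em>)</a></li>
</ul>
<p><a class="noLinking" href="#two">Beta (<em>second</em>)</a></p>
"""

soup = BeautifulSoup(html, "html.parser")

# "li a.noLinking > em": an <em> that is a direct child of a.noLinking,
# where that <a> is a descendant of some <li>.
for em in soup.select("li a.noLinking > em"):
    em.replace_with(em.text)

remaining = [em.text for em in soup.find_all("em")]
li_text = soup.li.get_text()
print(remaining)
print(li_text)
```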