【Question title】: Replace contents of p tag with BeautifulSoup
【Posted】: 2021-10-10 06:56:48
【Question】:

I ran into a problem replacing content. It occurs when the HTML contains the following:

    <p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>

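What I want to do is replace the contents of the p tag while discarding any extra styling or tags inside it. In this example, that means the strong tag would no longer be part of the new string.

However, I have found it impossible to completely replace the contents of the p tag. I have googled my problem and the errors, but could not come up with a working example.

Here is my code and the tests I tried to run; some throw an error and others do nothing at all. You can uncomment any of them to test it yourself, but the results are already included in the comments.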


    from bs4 import BeautifulSoup

    src = "<p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>"

    soup = BeautifulSoup(src, "lxml")

    for element in soup.findAll():
        if element.name == 'p':
            print(element)
            #= <p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>
            print(element.text)
            #= Next, go to your /home/pi directory and check if you can see the picture
            print(element.contents)
            #= ['Next, go to your ', <strong>/home/pi</strong>, ' directory and check if you can see the picture']

            # -- test 1:
            # element.string.replace_with("First, go to your /home/pi directory")
            # AttributeError: 'NoneType' object has no attribute 'replace_with'

            # -- test 2:
            # element.replace("First, go to your /home/pi directory")
            # TypeError: 'NoneType' object is not callable

            # -- test 3:
            # new_tag = soup.new_tag('li')
            # new_tag.string = "First, go to your /home/pi directory"
            # element.replace_with(new_tag)
            # print(element)
            # not replaced

            # -- test 4:
            # element.text.replace(str(element), "First, go to your /home/pi directory")
            # print(element)
            # not replaced

            # -- test 5:
            # element.text.replace(element.text, "First go to your /home/pi/ directory")
            # print(element)
            # not replaced

            # -- test 6:
            new_tag = soup.new_tag('li')
            new_tag.string = "First, go to your /home/pi directory"
            element.replaceWith(new_tag)
            print(element)
            # not replaced

            # -- test 7:
            # element.replace_with("First, go to your /home/pi directory")
            # print(element)
            # not replaced

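I suspect the problem occurs because element.contents contains multiple items. However, element.text gives me what I need to process and replace the string, and I do not care about any styling inside it.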
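
The suspicion about multiple items can be checked directly. The following is a minimal sketch, assuming the `html.parser` backend and a shortened copy of the sample markup; the variable names `p` and `strong` are illustrative only. It shows that `.string` is `None` when a tag has more than one child, which matches the `AttributeError` from test 1, while `get_text()` still returns the flattened text.

```python
from bs4 import BeautifulSoup

# Shortened version of the sample markup from the question.
src = "<p>Next, go to your <strong>/home/pi</strong> directory</p>"
soup = BeautifulSoup(src, "html.parser")
p = soup.find("p")

# The <p> has three children: text, <strong>, text.
print(len(p.contents))

# With more than one child, .string is None, so calling
# .replace_with() on it fails with AttributeError.
print(p.string)

# get_text() (like .text) joins the text of all descendants.
print(p.get_text())

# A tag with exactly one text child does expose .string.
strong = p.find("strong")
print(strong.string)
```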

As a last resort, I would consider using str.replace on the element in the formatted HTML, but if possible I would prefer to handle it within BeautifulSoup.
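
Before falling back to str.replace on the serialized HTML, it may be worth noting that tests 3, 6 and 7 probably did change the tree: `replace_with()` swaps the node inside the soup, but the `element` variable keeps pointing at the old, now detached tag, so printing `element` shows the original markup. The sketch below assumes the `html.parser` backend and wraps the paragraph in a `<div>` purely for illustration; `old_p` and `removed` are hypothetical names.

```python
from bs4 import BeautifulSoup

# The <div> wrapper is an assumption added to make the tree change visible.
src = "<div><p>Next, go to your <strong>/home/pi</strong> directory</p></div>"
soup = BeautifulSoup(src, "html.parser")

old_p = soup.find("p")
new_tag = soup.new_tag("li")
new_tag.string = "First, go to your /home/pi directory"

# replace_with() swaps the node in the tree and returns the removed node.
removed = old_p.replace_with(new_tag)

print(removed is old_p)  # the variable still refers to the detached <p>
print(old_p.parent)      # the detached tag no longer has a parent
print(soup)              # the tree itself now contains the <li>
```

Printing `soup` (or the parent tag) rather than the replaced element is what reveals the change.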

Sources used:

https://www.tutorialfor.com/questions-59179.htm
https://beautiful-soup-4.readthedocs.io/en/latest/#modifying-the-tree
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#replace-with
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#method-names

AttributeError: 'NoneType' object has no attribute 'replace_with'

https://stackoverflow.com/a/25722910/8623540

【Comments】:

    Tags: python html beautifulsoup html-parsing


    【Solution 1】:

    I think you can simply assign to element.string with =, without using .replace().

    from bs4 import BeautifulSoup

    src = "<p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>"

    soup = BeautifulSoup(src, "html.parser")
    print('Original: %s' % soup)

    # Select only the <p> tags instead of filtering every tag by name
    for element in soup.find_all('p'):
        # Assigning to .string replaces all children, including <strong>
        element.string = "First, go to your /home/pi directory"

    print('Altered: %s' % soup)
    

    Output:

    Original: <p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>
    Altered: <p>First, go to your /home/pi directory</p>
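
    Since the question says element.text already holds the text to be processed, the same assignment can be combined with `get_text()`. This is only a sketch under assumptions: the replacement of "Next," with "First," is a hypothetical edit chosen for illustration, and `flat` is an illustrative name.

```python
from bs4 import BeautifulSoup

src = "<p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>"
soup = BeautifulSoup(src, "html.parser")

for p in soup.find_all("p"):
    # Flatten the paragraph: nested tags such as <strong> contribute only their text.
    flat = p.get_text()
    # Hypothetical edit: change the first word, then assign the result back.
    # Assigning to .string discards the old children and inserts one text node.
    p.string = flat.replace("Next,", "First,", 1)

print(soup)
```

    This keeps the original wording while dropping the inner markup, which may be preferable to hard-coding the whole replacement string.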
    
