【Question title】: python: adding an xml sub-element to a substring of a parent element's text
【Posted】: 2020-03-19 03:56:33
【Question】:

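Suppose I have an XML element that contains text, and I want to wrap certain words in sub-elements. For example, converting the first line below into the second: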

<s>I don't want to have to entertain every Tom, Dick, and Harry who comes through here.</s>

<s>I don't want to have to entertain every <name nid="n1">Tom</name>, <name nid="n2">Dick</name>, and <name nid="n3">Harry</name> who comes through here.</s>

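I have a list of all the strings that need to be wrapped, and I can easily find their positions in the text string, but I can't figure out how to add tags at specific positions (short of building the whole thing through string manipulation). Surely there is a better way using ElementTree or BeautifulSoup?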

【Question comments】:

    Tags: python-3.x xml beautifulsoup elementtree


    【Solution 1】:

    I think this should get you at least most of the way there:

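        from bs4 import BeautifulSoup as bs
    
        old = """
        <s>I don't want to have to entertain every Tom, Dick, and Harry who comes through here.</s>
        """
        names = ["Tom", "Dick", "Harry"]
    
        soup = bs(old, 'lxml')
        orig_tag = soup.s
        # Split the sentence on spaces so each word can be checked in turn
        old_st_lst = orig_tag.string.split(' ')
        new_st_lst = []
        for ns in old_st_lst:
            # Drop commas before comparing, so "Tom," still matches "Tom"
            t_ns = ns.replace(',', '')
            if t_ns in names:
                # nid is the 1-based position of the name in `names`
                place = names.index(t_ns) + 1
                # ns still carries its comma, so the comma lands inside the tag
                new_el = f'<name nid="n{place}">{ns}</name>'
                new_st_lst.append(new_el)
            else:
                new_st_lst.append(ns)
        final = ' '.join(new_st_lst)
    
        # Assigning to .string stores the markup as plain text, not as child tags
        for item in soup.select('s'):
            item.string = final
    
        print(soup.text)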
    

    Output:

    I don't want to have to entertain every <name nid="n1">Tom,</name> <name nid="n2">Dick,</name> and <name nid="n3">Harry</name> who comes through here.
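
    Two things to watch in that output: the trailing commas end up inside the `<name>` tags, and because the markup is assigned to `.string`, it is stored as text rather than as real child elements. If real elements are needed, one possible alternative is ElementTree's `text`/`tail` model: the text before the first child lives in the parent's `text`, and the text after each child lives in that child's `tail`. The following is only a rough sketch under some assumptions: `wrap_names` is a made-up helper name, names are matched as whole words with a regex, only the parent's leading `text` is processed (not the tails of any existing children), and `nid` follows the same "position in the list" rule as the snippet above.

```python
import re
import xml.etree.ElementTree as ET


def wrap_names(elem, names):
    """Wrap whole-word occurrences of `names` in elem.text with <name> children.

    Hypothetical helper for illustration; it only looks at elem.text.
    """
    if not names:
        return elem
    text = elem.text or ""
    # nid is the 1-based position of the name in `names`
    ids = {name: i + 1 for i, name in enumerate(names)}
    # Try longer names first so a longer name wins over a name that is its prefix
    alternatives = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b")

    pos = 0      # end of the previous match in `text`
    last = None  # most recently created child; its .tail takes the text after it
    for idx, m in enumerate(pattern.finditer(text)):
        between = text[pos:m.start()]
        if last is None:
            elem.text = between   # text before the first new child
        else:
            last.tail = between   # text between two new children
        last = ET.Element("name", {"nid": f"n{ids[m.group(1)]}"})
        last.text = m.group(1)
        elem.insert(idx, last)    # new children go ahead of any existing ones
        pos = m.end()
    if last is not None:
        last.tail = text[pos:]    # whatever follows the last match
    return elem


s = ET.fromstring(
    "<s>I don't want to have to entertain every Tom, Dick, and Harry "
    "who comes through here.</s>"
)
wrap_names(s, ["Tom", "Dick", "Harry"])
print(ET.tostring(s, encoding="unicode"))
```

    With the sentence from the question, this should serialize to the target markup shown there, with the commas left outside the `<name>` elements. The same idea could presumably be carried over to BeautifulSoup by creating tags with `soup.new_tag` instead of writing markup into a string.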
    

    【Comments】:
