【Question title】: XML tag text to string ignoring children tags but including their text
【Posted】: 2015-11-10 13:24:38
【Question description】:

I am parsing XML data that looks like this:

<title-group><article-title>Leucine to proline substitution by SNP at position 197 in Caspase-9 gene expression leads to neuroblastoma: a bioinformatics analysis</article-title></title-group>

Sometimes, though, there are italic tags inside:

<title-group><article-title><italic>Interferon regulatory factor 5</italic> genetic variants are associated with cardiovascular disease in patients with rheumatoid arthritis</article-title></title-group>

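The following Python code returns a correctly concatenated title string, but only when the italic tag is not at the very start of the title (as it is in the example above):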

    #Get titles
    for node in tree.iter('title-group'):
        for subnode in node.iter('article-title'):
            try:
                title = remove_control_characters(subnode.text)
                if len(title) == 0:
                    for subsubnode in node.iter('italic'):
                        italic = subsubnode.text 
                        tail = remove_control_characters(subsubnode.tail)
                        title += italic + tail  
                        title = str(title)  
                        break                       
            except:
                continue
            for subsubnode in node.iter('italic'):
                italic = subsubnode.text 
                tail = remove_control_characters(subsubnode.tail)
                title += italic + tail  
                title = str(title)  

When the italic tag is at the start of the string, nothing is returned.
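A likely cause, sketched below: in `xml.etree.ElementTree`, an element's `.text` holds only the text before its first child, so it is `None` when the title opens with `<italic>`. Assuming `remove_control_characters` raises on `None` (its body is not shown in the question), the bare `except: continue` would then skip the title silently. The sample title is shortened from the one above.

```python
import xml.etree.ElementTree as ET

# Shortened version of the second sample: the title starts with <italic>.
xml = ('<title-group><article-title><italic>Interferon regulatory factor 5</italic>'
       ' genetic variants are associated with cardiovascular disease'
       '</article-title></title-group>')

root = ET.fromstring(xml)
article_title = root.find('article-title')
italic = article_title.find('italic')

leading_text = article_title.text  # text before the first child: None here
italic_text = italic.text          # text inside <italic>
italic_tail = italic.tail          # text after </italic>, up to the next tag

print(repr(leading_text))
print(repr(italic_text))
print(repr(italic_tail))
```

The rest of the title lives in the child's `.tail`, not in the parent's `.text`, which is why reading `subnode.text` alone is not enough.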

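Is there a simpler approach (without lxml) that I could use? Alternatively, suggested changes to the Python code would also be much appreciated. Suggestions are welcome, and have a nice day.

Edit [solved]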

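    # Get titles
    for node in tree.iter('title-group'):
        for subnode in node.iter('article-title'):
            # Start from an empty string for every title
            title = ''
            # itertext() yields the element's own text plus the text and
            # tails of all nested tags, in document order
            whole = subnode.itertext()
            for parts in whole:
                title += parts
            print(remove_control_characters(title))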
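The edited solution relies on `tree` and `remove_control_characters` being defined elsewhere. A self-contained sketch might look like the following. The body of `remove_control_characters` is an assumption (the question does not show it), and `get_titles` and the sample document are made up for illustration.

```python
import unicodedata
import xml.etree.ElementTree as ET


def remove_control_characters(text):
    # Hypothetical stand-in for the asker's helper: drop every character
    # whose Unicode category starts with "C" (control, format, and so on).
    return ''.join(ch for ch in text if unicodedata.category(ch)[0] != 'C')


def get_titles(tree):
    # Collect one cleaned string per <article-title>, nested tags included.
    titles = []
    for node in tree.iter('title-group'):
        for subnode in node.iter('article-title'):
            titles.append(remove_control_characters(''.join(subnode.itertext())))
    return titles


# Made-up sample with one plain title and one that starts with <italic>.
sample = (
    '<article>'
    '<title-group><article-title>Plain title</article-title></title-group>'
    '<title-group><article-title><italic>Gene name</italic> variants in disease'
    '</article-title></title-group>'
    '</article>'
)

titles = get_titles(ET.fromstring(sample))
print(titles)
```

Returning a list rather than printing inside the loop keeps each title separate and avoids accumulating text across articles.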

【Question discussion】:

    Tags: python xml string


    【Solution 1】:

    Use the itertext() method on your <article-title> element and you should be fine.
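To illustrate what `itertext()` produces, here is a minimal sketch on a made-up title (not taken from the question) with two nested inline tags. It walks the element in document order and yields each text fragment, including the tails that follow child elements.

```python
import xml.etree.ElementTree as ET

# Made-up title with two nested inline tags.
elem = ET.fromstring(
    '<article-title>Role of <italic>IRF5</italic> in <bold>RA</bold> patients</article-title>'
)

fragments = list(elem.itertext())  # one string per text node, in document order
title = ''.join(fragments)         # concatenate without adding separators

print(fragments)
print(title)
```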

    【Discussion】:

    • Thanks! ^-^ See my solution in the edit.
    • You can even shorten it by doing whole = ' '.join(subnode.itertext())
    • Please also take a look at my related @987654321@ function question.
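One caveat on the shortened form, shown as a sketch on a made-up title: joining with `' '` puts a space between every fragment, so a tail that already begins with a space ends up with two. Joining with `''` keeps the original spacing, and `str.split()` followed by `' '.join` collapses any runs of whitespace.

```python
import xml.etree.ElementTree as ET

# Made-up title whose tail after </italic> already starts with a space.
elem = ET.fromstring('<article-title><italic>IRF5</italic> variants</article-title>')

spaced = ' '.join(elem.itertext())     # separator added on top of the existing space
plain = ''.join(elem.itertext())       # fragments concatenated unchanged
normalized = ' '.join(spaced.split())  # collapse whitespace runs to single spaces

print(repr(spaced))
print(repr(plain))
print(repr(normalized))
```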