【Question title】: Parse XML with xmlns [duplicate]
【Posted】: 2018-12-03 23:06:20
【Question】:

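I'm having a lot of trouble parsing XML in Python 3.

For example, I just want to get the author name. I couldn't figure it out even after hours of searching. Could you help me?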

from urllib.request import urlopen
import xml.etree.ElementTree as ET

filing_url = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001326801&type=&dateb=&owner=include&start=0&count=40&output=atom"

tree = ET.parse('countries.xml')
root = tree.getroot()

for child in root.findall('author'):
    print(child.tag, child.attrib)

XML content:

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <feed xmlns="http://www.w3.org/2005/Atom">
        <author>
            <email>webmaster@sec.gov</email>
            <name>Webmaster</name>
        </author>
        <company-info><state-location>CA</state-location>
            <state-location-href>http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&amp;State=CA&amp;owner=include&amp;count=40</state-location-href>
            <state-of-incorporation>DE</state-of-incorporation>
        </company-info>
    <entry>
        <category label="form type" scheme="http://www.sec.gov/" term="4" />
        <content type="text/xml">
            <accession-nunber>0001127602-18-034767</accession-nunber>
            <filing-date>2018-11-29</filing-date>
            <filing-href>http://www.sec.gov/Archives/edgar/data/1326801/000112760218034767/0001127602-18-034767-index.htm</filing-href>
            <filing-type>4</filing-type>
            <form-name>Statement of changes in beneficial ownership of securities</form-name>
            <size>4 KB</size>
        </content>
        <id>urn:tag:sec.gov,2008:accession-number=0001127602-18-034767</id>
        <link href="http://www.sec.gov/Archives/edgar/data/1326801/000112760218034767/0001127602-18-034767-index.htm" rel="alternate" type="text/html" />
        <summary type="html"> &lt;b&gt;Filed:&lt;/b&gt; 2018-11-29 &lt;b&gt;AccNo:&lt;/b&gt; 0001127602-18-034767 &lt;b&gt;Size:&lt;/b&gt; 4 KB</summary>
        <title>4  - Statement of changes in beneficial ownership of securities</title>
        <updated>2018-11-29T18:46:54-05:00</updated>
    </entry>
    <entry>
        <category label="form type" scheme="http://www.sec.gov/" term="4" />
        <content type="text/xml">
            <accession-nunber>0001127602-18-034766</accession-nunber>
            <filing-date>2018-11-29</filing-date>
            <filing-href>http://www.sec.gov/Archives/edgar/data/1326801/000112760218034766/0001127602-18-034766-index.htm</filing-href>
            <filing-type>4</filing-type>
            <form-name>Statement of changes in beneficial ownership of securities</form-name>
            <size>19 KB</size>
        </content>
        <id>urn:tag:sec.gov,2008:accession-number=0001127602-18-034766</id>
        <link href="http://www.sec.gov/Archives/edgar/data/1326801/000112760218034766/0001127602-18-034766-index.htm" rel="alternate" type="text/html" />
        <summary type="html"> &lt;b&gt;Filed:&lt;/b&gt; 2018-11-29 &lt;b&gt;AccNo:&lt;/b&gt; 0001127602-18-034766 &lt;b&gt;Size:&lt;/b&gt; 19 KB</summary>
        <title>4  - Statement of changes in beneficial ownership of securities</title>
        <updated>2018-11-29T18:44:39-05:00</updated>
    </entry>
    </feed>

【Question comments】:

    Tags: python xml python-3.x xml-parsing


    【Solution 1】:

    I'm not 100% sure what your problem is. However, if you can, I would recommend using BeautifulSoup.

    For example:

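    from bs4 import BeautifulSoup

    # Assumes the XML above was saved locally as "myxml.xml"
    with open("myxml.xml", "r") as infile:
        contents = infile.read()

    # html.parser treats xmlns as an ordinary attribute,
    # so the plain tag name 'author' matches
    soup = BeautifulSoup(contents, 'html.parser')

    authors = soup.find_all('author')

    for author in authors:
        print(author)

    #Output--
    #<author>
    #<email>webmaster@sec.gov</email>
    #<name>Webmaster</name>
    #</author>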
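
If you would rather stay with `xml.etree.ElementTree` from the question, the likely reason `root.findall('author')` prints nothing is the default namespace on `<feed>`: every element in the document is really named `{http://www.w3.org/2005/Atom}tag`. Below is a minimal sketch of a namespace-aware lookup. The `atom` prefix, the `author_names` helper, and the trimmed inline copy of the feed are assumptions made for illustration; they are not part of the original post.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question, inlined so the sketch runs offline.
ATOM_XML = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <author>
        <email>webmaster@sec.gov</email>
        <name>Webmaster</name>
    </author>
    <entry>
        <title>4  - Statement of changes in beneficial ownership of securities</title>
    </entry>
</feed>
"""

# "atom" is an arbitrary local alias; only the URI has to match the document.
NS = {"atom": "http://www.w3.org/2005/Atom"}


def author_names(xml_bytes):
    """Return the text of every <author>/<name> element in an Atom feed."""
    root = ET.fromstring(xml_bytes)
    return [name.text for name in root.findall("atom:author/atom:name", NS)]


if __name__ == "__main__":
    root = ET.fromstring(ATOM_XML)
    print(root.findall("author"))   # [] because the bare tag name does not match
    print(author_names(ATOM_XML))   # ['Webmaster']
```

The same lookup can be written without a prefix map by spelling out the namespace, for example `root.find("{http://www.w3.org/2005/Atom}author")`.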
    

    【Comments】:
