【Question Title】: How do I validate against multiple XSD schemas using lxml?
【Posted】: 2016-12-19 00:26:18
【Question Description】:

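I'm writing a unit test that validates the sitemap XML I generate: it fetches the XSD schemas and validates the document with Python's lxml library.

Here is some of the metadata on my root element: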

xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd 
http://www.google.com/schemas/sitemap-image/1.1 
http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd"
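Since xsi:schemaLocation is a whitespace-separated list of namespace/location pairs, splitting it on a single space is fragile: XML attribute-value normalization turns the line breaks above into spaces, so `split(' ')` yields empty strings that throw off the namespace/location alternation. A minimal sketch of a stricter parser using only the standard library (the helper name `schema_location_pairs` is hypothetical):

```python
def schema_location_pairs(value):
    """Return the (namespace, location) pairs in an xsi:schemaLocation value."""
    # str.split() with no argument splits on any run of whitespace,
    # so repeated spaces, tabs and newlines never produce empty tokens.
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError('xsi:schemaLocation must hold namespace/location pairs')
    # Even positions are namespaces, odd positions are schema locations.
    return list(zip(tokens[0::2], tokens[1::2]))
```

With lxml, the input would be `doc.getroot().get('{http://www.w3.org/2001/XMLSchema-instance}schemaLocation')`.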

And here is the test code:

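from io import BytesIO

import requests
from lxml import etree

XSI_SCHEMA_LOCATION = '{http://www.w3.org/2001/XMLSchema-instance}schemaLocation'

_xsd_validators = {}


def get_xsd_validator(url):
    # Download and compile each XSD only once.
    if url not in _xsd_validators:
        response = requests.get(url)
        response.raise_for_status()
        # The body is bytes, so wrap it in BytesIO (StringIO rejects bytes in
        # Python 3); base_url lets lxml resolve relative includes/imports.
        xsd_doc = etree.parse(BytesIO(response.content), base_url=url)
        _xsd_validators[url] = etree.XMLSchema(xsd_doc)
    return _xsd_validators[url]


# this util function is later on in a TestCase
def validate_xml(self, content):
    content.seek(0)
    doc = etree.parse(content)
    # schemaLocation holds "namespace location" pairs; split() copes with any
    # run of whitespace, unlike split(' ').
    schema_loc = doc.getroot().get(XSI_SCHEMA_LOCATION).split()
    # lxml doesn't like multiple namespaces, so validate against each XSD in turn
    for xsd_url in schema_loc[1::2]:
        get_xsd_validator(xsd_url).assertValid(doc)
    return doc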

An example XML document that fails validation:

<?xml version="1.0" encoding="UTF-8"?>
<urlset
  xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
    http://www.sitemaps.org/schemas/sitemap/0.9
    http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
    http://www.google.com/schemas/sitemap-image/1.1
    http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd"
>
  <url>
    <loc>https://www.example.com/press</loc>
    <lastmod>2016-08-11</lastmod>

    <changefreq>weekly</changefreq>
  </url>

  <url>
    <loc>https://www.example.com/about-faq</loc>
    <lastmod>2016-08-11</lastmod>

    <changefreq>weekly</changefreq>
  </url>


</urlset>

Everything worked fine when I had only a regular sitemap, but as soon as I added image sitemap tags, assertValid started failing:

E   DocumentInvalid: Element '{http://www.google.com/schemas/sitemap-image/1.1}image': No matching global element declaration available, but demanded by the strict wildcard., line 12

Or:

E   DocumentInvalid: Element '{http://www.sitemaps.org/schemas/sitemap/0.9}urlset': No matching global declaration available for the validation root., line 6
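Both errors can be reproduced offline with two tiny stand-in schemas. The namespaces `urn:container` and `urn:extension` and the helper `validation_error` are hypothetical; they only model what the error messages describe, not the real sitemap XSDs. Each schema compiled on its own knows only its own namespace, so the container schema rejects the foreign element under its strict wildcard, and the extension schema has no declaration for the root element:

```python
from lxml import etree

# Stand-in for sitemap.xsd: foreign-namespace children are only accepted
# through a *strict* wildcard, so they must be declared in a loaded schema.
CONTAINER_XSD = b'''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:container" elementFormDefault="qualified">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:any namespace="##other" processContents="strict"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>'''

# Stand-in for sitemap-image.xsd: declares only the extension element.
EXTENSION_XSD = b'''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:extension">
  <xs:element name="image" type="xs:string"/>
</xs:schema>'''

DOC = b'''<c:root xmlns:c="urn:container" xmlns:e="urn:extension">
  <e:image>x</e:image>
</c:root>'''


def validation_error(xsd_bytes, doc_bytes):
    """Validate doc_bytes against one schema; return the last error message or None."""
    schema = etree.XMLSchema(etree.fromstring(xsd_bytes))
    if schema.validate(etree.fromstring(doc_bytes)):
        return None
    return schema.error_log.last_error.message
```

`validation_error(CONTAINER_XSD, DOC)` reports the strict-wildcard error and `validation_error(EXTENSION_XSD, DOC)` reports the missing validation-root declaration, matching the two messages above.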

【Question Discussion】:

    Tags: python xml xsd lxml


    【Solution 1】:

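    You could try defining a wrapper schema, wrapper-schema.xsd, that imports all of the required schemas, and then use that single schema with lxml instead of each individual schema.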

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:import
        namespace="http://www.sitemaps.org/schemas/sitemap/0.9"
        schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"/>
      <xs:import
        namespace="http://www.google.com/schemas/sitemap-image/1.1"
        schemaLocation="http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd"/>
    </xs:schema>
    

    I don't have Python at hand, but this validates successfully in oXygen:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset  xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="wrapper-schema.xsd"
        >
        <image:image>
            <image:loc>http://www.example.com/image</image:loc>
        </image:image>
        <url>
            <loc>https://www.example.com/press</loc>
            <lastmod>2016-08-11</lastmod>
            <changefreq>weekly</changefreq>
        </url>
        <url>
            <loc>https://www.example.com/about-faq</loc>
            <lastmod>2016-08-11</lastmod>
            <changefreq>weekly</changefreq>
        </url>
    </urlset>
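On the lxml side, which the answer could not test, the same wrapper can be sketched by building the xs:import list in memory from the document's own xsi:schemaLocation pairs and compiling it once with `etree.XMLSchema`. This is a minimal sketch under assumptions: the helper names `wrapper_schema` and `assert_valid_against_all` are hypothetical, every location must be an absolute URL or file URI that libxml2 can load, and whether remote http:// locations are fetched may depend on the libxml2 build behind lxml, so local copies of sitemap.xsd and sitemap-image.xsd are the safer choice in a unit test.

```python
from functools import lru_cache

from lxml import etree

XS_NS = 'http://www.w3.org/2001/XMLSchema'
XSI_SCHEMA_LOCATION = '{http://www.w3.org/2001/XMLSchema-instance}schemaLocation'


@lru_cache(maxsize=None)
def wrapper_schema(pairs):
    """Compile one XMLSchema that xs:imports every (namespace, location) pair.

    `pairs` is a tuple of tuples so lru_cache can hash it and reuse the
    compiled schema (assumption: locations are absolute URLs or file URIs).
    """
    root = etree.Element('{%s}schema' % XS_NS, nsmap={'xs': XS_NS})
    for namespace, location in pairs:
        etree.SubElement(root, '{%s}import' % XS_NS,
                         namespace=namespace, schemaLocation=location)
    return etree.XMLSchema(root)


def assert_valid_against_all(doc):
    """Validate an lxml ElementTree against all of its schemaLocation pairs at once."""
    tokens = doc.getroot().get(XSI_SCHEMA_LOCATION).split()
    pairs = tuple(zip(tokens[0::2], tokens[1::2]))
    # Raises etree.DocumentInvalid on failure, like the question's assertValid.
    wrapper_schema(pairs).assertValid(doc)
    return doc
```

To test against local copies instead of the remote URLs, call `wrapper_schema` directly with (namespace, file URI) pairs. Caching on the pairs tuple mirrors the `_xsd_validators` cache in the question, but keys on the whole set of schemas rather than on one URL.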
    

    【Discussion】:
