【Question title】: Strip off CData from XML using XSLT
【Posted】: 2016-11-22 10:19:15
【Question】:

I am using XSLT to extract data from the following XML. Is there any way to strip off the CData? Currently the extracted output still includes the CData.

<Product>
<ExternalId><![CData[55037]]></ExternalId>
<Name><![CData[Reindeer Booties]]></Name>
<Description><![CData[Everybody say, "Aww!" Prepare for maximum cuteness when these plush reindeer booties are unwrapped from their special box. Faux fur provides plenty of warmth for tiny toes and softness for delicate skin. A pompom nose with 3D ears and antlers are enough to bring out the festive spirit in anyone.]]></Description>
<Brand>XYZ</Brand>
<CategoryExternalId>1_15_1</CategoryExternalId>
<ProductPageUrl><![CData[http://www.xyz.co.uk/baby-accessories/SE037/baby-reindeer-booties]]></ProductPageUrl>
<ImageUrl><![CData[http://www.xyzimages.com/images/product/16S_550.jpg]]></ImageUrl>
<SwatchImageUrl><![CData[]]></SwatchImageUrl>
<Price>84.8000</Price>
<Wasprice>154.9500</Wasprice>
<ManufacturerPartNumber></ManufacturerPartNumber>
<EAN></EAN>
<Colours><![CData[blue-pink]]</Colours>
</Product>

I am expecting the following output:

<Product>
<ExternalId>55037</ExternalId>
<Name>Reindeer Booties</Name>
<Description>Everybody say, "Aww!" Prepare for maximum cuteness when these plush reindeer booties are unwrapped from their special box. Faux fur provides plenty of warmth for tiny toes and softness for delicate skin. A pompom nose with 3D ears and antlers are enough to bring out the festive spirit in anyone.</Description>
<Brand>XYZ</Brand>
<CategoryExternalId>1_15_1</CategoryExternalId>
<ProductPageUrl>http://www.xyz.co.uk/baby-accessories/SE037/baby-reindeer-booties</ProductPageUrl>
<ImageUrl>http://www.xyzimages.com/images/product/16S_550.jpg</ImageUrl>
<SwatchImageUrl></SwatchImageUrl>
<Price>84.8000</Price>
<Wasprice>154.9500</Wasprice>
<ManufacturerPartNumber></ManufacturerPartNumber>
<EAN></EAN>
<Colours>blue-pink</Colours>
</Product>

【Comments】:

  • Can you show your XSLT (the relevant part)?

Tags: c# xml xslt


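【Solution 1】:

The input you have shown us is not well-formed XML and cannot be processed by XSLT:

  • First, CDATA sections must start with <![CDATA[, not with <![CData[ as you have it (XML is case-sensitive).

  • Next, CDATA sections must end with ]]>. This ending is missing in line 14 of your input (you only have ]]).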
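
Both rules can be checked with any conforming XML parser. The following is a minimal sketch in Python, used here only because its standard library allows a self-contained check; `is_well_formed` is a hypothetical helper name, not something from the question or the answer.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it reports a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Correct marker and correct terminator.
print(is_well_formed("<Name><![CDATA[Reindeer Booties]]></Name>"))  # True
# Wrong case in the opening marker.
print(is_well_formed("<Name><![CData[Reindeer Booties]]></Name>"))  # False
# Section closed with "]]" instead of "]]>".
print(is_well_formed("<Colours><![CDATA[blue-pink]]</Colours>"))    # False
```

An XSLT processor has to parse its input first, so it stops at the same point as this check does.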

Once you have fixed these defects and have a well-formed XML input such as:

XML

<Product>
    <ExternalId><![CDATA[55037]]></ExternalId>
    <Name><![CDATA[Reindeer Booties]]></Name>
    <Description><![CDATA[Everybody say, "Aww!" Prepare for maximum cuteness when these plush reindeer booties are unwrapped from their special box. Faux fur provides plenty of warmth for tiny toes and softness for delicate skin. A pompom nose with 3D ears and antlers are enough to bring out the festive spirit in anyone.]]></Description>
    <Brand>XYZ</Brand>
    <CategoryExternalId>1_15_1</CategoryExternalId>
    <ProductPageUrl><![CDATA[http://www.xyz.co.uk/baby-accessories/SE037/baby-reindeer-booties]]></ProductPageUrl>
    <ImageUrl><![CDATA[http://www.xyzimages.com/images/product/16S_550.jpg]]></ImageUrl>
    <SwatchImageUrl><![CDATA[]]></SwatchImageUrl>
    <Price>84.8000</Price>
    <Wasprice>154.9500</Wasprice>
    <ManufacturerPartNumber></ManufacturerPartNumber>
    <EAN></EAN>
    <Colours><![CDATA[blue-pink]]></Colours>
</Product>

then you can apply a simple stylesheet that contains only the identity transform:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

which returns:

Result

<?xml version="1.0" encoding="UTF-8"?>
<Product>
   <ExternalId>55037</ExternalId>
   <Name>Reindeer Booties</Name>
   <Description>Everybody say, "Aww!" Prepare for maximum cuteness when these plush reindeer booties are unwrapped from their special box. Faux fur provides plenty of warmth for tiny toes and softness for delicate skin. A pompom nose with 3D ears and antlers are enough to bring out the festive spirit in anyone.</Description>
   <Brand>XYZ</Brand>
   <CategoryExternalId>1_15_1</CategoryExternalId>
   <ProductPageUrl>http://www.xyz.co.uk/baby-accessories/SE037/baby-reindeer-booties</ProductPageUrl>
   <ImageUrl>http://www.xyzimages.com/images/product/16S_550.jpg</ImageUrl>
   <SwatchImageUrl/>
   <Price>84.8000</Price>
   <Wasprice>154.9500</Wasprice>
   <ManufacturerPartNumber/>
   <EAN/>
   <Colours>blue-pink</Colours>
</Product>
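
The markers disappear because a CDATA section is only an alternative way of writing text: the XSLT 1.0 data model has plain text nodes and no CDATA nodes, so the serializer writes ordinary escaped text unless told otherwise. The same behaviour can be observed outside XSLT. The sketch below uses Python's standard `xml.etree.ElementTree`, which likewise keeps no record of CDATA sections; `roundtrip` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET


def roundtrip(xml_text):
    """Parse xml_text and serialize it again without changing the tree."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


source = (
    "<Product>"
    "<ExternalId><![CDATA[55037]]></ExternalId>"
    "<Brand>XYZ</Brand>"
    "</Product>"
)

# The CDATA content comes back as ordinary element text.
print(roundtrip(source))
# <Product><ExternalId>55037</ExternalId><Brand>XYZ</Brand></Product>
```

If the output still shows CDATA markers after a transform like the one above, the likely explanation is that the input never contained real CDATA sections, only text that looks like them.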

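【Comments】:

  • Thanks for your help, but it still does not strip the CDATA. Any other suggestions, please?
  • Actually, I just realized that my C# application creates the CDATA like this, <![CDATA[52011]]>, not like that. Is there any fix for this, please?
  • @Ibex Please edit your question and show a small but complete example of your real input - see: minimal reproducible example
  • Thanks for your help, Michael. It was an XML serialization issue, and the code below fixed it. [XmlElement("GroupDescr")] public XmlCDataSection GroupDescr { get { return new System.Xml.XmlDocument().CreateCDataSection(GroupDescrInternal); } set { GroupDescrInternal = value.Value; } }

【Solution 2】: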

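Your real problem is that your XML is corrupt; you should fix the source of the error rather than patch the result. CData should not be inside an angle-bracket tag. It should start with "!" and end with "]". The following regular expression will fix the error.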

using System.Xml;
using System.Xml.Linq;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication28
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
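            // Read the raw text; the file cannot be parsed while the CData markers are malformed.
            string xml = File.ReadAllText(FILENAME);
            // Match "<![CData[ ... ]]>": 'open' is the leading '<', 'cdata' is everything up to the next '>', 'close' is that '>'.
            string pattern = @"(?'open'<)(?'cdata'!\[CData[^\>]+)(?'close'>)";
            // Keep only the 'cdata' group, i.e. drop the surrounding angle brackets.
            string fixedXml = Regex.Replace(xml, pattern, "${cdata}");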
            XDocument doc = XDocument.Parse(fixedXml);
        }
    }
}
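
If the producer of the file cannot be fixed, another option is to rewrite the malformed markers into valid ones before parsing. Note that this is a different repair from the regex above, which removes the angle brackets instead. The Python sketch below is only an illustration under stated assumptions: `repair_cdata_markers` is a hypothetical helper, and it assumes that the text `]]</` never occurs legitimately inside the element content.

```python
import re
import xml.etree.ElementTree as ET


def repair_cdata_markers(xml_text):
    """Turn '<![CData[' into '<![CDATA[' and add a missing '>' after ']]'."""
    # Normalise the case of the opening marker.
    fixed = re.sub(r"<!\[cdata\[", "<![CDATA[", xml_text, flags=re.IGNORECASE)
    # A section that ends with "]]" directly before a closing tag lacks its ">".
    fixed = re.sub(r"\]\](?=</)", "]]>", fixed)
    return fixed


broken = "<Product><Name><![CData[Reindeer Booties]]></Name><Colours><![CData[blue-pink]]</Colours></Product>"
repaired = repair_cdata_markers(broken)
root = ET.fromstring(repaired)  # parses now that the markers are valid
print(root.find("Colours").text)  # blue-pink
```

Text-level repairs like this are fragile, so correcting the code that writes the file remains the better fix.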

【Comments】:

【Solution 3】:

Since you are using C#, you can skip XSLT entirely and just use LINQ to XML.

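using System.Linq;      // OfType<T>(), ToList()
using System.Xml.Linq;  // XDocument, XCData

var doc = XDocument.Load("test.xml");

// Snapshot the CDATA nodes with ToList() first, because the loop modifies the tree.
foreach (var n in doc.DescendantNodes().OfType<XCData>().ToList())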
{
    n.ReplaceWith(n.Value);
}

doc.Save("test2.xml");
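
The same idea, replacing each CDATA node with a plain text node in a DOM that keeps the two apart, can be sketched with Python's standard `xml.dom.minidom`. This is an analogue for illustration only, not a translation of the C# above; `strip_cdata` is a hypothetical helper name.

```python
from xml.dom import minidom, Node


def strip_cdata(xml_text):
    """Replace every CDATA section in xml_text with an equivalent text node."""
    doc = minidom.parseString(xml_text)

    def walk(node):
        # Iterate over a copy of the child list, because the loop modifies the tree.
        for child in list(node.childNodes):
            if child.nodeType == Node.CDATA_SECTION_NODE:
                node.replaceChild(doc.createTextNode(child.data), child)
            else:
                walk(child)

    walk(doc.documentElement)
    return doc.documentElement.toxml()


print(strip_cdata("<Product><Name><![CDATA[Reindeer Booties]]></Name></Product>"))
# <Product><Name>Reindeer Booties</Name></Product>
```

Characters that need escaping in ordinary text, such as `&`, are escaped by the serializer once the CDATA wrapper is gone.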

Of course, your input XML should be well-formed, as michael.hor257k pointed out.

【Comments】:
