[Posted]: 2014-01-30 23:34:03
[Question]:
I need to filter huge and redundant XML files. The easy part is eliminating all nodes that have no attributes and no content:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:if test=". != '' or ./@* != ''">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
But I also need to filter out nodes containing
<type>0</type>
nodes containing only
<whatever id="-1"/>
and nodes containing only empty attributes, such as:
<dateacquired year="" month="" day="" long="" unformatted=""/>
An excerpt of my (machine-generated) input file:
<record table="book" id="1">
<bookdata>
<bookid unformatted="1">1</bookid>
<marked bool="False">No</marked>
<lastmodified year="2013" month="09" day="25" long="Wednesday, September 25, 2013" unformatted="20130925">09/25/2013</lastmodified>
<title>Intervista Col Vampiro</title>
<fulltitle>Ciclo Dei Vampiri: Intervista Col Vampiro</fulltitle>
<fulltitle2>Intervista Col Vampiro (Ciclo Dei Vampiri)</fulltitle2>
<referenceno>BB00001</referenceno>
<publishdate year="1993" month="" day="" long="1993" unformatted="1993">1993</publishdate>
<copyrightdate year="" month="" day="" long="" unformatted=""/>
<type id="-1"/>
<authors sort="Rice, Anne">
<author id="1">
<name>Anne Rice</name>
<sortby>Rice, Anne</sortby>
<roles/>
</author>
</authors>
<credits/>
<image1>
<filename>Book_1_3.jpg</filename>
<type>2</type>
<notes/>
</image1>
<image2>
<filename/>
<type>0</type>
<notes/>
</image2>
<image3>
<filename/>
<type>0</type>
<notes/>
</image3>
<image4>
<filename/>
<type>0</type>
<notes/>
</image4>
<image5>
<filename/>
<type>0</type>
<notes/>
</image5>
<image6>
<filename/>
<type>0</type>
<notes/>
</image6>
<image7>
<filename/>
<type>0</type>
<notes/>
</image7>
<image8>
<filename/>
<type>0</type>
<notes/>
</image8>
<image9>
<filename/>
<type>0</type>
<notes/>
</image9>
<subtitle/>
<titlesort>Intervista Col Vampiro</titlesort>
<publisher id="1">Salani</publisher>
<publicationplace id="-1"/>
<isbn/>
<lccn/>
<lccallnum/>
<dewey>823.9</dewey>
<country id="-1"/>
<pages unformatted="283">283</pages>
<numberofsections unformatted="0">0</numberofsections>
<printedby id="-1"/>
<binding id="-1"/>
<edition id="1">Ebook</edition>
<printing id="-1"/>
<language id="-1"/>
<series id="1">Ciclo Dei Vampiri</series>
<releaseno unformatted="0">0</releaseno>
<originaltitle>Interview With The Vampire</originaltitle>
<originalsubtitle/>
<originalpublisher id="-1"/>
<originalcountry id="-1"/>
<originallanguage id="-1"/>
<originalcopyright year="1976" month="" day="" long="1976" unformatted="1976">1976</originalcopyright>
<price integer="8" fraction="0" unformatted="8.0">8.00</price>
<value integer="0" fraction="0" unformatted="0.0">0.00</value>
<sellingprice integer="0" fraction="0" unformatted="0.0">0.00</sellingprice>
<changeinvalue>0.00</changeinvalue>
<changeinvaluepr>0.00</changeinvaluepr>
<condition id="-1"/>
<appraiser id="-1"/>
<insurance id="-1"/>
<registered year="2005" month="09" day="10" long="Saturday, September 10, 2005" unformatted="20050910">09/10/2005</registered>
<status id="-1"/>
<dateacquired year="" month="" day="" long="" unformatted=""/>
<acquiredfrom id="-1"/>
<personalrating id="-1"/>
<category id="1">Horror-Gotico</category>
<subcategory id="-1"/>
<owner id="-1"/>
<location id="-1"/>
<keywords>
<keyword id="1">Vampiro</keyword>
<keyword id="2">Vampiri</keyword>
</keywords>
<newbook bool="False">No</newbook>
<onloan bool="False">No</onloan>
<overdue bool="False">No</overdue>
<borrower id="-1"/>
<borrowercategory id="-1"/>
<dateborrowed year="" month="" day="" long="" unformatted=""/>
<datedue year="" month="" day="" long="" unformatted=""/>
<reserved bool="False">No</reserved>
<reservedto id="-1"/>
<reserveddate year="" month="" day="" long="" unformatted=""/>
<awards/>
<awardyear/>
<awarddetails/>
<nominations/>
<nominationyear/>
<nominationdetails/>
<custom01/>
<custom02/>
<custom03>http://www.ddunlimited.net/viewtopic.php?f=1079&amp;t=3749847</custom03>
<custom04/>
<custom05 id="-1"/>
<custom06 id="-1"/>
<custom07 id="-1"/>
<custom08 id="-1"/>
<custom09 year="" month="" day="" long="" unformatted=""/>
<custom10 integer="0" fraction="0" unformatted="0.0">0.00</custom10>
<custom11 bool="True">Yes</custom11>
<custom12 bool="False">No</custom12>
<custom13 bool="False">No</custom13>
<custom14 bool="True">Yes</custom14>
<custom15 bool="False">No</custom15>
<custom16 bool="False">No</custom16>
<custom17 bool="False">No</custom17>
<custom18 bool="False">No</custom18>
<notes>ed2k://|file|eBook.ITA.001.Anne.Rice.Intervista.Col.Vampiro.(doc.lit.pdf.rtf).[Hyps].rar|1998285|81D4C283C03E5787170A33C335577533|/</notes>
<synopsis>A San Francisco alle soglie del 2000 il giornalista Mallory viene avvicinato da Louis De Point Du Lac, vampiro dal 1791, quando era un proprietario terriero presso New Orleans. Ridotto alla disperazione per la perdita della moglie e della figlioletta vieneiniziato alla sua tenebrosa e ferina esistenza da Lestat, collega di origini parigine, che cerca invano di far superare al discepolo l'innata repulsione per l'omicidio. Invano Louis si ciba di sangue di ratti e galline, e fà fuggire i servi incendiando la casa. Ormai Lestat lo domina e lo coinvolge in efferate uccisioni di innocenti. Una bimba orfana, Claudia, viene "adottata" dai due e si rivela feroce quant'altri mai.</synopsis>
<reviews/>
<weblinks/>
<weblinktype id="1"/>
<filelinks/>
<filelinktype id="1"/>
<barcode/>
<originalseries id="-1"/>
<originalreleaseno unformatted="0">0</originalreleaseno>
<readhistory/>
<lastread year="" month="" day="" long="" unformatted=""/>
<readcount unformatted="0">0</readcount>
<dustjacketcondition id="-1"/>
<dimensions_width integer="0" fraction="0" unformatted="0.0">0.00</dimensions_width>
<dimensions_height integer="0" fraction="0" unformatted="0.0">0.00</dimensions_height>
<dimensions_depth integer="0" fraction="0" unformatted="0.0">0.00</dimensions_depth>
<coverprice integer="0" fraction="0" unformatted="0.0">0.00</coverprice>
<coverprice_currency id="-1"/>
<booklinks/>
</bookdata>
<contentsdata items="0"/>
</record>
The desired output is:
<record table="book" id="1">
<bookdata>
<bookid unformatted="1">1</bookid>
<marked bool="False">No</marked>
<lastmodified year="2013" month="09" day="25" long="Wednesday, September 25, 2013" unformatted="20130925">09/25/2013</lastmodified>
<title>Intervista Col Vampiro</title>
<fulltitle>Ciclo Dei Vampiri: Intervista Col Vampiro</fulltitle>
<fulltitle2>Intervista Col Vampiro (Ciclo Dei Vampiri)</fulltitle2>
<referenceno>BB00001</referenceno>
<publishdate year="1993" month="" day="" long="1993" unformatted="1993">1993</publishdate>
<authors sort="Rice, Anne">
<author id="1">
<name>Anne Rice</name>
<sortby>Rice, Anne</sortby>
</author>
</authors>
<image1>
<filename>Book_1_3.jpg</filename>
<type>2</type>
</image1>
<titlesort>Intervista Col Vampiro</titlesort>
<publisher id="1">Salani</publisher>
<dewey>823.9</dewey>
<pages unformatted="283">283</pages>
<numberofsections unformatted="0">0</numberofsections>
<edition id="1">Ebook</edition>
<series id="1">Ciclo Dei Vampiri</series>
<releaseno unformatted="0">0</releaseno>
<originaltitle>Interview With The Vampire</originaltitle>
<originalcopyright year="1976" month="" day="" long="1976" unformatted="1976">1976</originalcopyright>
<price integer="8" fraction="0" unformatted="8.0">8.00</price>
<value integer="0" fraction="0" unformatted="0.0">0.00</value>
<sellingprice integer="0" fraction="0" unformatted="0.0">0.00</sellingprice>
<changeinvalue>0.00</changeinvalue>
<changeinvaluepr>0.00</changeinvaluepr>
<registered year="2005" month="09" day="10" long="Saturday, September 10, 2005" unformatted="20050910">09/10/2005</registered>
<category id="1">Horror-Gotico</category>
<keywords>
<keyword id="1">Vampiro</keyword>
<keyword id="2">Vampiri</keyword>
</keywords>
<newbook bool="False">No</newbook>
<onloan bool="False">No</onloan>
<overdue bool="False">No</overdue>
<reserved bool="False">No</reserved>
<custom03>http://www.ddunlimited.net/viewtopic.php?f=1079&amp;t=3749847</custom03>
<custom10 integer="0" fraction="0" unformatted="0.0">0.00</custom10>
<custom11 bool="True">Yes</custom11>
<custom12 bool="False">No</custom12>
<custom13 bool="False">No</custom13>
<custom14 bool="True">Yes</custom14>
<custom15 bool="False">No</custom15>
<custom16 bool="False">No</custom16>
<custom17 bool="False">No</custom17>
<custom18 bool="False">No</custom18>
<notes>ed2k://|file|eBook.ITA.001.Anne.Rice.Intervista.Col.Vampiro.(doc.lit.pdf.rtf).[Hyps].rar|1998285|81D4C283C03E5787170A33C335577533|/</notes>
<synopsis>A San Francisco alle soglie del 2000 il giornalista Mallory viene avvicinato da Louis De Point Du Lac, vampiro dal 1791, quando era un proprietario terriero presso New Orleans. Ridotto alla disperazione per la perdita della moglie e della figlioletta vieneiniziato alla sua tenebrosa e ferina esistenza da Lestat, collega di origini parigine, che cerca invano di far superare al discepolo l'innata repulsione per l'omicidio. Invano Louis si ciba di sangue di ratti e galline, e fà fuggire i servi incendiando la casa. Ormai Lestat lo domina e lo coinvolge in efferate uccisioni di innocenti. Una bimba orfana, Claudia, viene "adottata" dai due e si rivela feroce quant'altri mai.</synopsis>
<weblinktype id="1"/>
<filelinktype id="1"/>
<originalreleaseno unformatted="0">0</originalreleaseno>
<readcount unformatted="0">0</readcount>
<dimensions_width integer="0" fraction="0" unformatted="0.0">0.00</dimensions_width>
<dimensions_height integer="0" fraction="0" unformatted="0.0">0.00</dimensions_height>
<dimensions_depth integer="0" fraction="0" unformatted="0.0">0.00</dimensions_depth>
<coverprice integer="0" fraction="0" unformatted="0.0">0.00</coverprice>
</bookdata>
<contentsdata items="0"/>
</record>
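The problem is that I do not really understand transformations, and when I tried to read up on them I could not find an easy-to-follow tutorial. Any pointers are welcome!
As an added bonus, I would also like to filter out specific "empty" items, such as the dimensions_* elements above.
TIA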
[Comments]:
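- Why are some empty nodes (for example <credits/>) kept in the output, while others are not?
- @LegoStormtroopr Because I edited the entry by hand and forgot to delete that one ;) It should go along with the others. I have edited the above and hope I did not miss anything. I need to remove: empty tags (no text and no attributes); empty tags (no text, some attributes with empty values); default tags (tags with an id="-1" attribute); specific tags (<type>0</type>); and possibly a few other specific tags (for example <dimensions_width integer="0" fraction="0" unformatted="0.0">0.00</dimensions_width>).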
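
The removal rules listed in the comment above could be sketched in XSLT 1.0 roughly as follows. This is a minimal sketch under stated assumptions, not an answer taken from the thread. It assumes that an element should survive only if it, or one of its descendants, carries real data, so that containers such as <image2> disappear once all of their children are gone. It also assumes that an element with id="-1" is still kept if it has some other non-empty attribute. The last template is a guess at what "empty" means for the dimensions_* elements (a numeric value of zero).

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Drop whitespace-only text nodes so removed elements leave no blank lines. -->
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy every node that no other template claims. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Prune an element when neither it nor any descendant element carries data.
       Assumption: an element "carries data" if it has
         (a) non-blank text, unless it is <type>0</type>, or
         (b) a non-empty attribute other than id="-1". -->
  <xsl:template match="*[not(descendant-or-self::*[
        (text()[normalize-space()] and not(self::type[. = '0']))
        or @*[. != '' and not(name() = 'id' and . = '-1')]
      ])]"/>

  <!-- Optional bonus rule (assumption): treat dimensions_* elements whose
       numeric value is zero as empty. Delete this template to keep them. -->
  <xsl:template match="*[starts-with(name(), 'dimensions_')][number(.) = 0]"/>
</xsl:stylesheet>
```

The two empty templates win over the identity template because a pattern with a predicate has a higher default priority (0.5) than node() (-0.5). With any XSLT 1.0 processor, for instance `xsltproc filter.xsl input.xml` (both file names are placeholders), the input has to be well-formed XML first: closing tags must match their opening tags, and a literal & inside a URL has to be written as &amp;amp;. With the bonus template in place, the dimensions_* elements are dropped as well, so the result differs from the desired output shown in the question in exactly those three lines.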