【发布时间】:2022-08-19 00:46:48
【问题描述】:
我有一个来自 Google 的 XML,内容如下:
<?xml version=\"1.0\" encoding=\"UTF-8\" ?>
<rss version=\"2.0\" xmlns:g=\"http://base.google.com/ns/1.0\">
<channel>
<title>E-commerce\'s products.</title>
<description><![CDATA[Clothing and accessories.]]></description>
<link>https://www.ourwebsite.com/</link>
<item>
<title><![CDATA[Product #1 title]]></title>
<g:brand><![CDATA[Product #1 brand]]></g:brand>
<g:mpn><![CDATA[5643785645]]></g:mpn>
<g:gender>Male</g:gender>
<g:age_group>Adult</g:age_group>
<g:size>Unica</g:size>
<g:condition>new</g:condition>
<g:id>fr_30763_06352</g:id>
<g:item_group_id>fr_30763</g:item_group_id>
<link><![CDATA[https://www.ourwebsite.com/product_1_url.htm?mid=62367]]></link>
<description><![CDATA[Product #1 description]]></description>
<g:image_link><![CDATA[https://data.ourwebsite.com/imgprodotto/product-1_big.jpg]]></g:image_link>
<g:sale_price>29.25 EUR</g:sale_price>
<g:price>65.00 EUR</g:price>
<g:shipping_weight>0.5 kg</g:shipping_weight>
<g:featured_product>y</g:featured_product>
<g:product_type><![CDATA[Product #1 category]]></g:product_type>
<g:availability>in stock</g:availability>
<g:availability_date>2022-08-10T00:00-0000</g:availability_date>
<qty>3</qty>
<g:payment_accepted>Visa</g:payment_accepted>
<g:payment_accepted>MasterCard</g:payment_accepted>
<g:payment_accepted>CartaSi</g:payment_accepted>
<g:payment_accepted>Aura</g:payment_accepted>
<g:payment_accepted>PayPal</g:payment_accepted>
</item>
<item>
<title><![CDATA[Product #2 title]]></title>
<g:brand><![CDATA[Product #2 brand]]></g:brand>
<g:mpn><![CDATA[573489547859]]></g:mpn>
<g:gender>Unisex</g:gender>
<g:age_group>Adult</g:age_group>
<g:size>Unica</g:size>
<g:condition>new</g:condition>
<g:id>fr_47362_382936</g:id>
<g:item_group_id>fr_47362</g:item_group_id>
<link><![CDATA[https://www.ourwebsite.com/product_2_url.htm?mid=168192]]></link>
<description><![CDATA[Product #2 description]]></description>
<g:image_link><![CDATA[https://data.ourwebsite.com/imgprodotto/product-2_big.jpg]]></g:image_link>
<g:sale_price>143.91 EUR</g:sale_price>
<g:price>159.90 EUR</g:price>
<g:shipping_weight>8.0 kg</g:shipping_weight>
<g:product_type><![CDATA[Product #2 category]]></g:product_type>
<g:availability>in stock</g:availability>
<g:availability_date>2022-08-10T00:00-0000</g:availability_date>
<qty>1</qty>
<g:payment_accepted>Visa</g:payment_accepted>
<g:payment_accepted>MasterCard</g:payment_accepted>
<g:payment_accepted>CartaSi</g:payment_accepted>
<g:payment_accepted>Aura</g:payment_accepted>
<g:payment_accepted>PayPal</g:payment_accepted>
</item>
...
</channel>
</rss>
我需要生成一个 XML 文件,从 <item> 中的所有标签中清除,<g:mpn>、<link>、<g:sale_price> 和 <qty> 除外。
在上面的例子中,结果应该是
<?xml version=\"1.0\" encoding=\"UTF-8\" ?>
<rss version=\"2.0\" xmlns:g=\"http://base.google.com/ns/1.0\">
<channel>
<title>E-commerce\'s products.</title>
<description><![CDATA[Clothing and accessories.]]></description>
<link>https://www.ourwebsite.com/</link>
<item>
<g:mpn><![CDATA[5643785645]]></g:mpn>
<link><![CDATA[https://www.ourwebsite.com/product_1_url.htm?mid=62367]]></link>
<g:sale_price>29.25 EUR</g:sale_price>
<qty>3</qty>
</item>
<item>
<g:mpn><![CDATA[573489547859]]></g:mpn>
<link><![CDATA[https://www.ourwebsite.com/product_2_url.htm?mid=168192]]></link>
<g:sale_price>143.91 EUR</g:sale_price>
<qty>1</qty>
</item>
...
</channel>
</rss>
我查看过 SimpleXML、DOMDocument、XPath 文档,但找不到排除特定元素的方法。我不想按名称选择我必须删除的节点,因为将来 Google 可以添加一些节点并且它们不会被我的脚本删除。
我还尝试使用 SimpleXML 遍历命名空间元素,如果与我必须保留的节点不匹配,则取消设置它们:
$g = $element->children($namespaces[\'g\']); //$element is the SimpleXMLElement of <item> tag
foreach ($g as $gchild) {
if ($gchild->getName() != \"mpn\") { //for example
unset($gchild);
}
}
但是上面的代码并没有删除除<g:mpn> 之外的所有节点,例如。
PS:考虑 XML 包含命名空间和非命名空间元素的事实
先感谢您。
编辑:我已经设法使用以下代码做到这一点:
$elementsToKeep = array(\"mpn\", \"link\", \"sale_price\", \"qty\");
$domdoc = new DOMDocument();
$domdoc->preserveWhiteSpace = FALSE;
$domdoc->formatOutput = TRUE;
$domdoc->loadXML($myXMLDocument->asXML()); //$myXMLDocument is the SimpleXML document related to the original XML
$xpath = new DOMXPath($domdoc);
foreach ($element->children() as $child) {
$cname = $child->getName();
if (!in_array($cname, $elementsToKeep)) {
foreach($xpath->query(\'/rss/channel/item/\'.$cname) as $node) {
$node->parentNode->removeChild($node);
}
}
}
$g = $element->children($namespaces[\'g\']);
foreach ($g as $gchild) {
$gname = $gchild->getName();
if (!in_array($gname, $elementsToKeep)) {
foreach($xpath->query(\'/rss/channel/item/g:\'.$gname) as $node) {
$node->parentNode->removeChild($node);
}
}
}
为了使用 DOMDocument 的 removeChild 函数,我使用了 DOMDocument 和 DOMXPath 以及在无命名空间标签和命名空间标签上的两个循环。
真的没有更清洁的解决方案吗?再次感谢
-
对于 XSLT 来说,这是一项微不足道的任务。
标签: php xml xpath simplexml domdocument