【Question Title】: Removing characters from PHP output in XML
【Posted】: 2012-02-01 16:23:04
【Question】:

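I wrote an XML/PHP document that pulls from a Magento Commerce database to build an XML document of every item in it, so that Google Shopping can import those items. Google's system is getting hung up on one item, and I believe this is due to special characters. I would like to remove these characters from the output. Note that the output contains several registered trademark symbols, some quotation marks, and commas. I doubt the quotes or commas are the problem; I think it may be the trademark symbols.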

Here is the output:

<title>The FoamZall - Spray Foam Insulation Trimming Foam Saw - w/ Open Cell Blade</title>
<description>The FOAMZALL includes the toughest Milwaukee® brand heavy-duty orbital Sawzall® around, which has a custom coupling to secure a 36" long serrated blade intended for trimming 1/2 lb and 2 LB foam.  The 13 Amp, 120 Vac saw has a 1 1/4" stroke and can provide up to 3,000 strokes per minute.  Carry case is included.   </description>
<g:google_product_category>Business &amp; Industrial &gt; Construction</g:google_product_category>
<g:product_type>Spray Foam Parts &amp; Supplies &gt; Fusion AP Parts</g:product_type>
<link>http://sprayfoamsys.com/store/the-foamzall-spray-foam-insulation-trimming-saw-open-cell-blade.html</link>
<g:image_link>http://sprayfoamsys.com/store/media/catalog/product/f/o/foamzall.jpg</g:image_link>
<g:condition>new</g:condition>
<g:availability>in stock</g:availability>
<g:price>425.0000</g:price>
<g:brand></g:brand>
<g:mpn></g:mpn>
</item>
<item>

My script is:

<?php echo '<?xml version="1.0" ?>'; ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0" xmlns:c="http://www.base.google.com/cns/1.0">
<channel>
<title>Spray Foam Systems</title>
<link>http://www.sprayfoamsys.com/store/</link>
<description>Spray Foam Rigs, Spray Foam Equipment, Sprayfoam Parts and Supplies.</description>
<?php
$con = mysql_connect(REMOVED) or die(mysql_error());
    if (!$con)
        {
            die('Could not connect: ' . mysql_error());
        }
    mysql_select_db("sprayfoa_store", $con);

    $query = mysql_query("SELECT * FROM `catalog_product_flat_1` WHERE `visibility` = 4 ORDER BY entity_id asc")
    or die(mysql_error());
?>
<?php
    while($row = mysql_fetch_array($query))
        {
?>
<item>
<g:id><?php echo $row['entity_id']; ?></g:id>
<title><?php echo $row['name']; ?></title>
<description><?php echo (str_replace(array("\r\n", "\n"), ' ', $row['short_description'])); ?></description>
<g:google_product_category>Business &amp; Industrial &gt; Construction</g:google_product_category>
<g:product_type>Spray Foam Parts &amp; Supplies &gt; Fusion AP Parts</g:product_type>
<link>http://sprayfoamsys.com/store/<?php echo $row['url_path']; ?></link>
<g:image_link>http://sprayfoamsys.com/store/media/catalog/product<?php echo $row['small_image']; ?></g:image_link>
<g:condition>new</g:condition>
<g:availability>in stock</g:availability>
<g:price><?php echo $row['price']; ?></g:price>
<g:brand><?php $entity_id = $row['entity_id']; $query2 = mysql_query("SELECT * FROM `catalog_product_entity_varchar` WHERE entity_id = '$entity_id' AND attribute_id = '127'") or die(mysql_error()); while($row2 = mysql_fetch_array($query2)) { echo $row2['value']; } ?></g:brand>
<g:mpn><?php echo $row['sku']; ?></g:mpn>
</item>
<?php
}
mysql_close($con);
?>
</channel>
</rss>

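【Comments】:

  • Hmm, I haven't run into that. Do you specify a character set in the XML? Which character set are you sending? If it isn't utf-8, can you try that?
  • The error I get is "XML formatting error" at line 205, column 559. Line 205 is the <description> tag.
  • Which encoding does the XML use? Which encoding do the values you get from the database have? What is the failing XML output? Which encodings does the Google service accept? Why not use an XML writer?
  • Would changing the prolog to <?xml version="1.0" encoding="utf-8"?> work?
  • @hakre - The XML is now utf-8. I'm not sure about the encoding coming from the database. The failing XML output is now at line 121, column 330, and here is the line: <description>The FOAMZALL includes the toughest Milwaukee� brand heavy-duty orbital Sawzall� around, which has a custom coupling to secure a 36" long serrated blade intended for trimming 1/2 lb and 2 LB foam. The 13 Amp, 120 Vac saw has a 1 1/4" stroke and can provide up to 3,000 strokes per minute. Carry case is included. </description>. I'm not sure which encodings Google Shopping accepts.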
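
The comments above suggest declaring the encoding and using an XML writer. A minimal sketch of that idea follows. It assumes the database returns ISO-8859-1 bytes (the thread never confirms the actual encoding), that the mbstring extension is available, and it uses a hypothetical hard-coded `$row` in place of the real query result.

```php
<?php
// Hypothetical sample row; in the real script this would come from the database.
$row = [
    'name' => 'The FoamZall',
    // "\xAE" is the registered sign as a single ISO-8859-1 byte (assumed source encoding).
    'short_description' => "Milwaukee\xAE brand Sawzall\xAE with a 36\" blade",
];

$writer = new XMLWriter();
$writer->openMemory();
// Declare the encoding in the prolog so consumers do not have to guess.
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('item');
$writer->writeElement('title', $row['name']);
// Convert to UTF-8 first; XMLWriter then escapes markup characters itself.
$writer->writeElement(
    'description',
    mb_convert_encoding($row['short_description'], 'UTF-8', 'ISO-8859-1')
);
$writer->endElement();
$writer->endDocument();

$xml = $writer->outputMemory();
echo $xml;
```

With this approach the trademark symbols stay in the feed as valid UTF-8 instead of being removed, and the writer takes care of escaping characters such as `&` and `<`.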

Tags: php xml magento google-shopping


【Solution 1】:

I'm pretty sure any character outside the 0-255 range should be encoded as &#___;

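【Comments】:

  • This is wrong for XML: only a very few characters need entities, and some characters are not allowed at all (such as \x00). If the document is encoded correctly, most characters just work; UTF-8, for example, is the default for XML.
  • That may be so, but the API may have limitations. For example, it might only accept documents in iso-8859-1.
  • Then there would be no characters outside 0-255 anyway, but the same applies to that encoding: some of the characters in your answer are valid, and the rest do not need entities.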
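
The point made in these comments (correct encoding plus escaping only the few characters that need it) can be sketched as a small helper. `xml_text` is a hypothetical name, the ISO-8859-1 source encoding is an assumption, and the mbstring extension is assumed to be available.

```php
<?php
/**
 * Hypothetical helper: convert a database string to UTF-8 and escape it
 * for use as XML text. The default source encoding is an assumption.
 */
function xml_text(string $value, string $sourceEncoding = 'ISO-8859-1'): string
{
    $utf8 = mb_convert_encoding($value, 'UTF-8', $sourceEncoding);
    // Drop control characters that XML 1.0 does not allow (e.g. \x00);
    // tab, line feed and carriage return are kept.
    $utf8 = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $utf8);
    // Escape only the markup characters: &, <, > and quotes.
    return htmlspecialchars($utf8, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo xml_text("Milwaukee\xAE 36\" blade & case"), "\n";
// prints: Milwaukee® 36&quot; blade &amp; case
```

In the script from the question, this would wrap each echoed value, for example `<title><?php echo xml_text($row['name']); ?></title>`, together with an `encoding="utf-8"` declaration in the prolog.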