【Question title】: PHP: Ganon DOM parser, filtering anchor tags
【Posted】: 2013-08-08 05:38:29
【Question】:

I am using a PHP DOM parser library (Ganon), and I have a complex HTML structure to parse:

<table width="640" style="color: #333333;">
<tbody><tr>
<td valign="top">
<font face="Arial,Helevetica,sans-serif">
<a href="http://forums.timezone.com/index.php?t=tree&amp;goto=6577581&amp;rid=0">20mm Omega SMP Bond Bracelet Ref. 1503-825- PRICE DROP</a><br>
<font size="-1" color="#999999">Sales Corner - <a href="http://forums.timezone.com/index.php?t=usrinfo&amp;id=462&amp;rid=0">The Bigwatch Guy</a></font><font size="-1" color="#999999"> - Aug 7, 2013</font><br>
<font size="-1">20mm OMEGA SEAMASTER PROFESSIONAL "BOND" BRACELET REF. 1503-825. All s/s genuine Bond bracelet in excellent condition. The bracelet is 6.6 inches long...</font>
<br>
<br>
</font></td>
</tr>
<tr>
<td valign="top">
<font face="Arial,Helevetica,sans-serif">
<a href="http://forums.timezone.com/index.php?t=tree&amp;goto=6577577&amp;rid=0">Longines Lindbergh Hour Angle Chronograph- PRICE DROP</a><br>
<font size="-1" color="#999999">Sales Corner - <a href="http://forums.timezone.com/index.php?t=usrinfo&amp;id=462&amp;rid=0">The Bigwatch Guy</a></font><font size="-1" color="#999999"> - Aug 7, 2013</font><br>
<font size="-1">42mm (not counting the crown) LONGINES LINDBERGH HOUR ANGLE AUTOMATIC CHRONOGRAPH W/ COMPLETE BOXSET AND PAPERS - NEARMINT PLUS CONDITION. The strap h...</font>
<br>
<br>
</font></td>
</tr>
</table>

I am trying to get every anchor tag whose href attribute contains the string goto. I tried the code below:

<?php 
include("ganon.php");
$html = file_get_dom('http://forums.timezone.com/search/?q=Public+Forum&f=4&s=0');
$c=1;
if( count($html("table[width='640']"))>0 ){
    foreach($html("a[href=*goto]") as $elm){
            echo $c.')'.$elm->href.'<br/>';
    $c++;
    }
}
?>

The code above raises this notice and produces no other output: Notice: Expected identifier at 7! in D:\xampp\htdocs\govberg\ganon.php on line 2196

【Question comments】:

    Tags: php dom html-parsing


    【Solution 1】:

    From the selector documentation, you can see:

    E[foo*="bar"] : an E element whose "foo" attribute value contains the substring "bar"
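To make the operator semantics concrete, here is a minimal sketch in plain PHP of how the common attribute operators compare an attribute value with the selector value. The helper attr_matches is hypothetical and is not part of Ganon; it only illustrates the matching rules, not how the library implements them.

```php
<?php
// Hypothetical helper (NOT part of Ganon): mimics how CSS attribute
// operators compare an attribute value against the selector value.
function attr_matches(string $value, string $op, string $needle): bool
{
    switch ($op) {
        case '=':   // exact match
            return $value === $needle;
        case '*=':  // value contains the substring
            return $needle !== '' && strpos($value, $needle) !== false;
        case '^=':  // value starts with the string
            return $needle !== '' && strncmp($value, $needle, strlen($needle)) === 0;
        case '$=':  // value ends with the string
            return $needle !== '' && substr($value, -strlen($needle)) === $needle;
        default:
            throw new InvalidArgumentException("Unsupported operator: $op");
    }
}

// One of the href values from the question, with entities decoded.
$href = 'http://forums.timezone.com/index.php?t=tree&goto=6577581&rid=0';

var_dump(attr_matches($href, '*=', 'goto'));    // substring test
var_dump(attr_matches($href, '=', 'goto'));     // exact test
var_dump(attr_matches($href, '^=', 'http://')); // prefix test
```

Only the substring operator fits the question: the href merely contains goto, it neither equals it nor starts or ends with it.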

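    Your selector has the operator backwards: it writes =* where the documented form is *=, and it leaves the value unquoted. Counting from zero, the * sits at offset 7 of a[href=*goto], which is presumably the position the "Expected identifier at 7" notice refers to.

    Change the following line: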

    foreach($html("a[href=*goto]") as $elm){
    

    to:

    foreach($html('a[href*="goto"]') as $elm){
    

    Output: Pastebin
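For comparison, the same filter can be sketched without Ganon, using PHP's bundled DOM extension, where the XPath function contains() plays the role of the *= operator. This is an alternative technique, not the approach from the question; the sample markup is a trimmed copy of the table shown there, and the variable names are assumptions for illustration.

```php
<?php
// Trimmed sample of the table from the question (assumption: two rows are
// enough to show the filter; the real page would be fetched separately).
$markup = '<table width="640"><tbody>'
    . '<tr><td>'
    . '<a href="http://forums.timezone.com/index.php?t=tree&amp;goto=6577581&amp;rid=0">20mm Omega SMP Bond Bracelet</a>'
    . '<a href="http://forums.timezone.com/index.php?t=usrinfo&amp;id=462&amp;rid=0">The Bigwatch Guy</a>'
    . '</td></tr>'
    . '<tr><td>'
    . '<a href="http://forums.timezone.com/index.php?t=tree&amp;goto=6577577&amp;rid=0">Longines Lindbergh Hour Angle Chronograph</a>'
    . '</td></tr>'
    . '</tbody></table>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

// contains(@href, "goto") is the XPath counterpart of a[href*="goto"].
$xpath = new DOMXPath($doc);
$hrefs = [];
foreach ($xpath->query('//table[@width="640"]//a[contains(@href, "goto")]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

// Print a numbered list, mirroring the loop in the question.
foreach ($hrefs as $i => $href) {
    echo ($i + 1) . ') ' . $href . PHP_EOL;
}
```

The usrinfo link is skipped because its href does not contain goto, so only the two thread links are listed.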

    Hope this helps!

    【Comments】:

    • Oops.. I should have checked the selectors. Anyway, you made my day :-) Thanks