【Question title】: How to get elements between two tags with Simple HTML DOM
【Posted】: 2017-02-07 08:29:38
【Question】:

Here is my HTML:

<b><font color="Red">Flash Player 720p HD Quality Online Links</font></b>
        <br>
        <br>
        <a href="http://bestarticles.me/jaana-na-dil-se-door/?si=5325359" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video- Part 1</a>
        <br>
        <a href="http://bestarticles.me/jaana-na-dil-se-door/?si=5325360" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video- Part 2</a>
        <br>
        <br>
        <b><font color="Red">Dailymotion 720p HD Quality Online Links</font></b>
        <br>
        <br>
        <a href="http://bestarticles.me/jaana-na-dil-se-door/?si=k4r2rHPOgem8yAlGqjj" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video- Part 1</a>
        <br>
        <a href="http://bestarticles.me/jaana-na-dil-se-door/?si=k63MLC2Vq6fxsPlGqjp" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video- Part 2</a>
        <br>
        <br>
        <b><font color="Red">TVLogy 720p HD Quality Online Links</font></b>
        <br>
        <br>
        <a href="http://reviewtv.in/star-plus/?si=YD29025" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video- Part 1</a>
        <br>
        <a href="http://reviewtv.in/star-plus/?si=YD29026" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video- Part 2</a>
        <br>
        <br>
        <b><font color="Red">Letwatch 720p HD Quality Online Links</font></b>
        <br>
        <br>
        <a href="http://www.tellycolors.me/star-plus/?si=j3vpekz3jeiv" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video - Part 1</a>
        <br>
        <a href="http://www.tellycolors.me/star-plus/?si=bdjg53bz9gdi" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video - Part 2</a>
        <br>
        <br>
        <b><font color="Red">Vidwatch 720p HD Quality Online Links</font></b>
        <br>
        <br>
        <a href="http://hd-rulez.info/vidwatch.php?id=73sbn356g9nc" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video - Part 1</a>
        <br>
        <a href="http://hd-rulez.info/vidwatch.php?id=73x796cifyvq" target="_blank">Jaana Na Dil Se Door 6th February 2017 Watch Online Video - Part 2</a>
        <br>
        <br>

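I am using the Simple HTML DOM PHP library for scraping. I want to scrape each <b> tag together with its anchor tags. Every <b> element is followed by its own set of <a> anchors, so I want the scraped result to look like this: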

array(
       'Flash Player' => array( 'link1', 'link2' ),
       'Dailymotion' => array('link1', 'link2', 'link3'),
       etc...
);

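This is what I am doing. First I blank out all the <br> tags, then I loop over all the <b> tags and try to get the next sibling of each <b> with $b->next_sibling(). It does not work, because blanking out the <br> tags does not update the element index, so they are still returned as siblings. Here is my code: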

$html = str_get_html($html);
$content = $html->find('div.postcontent',0);

// blank out all <br> tags
foreach ($content->find('br') as $br) {
    $br->outertext = '';
}

foreach ($content->find('b') as $key => $b) {
    echo $b->plaintext;
}

Please help me with another strategy for scraping the <b> tags and their <a> tags. Thanks.

【Comments】:

    Tags: php dom web-scraping simple-html-dom


    【Solution 1】:

    I don't know whether there is a simpler way, but as long as each <b> tag is followed by exactly two <a> tags, this code gives the output you want.

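        // Running index into the flat list of <a> elements inside $content.
        $aCount = 0;
        $result = array();
        foreach ($content->find('b') as $key => $b) {
            // The heading text becomes the array key.
            $index = $b->plaintext;
            // Assumes exactly two links follow each <b> heading.
            for ($i = 0; $i < 2; $i++) {
                $result[$index][] = $content->find('a', $aCount++)->href;
            }
        }
        print_r($result);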
    

    The output will look like this:

    Array
    (
        [Flash Player 720p HD Quality Online Links] => Array
            (
                [0] => http://bestarticles.me/jaana-na-dil-se-door/?si=5325359
                [1] => http://bestarticles.me/jaana-na-dil-se-door/?si=5325360
            )
    
        [Dailymotion 720p HD Quality Online Links] => Array
            (
                [0] => http://bestarticles.me/jaana-na-dil-se-door/?si=k4r2rHPOgem8yAlGqjj
                [1] => http://bestarticles.me/jaana-na-dil-se-door/?si=k63MLC2Vq6fxsPlGqjp
            )
    
        [TVLogy 720p HD Quality Online Links] => Array
            (
                [0] => http://reviewtv.in/star-plus/?si=YD29025
                [1] => http://reviewtv.in/star-plus/?si=YD29026
            )
    
        [Letwatch 720p HD Quality Online Links] => Array
            (
                [0] => http://www.tellycolors.me/star-plus/?si=j3vpekz3jeiv
                [1] => http://www.tellycolors.me/star-plus/?si=bdjg53bz9gdi
            )
    
        [Vidwatch 720p HD Quality Online Links] => Array
            (
                [0] => http://hd-rulez.info/vidwatch.php?id=73sbn356g9nc
                [1] => http://hd-rulez.info/vidwatch.php?id=73x796cifyvq
            )
    
    )
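
If the number of links per heading can vary, the fixed count of two above no longer holds. One possible alternative is to walk the <b> and <a> elements in document order and attach each link to the most recent heading. The sketch below is not Simple HTML DOM: it swaps in PHP's bundled DOM extension (DOMDocument and DOMXPath) so that it runs without any third-party file. The function name groupLinksByHeading and the sample markup are made up for illustration, and it assumes a non-empty HTML fragment in which every <a> that matters comes after its <b> heading.

```php
<?php
// Hypothetical helper (not part of the original answer): groups the href of
// every <a> under the text of the nearest preceding <b> heading.
function groupLinksByHeading(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid markup; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath   = new DOMXPath($doc);
    $result  = [];
    $current = null; // text of the most recent <b> heading, if any

    // A single location path, so nodes come back in document order.
    foreach ($xpath->query('//*[self::b or self::a[@href]]') as $node) {
        if ($node->nodeName === 'b') {
            $current = trim($node->textContent);
            $result[$current] = [];
        } elseif ($current !== null) {
            // Links that appear before the first heading are skipped.
            $result[$current][] = $node->getAttribute('href');
        }
    }
    return $result;
}

// Usage with a shortened, made-up fragment shaped like the question's HTML.
$sample = '<b><font color="Red">Flash Player</font></b><br><br>'
        . '<a href="http://example.com/1">Part 1</a><br>'
        . '<a href="http://example.com/2">Part 2</a><br><br>'
        . '<b><font color="Red">Dailymotion</font></b><br><br>'
        . '<a href="http://example.com/3">Part 1</a><br>';
print_r(groupLinksByHeading($sample));
```

Because the grouping depends only on document order, the <br> tags do not need to be removed first, and headings with one, two, or three links are all handled the same way. Two headings with identical text would share one key, so a real scraper may want a different key scheme.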
    

    【Comments】:
