【Question title】: How can I get specific elements that are inside a div?
【Posted】: 2014-10-01 04:12:54
【Question】:

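I want to extract a whole div, which I can already isolate from the rest of the source code. From that div I want all of the HTML content, but without certain child divs. The HTML to query: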

<div class="content"> 
    <div class="article-title"> 
        <h2>Title of the test</h2> 
        <a href="http://www.helloworld.com" title="post by world" rel="author" class="article-icon"><span class="text-icon">&#x1F464;</span>world</a>
        <span class="article-icon">
            <span class="text-icon">&#x1F4C1;</span>
            <a href="http://www.helloworld.com/world">world</a>,
        </span>
        <span class="article-icon">
            <span class="text-icon">&#x1F554;</span>20.August 2014
        </span>
    </div> 
    <p class="p1">
        <span class="s1"><b>a test</b></span>
    </p>  
    <p class="p2">
        <span class="s1">text2</span>
    </p> 
    <p class="p1">
        <span class="s1"><b><a href="http://www.helloworld.com/hello.jpg">
            <img class="alignright size-medium wp-image-19472" src="http://www.helloworld.com/hello.jpg" alt="hello" width="300" height="218"></a>Hello</b>
        </span>
    </p>  
    <p class="p1">
        <span class="s1"><b>text text text</b></span>
    </p> 
    <p class="p1">
        <span class="s1"><b><a href="http://www.helloworld.com/hello2.jpg">
            <img class="alignleft size-medium wp-image-19474" src="http://www.helloworld.com/hello2.jpg" alt="hello2" width="300" height="200"></a>Hello2</b>
        </span>
    </p> 
    <p class="p1">
        <span class="s1">text1</span>
    </p> 
    <p class="p1">
        <span class="s1">text2</span>
    </p> 
    <p class="p1">
        <span class="s1"><b>Final thoughts</b></span>
    </p> 
    <p class="p1">
        <span class="s1">testing (<a href="http://www.helloworld.com/test">
            <span class="s2">test</span></a>, 
            <a href="http://www.helloworld.com/test2">
            <span class="s2">test2</span></a>
        </span>
    </p> 
    <p class="p1">
        <span class="s1">***</span>
    </p> 
    <p class="p5"><em>
        <span class="s1">xyz <a href="http://www.helloworld.com/xyz">
            <span class="s2">123</span></a> (at <a href="http://www.helloworld.com">
            <span class="s2">http://www.helloworld.com</span></a>. &#xA0;
        </span></em>
    </p> 
    <div class="panel-breaking-line"></div> 
    <div class="article-tags"> <b>Tags added to this article</b> 
        <div class="tagcloud"> <a href="http://www.helloworld.com/world">world</a><a href="http://www.helloworld.com/xyz">zyx</a> </div> 
    </div> 
    <div class="panel-breaking-line"></div> 
    <div class="article-socials"> <b>Share this article with friends</b> 
        <div class="social-likes"> 
            <div class="soc-button soc-button-facebook"> <a href="http://www.facebook.com/sharer/sharer.php?u=http://www.helloworld.com/world" data-url="http://www.helloworld.com/world" class="soc-click ot-share">
                <span class="text-icon">&#xF30C;</span>FACEBOOK</a>
                <span class="likes-count">
                    <span class="count">0</span>
                    <span class="bullet">&#xA0;</span>
                </span> 
            </div>
            <div class="soc-button soc-button-twitter"> <a href="#" class="soc-click ot-tweet" data-hashtags="" data-url="http://www.helloworld.com/world" data-via="" data-text="World">
                <span class="text-icon">&#xF309;</span>TWITTER</a>
                <span class="likes-count">
                    <span class="count">0</span>
                    <span class="bullet">&#xA0;</span>
                </span>
            </div>
            <div class="soc-button soc-button-pinterest"> <a href="http://pinterest.com/pin/create/button/?url=http://www.helloworld.com/world" data-url="http://www.helloworld.com/world" class="ot-pin soc-click">
                <span class="text-icon">&#xF312;</span>PINTEREST</a>
                <span class="likes-count">
                    <span class="count">0</span>
                    <span class="bullet">&#xA0;</span>
                </span> 
            </div> 
            <div class="soc-button soc-button-google"> <a href="https://plus.google.com/share?url=http://www.helloworld.com/world" class="ot-pluss soc-click">
                <span class="text-icon">&#xF30F;</span>GOOGLE+</a>
                <span class="likes-count">
                    <span class="count">0</span>
                    <span class="bullet">&#xA0;</span>
                </span> 
            </div> 
        </div> 
    </div> 
</div>

So basically, I want all of the HTML inside the content div, but without the elements that have class="article-title", class="article-socials" or class="article-tags".

So it would be stripped down to:

<div class="content"> 
    <p class="p1">
        <span class="s1"><b>a test</b></span>
    </p>  
    <p class="p2">
        <span class="s1">text2</span>
    </p> 
    <p class="p1">
        <span class="s1"><b><a href="http://www.helloworld.com/hello.jpg">
            <img class="alignright size-medium wp-image-19472" src="http://www.helloworld.com/hello.jpg" alt="hello" width="300" height="218"></a>Hello</b>
        </span>
    </p>  
    <p class="p1">
        <span class="s1"><b>text text text</b></span>
    </p> 
    <p class="p1">
        <span class="s1"><b><a href="http://www.helloworld.com/hello2.jpg">
            <img class="alignleft size-medium wp-image-19474" src="http://www.helloworld.com/hello2.jpg" alt="hello2" width="300" height="200"></a>Hello2</b>
        </span>
    </p> 
    <p class="p1">
        <span class="s1">text1</span>
    </p> 
    <p class="p1">
        <span class="s1">text2</span>
    </p> 
    <p class="p1">
        <span class="s1"><b>Final thoughts</b></span>
    </p> 
    <p class="p1">
        <span class="s1">testing (<a href="http://www.helloworld.com/test">
            <span class="s2">test</span></a>, 
            <a href="http://www.helloworld.com/test2">
            <span class="s2">test2</span></a>
        </span>
    </p> 
    <p class="p1">
        <span class="s1">***</span>
    </p> 
    <p class="p5"><em>
        <span class="s1">xyz <a href="http://www.helloworld.com/xyz">
            <span class="s2">123</span></a> (at <a href="http://www.helloworld.com">
            <span class="s2">http://www.helloworld.com</span></a>. &#xA0;
        </span></em>
    </p> 
    <div class="panel-breaking-line"></div> 
    <div class="panel-breaking-line"></div> 
</div>

With or without the enclosing content div itself; either is fine.

I have tried many expressions and ended up with the following:

    // This works, but returns the entire content of the div
    $results = '';
    $xpath = new DOMXPath($doc);
    $elements = @$xpath->query(".");
    foreach ($elements as $element) {
        $results .= $element->ownerDocument->saveHTML($element);
    }

and then this expression instead of the dot:

    div[@class='content']/*[not(contains(concat(' ', @class, ' '), 'article-title')) and not(contains(concat(' ', @class, ' '), 'article-social')) and not(contains(concat(' ', @class, ' '), 'article-tags'))]

It returns nothing. Any idea how I can get this to work?

【Question comments】:

  • You just need to add a leading //: //div[@class='content']/*[not(contains(concat(' ', @class, ' '), 'article-title')) and not(contains(concat(' ', @class, ' '), 'article-social')) and not(contains(concat(' ', @class, ' '), 'article-tags'))]
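A minimal sketch of why the leading `//` matters, using a trimmed, hypothetical stand-in for the question's markup. When `DOMXPath::query()` is called without a context node, a relative path such as `div[...]` is evaluated against the document node, whose only element child after `loadHTML()` is the implied `<html>`, so it matches nothing. `//div[...]` searches the whole document; alternatively, passing the content div as the second (context node) argument lets a relative path work.

```php
<?php
// Trimmed, hypothetical stand-in for the question's markup.
$markup = '<div class="content">'
    . '<div class="article-title"><h2>Title of the test</h2></div>'
    . '<p class="p1"><span class="s1"><b>a test</b></span></p>'
    . '<div class="article-tags"><b>Tags added to this article</b></div>'
    . '<div class="article-socials"><b>Share this article with friends</b></div>'
    . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// The same predicate as in the question, with plain substring checks.
$filter = '[not(contains(@class, "article-title"))'
    . ' and not(contains(@class, "article-socials"))'
    . ' and not(contains(@class, "article-tags"))]';

// Relative path, no context node: evaluated against the document node,
// which has no <div> child (libxml wraps the fragment in <html><body>).
$relative = $xpath->query('div[@class="content"]/*' . $filter);

// Leading // searches every descendant of the document.
$absolute = $xpath->query('//div[@class="content"]/*' . $filter);

// Alternatively, keep the path relative and pass the content div as context.
$content = $xpath->query('//div[@class="content"]')->item(0);
$fromContext = $xpath->query('*' . $filter, $content);

echo $relative->length, ' ', $absolute->length, ' ', $fromContext->length, "\n";
```

Running it prints `0 1 1`: only the `//` form and the context-node form find the remaining paragraph.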

Tags: php html xpath


【Answer 1】:

You can explicitly list each of them in a not(contains()):

// $markup holds the HTML from the question
$dom = new DOMDocument();
$dom->formatOutput = true;
$dom->loadHTML($markup);

$xpath = new DOMXpath($dom);

$elements = $xpath->query('
//div[@class="content"]/*[
    not(contains(@class, "article-title")) and
    not(contains(@class, "article-socials")) and
    not(contains(@class, "article-tags"))
]
');

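$html = '';
foreach ($elements as $child) {
    // saveHTML() writes empty elements as <div ...></div>; saveXML() would
    // write <div .../>, which HTML parsers treat as an unclosed <div>
    $html .= $dom->saveHTML($child);
}

// htmlentities() escapes the markup so a browser displays it as text;
// remove it to output the HTML itself
echo htmlentities($html);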
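The query above drops the outer `<div class="content">`. Since the question says keeping it is also fine, another option is to delete the unwanted children from the DOM and serialize the div itself. This is a sketch under stated assumptions: `$markup` here is a trimmed stand-in for the real HTML, and the class test uses the `concat(' ', normalize-space(@class), ' ')` idiom so that only whole class tokens match, whereas `contains(@class, "article-tags")` would also match a hypothetical class such as `article-tags-extra`.

```php
<?php
// Trimmed stand-in for the question's markup; replace with the real HTML.
$markup = '<div class="content">'
    . '<div class="article-title"><h2>Title of the test</h2></div>'
    . '<p class="p1"><span class="s1"><b>a test</b></span></p>'
    . '<div class="panel-breaking-line"></div>'
    . '<div class="article-tags"><b>Tags added to this article</b></div>'
    . '<div class="panel-breaking-line"></div>'
    . '<div class="article-socials"><b>Share this article with friends</b></div>'
    . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

$content = $xpath->query('//div[@class="content"]')->item(0);

// One test per unwanted class, each matching a whole class token.
$unwanted = ['article-title', 'article-tags', 'article-socials'];
$tests = array_map(
    function ($cls) {
        return "contains(concat(' ', normalize-space(@class), ' '), ' $cls ')";
    },
    $unwanted
);

// Direct children of the content div that carry any unwanted class.
$toRemove = $xpath->query('*[' . implode(' or ', $tests) . ']', $content);

// Collect the matches into a plain array, then detach them from the tree.
foreach (iterator_to_array($toRemove) as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize the wrapper div together with its remaining children.
$html = $dom->saveHTML($content);
echo $html, "\n";
```

Removing nodes mutates `$dom`, so work on a fresh copy if the original tree is still needed elsewhere.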


【Comments】:

  • It works fine, except that for some reason I had to remove the htmlentities function... no idea why!
  • @TheGreatOne I'm glad this helped
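On the `htmlentities()` question: the answer wraps the output in it so that a web page shows the extracted markup as literal text. If you want the HTML itself (to render or store it), the call has to go. A small illustration, with a sample string chosen only for demonstration:

```php
<?php
$html = '<p class="p1"><b>a test</b></p>';

// Escaped: <, >, & and " become entities, so a browser shows the tags as text.
$escaped = htmlentities($html);

// Unescaped: a browser would render the paragraph normally.
echo $escaped, "\n", $html, "\n";
```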