【Question title】: Find div with class and its plain text using PHP Simple HTML DOM Parser
【Posted】: 2018-05-05 15:15:04
【Question】:

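I want to find the elements with class ft00 that sit between "Work Experience" and "EDUCATION AND TRAINING", and extract the text of those elements (the ones containing dates) from the given HTML: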

<p class = "ft00">Introduction</p>
<p class = "ft00">John Smith</p>
<p class = "ft02">Email:</p>
<p class = "ft00">John@gmail.com</p>
<p class = "ft00">Work Experience</p>
<p class = "ft00">27 July 2017</p>
<p class = "ft02">ABC Company</p>
<p class = "ft00">19 May 2018</p>
<p class ="ft02">XYZ Company</p>
<p class = "ft00">EDUCATION AND TRAINING</p>

So far I can extract all the data between "Work Experience" and "EDUCATION AND TRAINING", and that works fine. The code is below:

$fexp = $html->find('p[plaintext^=Work Experience]');
$items = array();
foreach ($fexp as $keye) {
    // Walk the siblings that follow the "Work Experience" heading
    while ($keye->nextSibling()) {
        $keye = $keye->nextSibling();
        $varce = $keye->plaintext;
        // Stop at the next section heading
        if (trim($varce) == "EDUCATION AND TRAINING") {
            break;
        }
        $items[] = $varce;
    }
}
var_dump($items);

I'm close but can't seem to find a solution. Any help would be greatly appreciated, thanks :-)

【Comments】:

  • Please clarify your question.
  • @PreciousTom Which part did you not get?

Tags: php html parsing web-scraping simple-html-dom


【Solution 1】:

Using DOMDocument and DOMXPath, you can do it as shown below. I have never used Simple HTML DOM Parser, but I think it has XPath.

<?php
$dom = new DOMDocument();

$dom->loadHTML('
<p class = "ft00">Introduction</p>
<p class = "ft00">John Smith</p>
<p class = "ft02">Email:</p>
<p class = "ft00">John@gmail.com</p>
<p class = "ft00">Work Experience</p>
<p class = "ft00">27 July 2017</p>
<p class = "ft02">ABC Company</p>
<p class = "ft00">19 May 2018</p>
<p class ="ft02">XYZ Company</p>
<p class = "ft00">EDUCATION AND TRAINING</p>
', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

$result = [];
$matching  = false;
foreach ($xpath->query("//p[contains(@class, 'ft00') or contains(@class, 'ft02')]/text()") as $p) {
    if ($p->nodeValue === 'Work Experience' || $matching) {
        $result[] = $p->nodeValue;
        $matching = true;
    }
    if ($p->nodeValue === 'EDUCATION AND TRAINING') {
        break;
    }
}

print_r($result);

Result:

Array
(
    [0] => Work Experience
    [1] => 27 July 2017
    [2] => ABC Company
    [3] => 19 May 2018
    [4] => XYZ Company
    [5] => EDUCATION AND TRAINING
)

https://3v4l.org/0nvr4
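
The result above still contains the ft02 rows and the two section headings, while the question asks only for the ft00 text between them. A minimal sketch of that narrower filter, using the same DOMDocument/DOMXPath approach: the variable names `$markup`, `$dates` and `$inSection` are illustrative, and the code assumes the headings read exactly "Work Experience" and "EDUCATION AND TRAINING" as in the sample HTML.

```php
<?php
// Sample markup copied from the question.
$markup = '
<p class = "ft00">Introduction</p>
<p class = "ft00">John Smith</p>
<p class = "ft02">Email:</p>
<p class = "ft00">John@gmail.com</p>
<p class = "ft00">Work Experience</p>
<p class = "ft00">27 July 2017</p>
<p class = "ft02">ABC Company</p>
<p class = "ft00">19 May 2018</p>
<p class ="ft02">XYZ Company</p>
<p class = "ft00">EDUCATION AND TRAINING</p>
';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

$dates = [];
$inSection = false; // true once the "Work Experience" heading has been seen

// Select only <p> elements whose class list contains the exact token ft00.
$query = "//p[contains(concat(' ', normalize-space(@class), ' '), ' ft00 ')]";
foreach ($xpath->query($query) as $p) {
    $text = trim($p->textContent);
    if ($text === 'EDUCATION AND TRAINING') {
        break; // closing heading reached, and it is not collected
    }
    if ($inSection) {
        $dates[] = $text;
    }
    if ($text === 'Work Experience') {
        $inSection = true; // start collecting from the next ft00 element
    }
}

print_r($dates);
```

With the sample markup this prints an array holding only "27 July 2017" and "19 May 2018". Checking the closing heading before collecting, and the opening heading after, is what keeps both headings out of the result.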

【Comments】:

  • Thanks. I used the same logic as yours, implemented it with Simple HTML DOM, and it works fine.

【Solution 2】:

Here is the correct working code:

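$test = array();
$matching = false;
// Only the <p> elements with class ft00
$collection = $html->find('p.ft00');
foreach ($collection as $tkey) {
    // Start collecting at the "Work Experience" heading (the comparison is case-sensitive)
    if ($tkey->plaintext == "Work Experience" || $matching) {
        $test[] = $tkey->plaintext;
        $matching = true;
    }
    // Stop once the next section heading has been collected
    if ($tkey->plaintext == "EDUCATION AND TRAINING") {
        break;
    }
}
print_r($test);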

Output:

Array
(
    [0] => Work Experience
    [1] => 27 July 2017
    [2] => 19 May 2018
    [3] => EDUCATION AND TRAINING
)
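
The output above still includes the two section headings. If only the dates are wanted, one possible follow-up step is to drop the headings afterwards. This is a sketch that assumes `$test` has exactly the shape shown in the output; `$dates` is an illustrative name.

```php
<?php
// Array shaped like the output shown above.
$test = [
    'Work Experience',
    '27 July 2017',
    '19 May 2018',
    'EDUCATION AND TRAINING',
];

// Remove the two heading strings, then reindex the remaining values from 0.
$dates = array_values(array_diff($test, ['Work Experience', 'EDUCATION AND TRAINING']));

print_r($dates);
```

Removing the headings by value rather than by position means nothing is lost if the closing heading was never reached and the loop ran to the end of the collection.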
