【Question title】: Find <p> (paragraph) tag with specific class and extract its content using PHP Simple HTML DOM Parser
【Posted】: 2018-10-17 06:10:49
【Question description】:

I want to find the p tags with class ft04 that lie between Work Experience and EDUCATION AND TRAINING, and extract their text (the company names) from the HTML below:

<p class = "ft00">Introduction</p>
<p class = "ft00">John Smith</p>
<p class = "ft02">Email:</p>
<p class = "ft02">Phone Number:</p>
<p class = "ft00">John@gmail.com</p>
<p class = "ft00">Work Experience</p>
<p class = "ft00">27 July 2017</p>
<p class = "ft04">ABC Company</p>
<p class = "ft00">19 May 2018</p>
<p class ="ft04">XYZ Company</p>
<p class = "ft00">EDUCATION AND TRAINING</p>

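So far I have managed to extract all of the data between Work Experience and EDUCATION AND TRAINING. That part works fine, using the following code: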

$fexp = $html->find('p[plaintext^=Work Experience]');
$items = array();
foreach ($fexp as $keye) {
    while ($keye->nextSibling()) {
        if ($keye->nextSibling() == TRUE) {
            $keye = $keye->nextSibling();
            $varce = $keye->plaintext;
        }
        if (trim($varce) == "EDUCATION AND TRAINING") {
            break;
        }
        //$test[] = $collection;
        $items[] = $varce;
        // echo $varce;
    }
}
var_dump($items);

I am close, but I can't seem to find a solution. Any help would be greatly appreciated, thanks :-)

【Comments】:

    Tags: php html parsing web-scraping simple-html-dom


    【Solution 1】:

    Here is working code :)

    $html = '<p class = "ft00">Introduction</p>
    <p class = "ft00">John Smith</p>
    <p class = "ft02">Email:</p>
    <p class = "ft02">Phone Number:</p>
    <p class = "ft00">John@gmail.com</p>
    <p class = "ft00">Work Experience</p>
    <p class = "ft00">27 July 2017</p>
    <p class = "ft04">ABC Company</p>
    <p class = "ft00">19 May 2018</p>
    <p class ="ft04">XYZ Company</p>
    <p class = "ft00">EDUCATION AND TRAINING</p>';
    
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    
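    $items = array();
    $found = false; // becomes true once the "Work Experience" heading has been seen
    
    foreach ($doc->getElementsByTagName('p') as $p) {
        if (strtolower(trim($p->nodeValue)) == 'work experience') {
            $found = true;
        }
        // Collect only ft04 paragraphs that come after the heading
        if ($found && strtolower(trim($p->getAttribute('class'))) == 'ft04') {
            $items[] = $p->nodeValue;
        }
        // Stop at the next section heading
        if (strtolower(trim($p->nodeValue)) == 'education and training') {
            break;
        }
    }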
    
    print_r($items);
    

    Output

    Array
    (
        [0] => ABC Company
        [1] => XYZ Company
    )
    

    Hope this helps.
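
The flag-based loop in this answer can also be wrapped in a reusable function. The following is a minimal sketch, not part of the original answer: the function name `extractBetweenHeadings` and its parameters are hypothetical. It assumes the headings are matched case-insensitively after trimming, and it splits the class attribute on whitespace so that a paragraph with several classes would still match. It also switches off libxml warnings while parsing, since the input is an HTML fragment rather than a full document.

```php
<?php
// Hypothetical helper (an illustration, not from the original answers):
// returns the text of every <p> with class $class found between the
// paragraph whose text is $start and the paragraph whose text is $end.
function extractBetweenHeadings(string $html, string $start, string $end, string $class): array
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $items = [];
    $inSection = false;

    foreach ($doc->getElementsByTagName('p') as $p) {
        $text = trim($p->nodeValue);

        if (strcasecmp($text, $start) === 0) {
            $inSection = true; // start collecting after this heading
            continue;
        }
        if (strcasecmp($text, $end) === 0) {
            break; // the next section begins, stop scanning
        }

        // Split on whitespace so class="ft04 other" would also match.
        $classes = preg_split('/\s+/', trim($p->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
        if ($inSection && in_array($class, $classes, true)) {
            $items[] = $text;
        }
    }

    return $items;
}

$html = '<p class = "ft00">Work Experience</p>
<p class = "ft00">27 July 2017</p>
<p class = "ft04">ABC Company</p>
<p class = "ft00">19 May 2018</p>
<p class ="ft04">XYZ Company</p>
<p class = "ft00">EDUCATION AND TRAINING</p>';

print_r(extractBetweenHeadings($html, 'Work Experience', 'EDUCATION AND TRAINING', 'ft04'));
```

With the sample markup from the question, this should return the same two company names as the loop above. Passing the headings and class as arguments is only a design convenience, so the same function could be reused for other sections of the document.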

    【Comments】:

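      【Solution 2】:

      You can check the class like this:

      $keye->getAttribute("class") === "ft04"

      You can shorten your code by assigning nextSibling() inside the while condition:

      while ($keye = $keye->nextSibling()) {

      While iterating over the siblings, check for the class name ft04 and add the plaintext to your $items array.

      You could update your code to: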

      $fexp = $html->find('p[plaintext^=Work Experience]');
      $items = array();
      foreach ($fexp as $keye) {
          while ($keye = $keye->nextSibling()) {
              if ($keye->plaintext === "EDUCATION AND TRAINING") {
                  break;
              }
              if($keye->getAttribute("class") === "ft04") {
                  $items[] = $keye->plaintext;
              }
          }
      }
      var_dump($items);
      

      That will give you:

      array(2) {
        [0]=>
        string(11) "ABC Company"
        [1]=>
        string(11) "XYZ Company"
      }
      

      【Comments】:
