Fatal error from time to time while processing HTML with PHP Simple HTML DOM Parser

【Question title】: Fatal error from time to time while processing html with PHP Simple HTML DOM Parser
【Posted】: 2016-11-26 17:28:41
【Question description】:

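I am using PHP Simple HTML DOM Parser to extract some metadata from the IKEA website based on a catalog number.

The number I tested, 30275861, and a few dozen others work fine. The result is this link (the $produkt variable) plus some data: http://www.ikea.com/pl/pl/catalog/products/30275861/?query=30275861 (pasted into a browser, the link opens a page with Kallax system furniture).

Given the number 69136138, the resulting link (the $produkt variable) is http://www.ikea.com/pl/pl/catalog/products/S69136138/?query=69136138, which opens in a browser (TV furniture), but the script gives this error:

Fatal error: Call to a member function find() on boolean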
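
For context, this is the message PHP reports when a method is called on a boolean instead of an object. Below is a minimal sketch that reproduces it without the library, assuming `$html` holds `false` the way a failed `file_get_html()` call would leave it. On newer PHP versions the message ends in "bool" rather than "boolean".

```php
<?php
// Stand-in for a failed file_get_html() call; assumed here to have returned false.
$html = false;

$message = null;
try {
    // Calling a method on a non-object throws an Error on PHP 7 and later.
    $html->find('meta[name=partnumber]');
} catch (Error $e) {
    $message = $e->getMessage();
}

echo $message, PHP_EOL;
```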

The code, which works in most cases, looks like this:

<?php

include('simple_html_dom.php');
function clean($string) {
$string = str_replace(',', '.', $string); 
return preg_replace('/[^A-Za-z0-9\-.]/', '', $string); 
}

if(isset($_POST['produkt_id'])){

$produkt_id=str_replace('.', '', $_POST['produkt_id']);

    $url="http://www.ikea.com/pl/pl/search/?query=".$produkt_id;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Must be set to true so that PHP follows any "Location:" header
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $a = curl_exec($ch); // $a will contain all headers

    $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // Return the last effective URL
    $produkt=(string)$url;

$html = file_get_html($produkt);

echo $produkt_id;
echo "<br>";
echo $produkt;

foreach($html->find('meta[name=partnumber]') as $e) echo $kod=$e->content;
foreach($html->find('link[rel=image_src]') as $e) echo $obrazek=$e->href;
foreach($html->find('meta[name=title]') as $e) echo $nazwa=$e->content;
foreach($html->find('meta[name=price]') as $e) echo $cena=floatval(clean($e->content));
}
?>
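
As a side note on the price handling, here is a small sketch of what `clean()` combined with `floatval()` does to a price string. The sample input is hypothetical and not taken from the IKEA page.

```php
<?php
// Same helper as in the question: turn the decimal comma into a dot,
// then drop everything except letters, digits, '-' and '.'.
function clean($string) {
    $string = str_replace(',', '.', $string);
    return preg_replace('/[^A-Za-z0-9\-.]/', '', $string);
}

// Hypothetical price string in Polish formatting.
$raw = '1 299,00 PLN';

$cleaned = clean($raw);     // spaces removed, comma replaced by a dot
$cena = floatval($cleaned); // floatval() reads the leading numeric part only

var_dump($cleaned, $cena);
```

Note that `clean()` keeps letters, so any currency suffix survives until `floatval()` ignores it.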

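【Comments】:

What is file_get_html?
@Dekel file_get_html is a method of the simple_html_dom class... ;-)
Since it is not a standard PHP class/method, you should mention it in your post (and provide a link to the library).
[..] using PHP Simple HTML DOM Parser. And the code has include('simple_html_dom.php'); but here you go, the link: simplehtmldom.sourceforge.net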

Tags: php html dom

【Solution 1】:

Why not try wrapping the foreach loops in a conditional, so that they only run when $html is neither null nor empty?

    <?php

        // ... SOME CODE ABOVE
        $html = file_get_html($produkt);

        // Inspect what file_get_html() actually returned:
        // a parser object on success, false on failure.
        var_dump($html);

        // Only run the loops (echoing out the data) if $html has some content.
        if ($html) {
            echo $produkt_id;
            echo "<br>";
            echo $produkt;

            foreach ($html->find('meta[name=partnumber]') as $e) {
                echo $kod = $e->content;
            }
            foreach ($html->find('link[rel=image_src]') as $e) {
                echo $obrazek = $e->href;
            }
            foreach ($html->find('meta[name=title]') as $e) {
                echo $nazwa = $e->content;
            }
            foreach ($html->find('meta[name=price]') as $e) {
                echo $cena = floatval(clean($e->content));
            }
        }
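
The same guard can be tried without any network access. The sketch below is not the Simple HTML DOM API: it swaps in PHP's built-in DOMDocument and DOMXPath, and the helper name `extract_meta()` and the sample markup are made up for illustration. The point is only that the lookup runs after the input has been checked.

```php
<?php
/**
 * Hypothetical helper: return the content attribute of the first
 * <meta name="..."> tag, or null when the input is unusable or no tag matches.
 * Assumes $name contains no double quotes.
 */
function extract_meta($contents, $name) {
    // Guard first: a failed fetch typically yields false or an empty string.
    if (!is_string($contents) || $contents === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($contents);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//meta[@name="' . $name . '"]/@content');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

// Made-up markup standing in for a product page.
$page = '<html><head><meta name="partnumber" content="30275861"></head><body></body></html>';

var_dump(extract_meta($page, 'partnumber')); // value of the content attribute
var_dump(extract_meta(false, 'partnumber')); // null instead of a fatal error
```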

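【Comments】:

Thanks, Poiz. Indeed, the code is dirty and not optimal, but that is not the point. Can anyone tell me why both links point to the same page, yet one works as a data source and the other does not? That is the question here.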
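
One way to narrow this down is to look at why the parser returned `false` for the second URL. In copies of `simple_html_dom.php` from around that time, `file_get_html()` appears to return `false` when the downloaded content is empty or longer than the `MAX_FILE_SIZE` constant (600000 in those copies). Treat both the behavior and the number as assumptions to verify against your own copy of the library. Under that assumption, a larger product page could fail while a smaller one parses. A sketch of the check, with a hypothetical helper name and made-up inputs:

```php
<?php
// Assumed limit; check the MAX_FILE_SIZE define in your copy of simple_html_dom.php.
const ASSUMED_MAX_FILE_SIZE = 600000;

/**
 * Hypothetical helper: describe why a downloaded page might be rejected.
 * $contents is whatever file_get_contents() or curl_exec() returned.
 */
function describe_fetch($contents, $limit = ASSUMED_MAX_FILE_SIZE) {
    if ($contents === false) {
        return 'fetch failed';
    }
    if ($contents === '') {
        return 'empty response';
    }
    $length = strlen($contents);
    if ($length > $limit) {
        return "too large: $length bytes, limit $limit";
    }
    return "ok: $length bytes";
}

// Made-up inputs standing in for real downloads.
echo describe_fetch(str_repeat('x', 1000)), PHP_EOL;
echo describe_fetch(str_repeat('x', 700000)), PHP_EOL;
echo describe_fetch(false), PHP_EOL;
```

Running the same check on the content actually fetched for both product URLs would show whether size is what separates them.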