Unable to scrape Zazzle product URL: answers

【Question title】: Unable to scrape Zazzle product URL
【Posted】: 2016-12-12 18:37:35
【Question description】:

Below is my code. I am trying to scrape the following URL, but for some reason the HTML source is not returned at all. Why does scraping fail for this URL?

I also tried file_get_contents and the Simple HTML DOM library, without success.

URL: http://www.zazzle.com/protoceratops_t_shirt-235065458404753105

function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

echo get_data('http://www.zazzle.com/protoceratops_t_shirt-235065458404753105');

【Question comments】:

Are you getting an error? Does the code return anything when you fetch just http://www.google.com/?

Tags: php curl web-scraping

【Solution 1】:

You can try this:

function get_data($url) {
    try {
        $ch = curl_init();

        $timeout = 5;

        // curl_init() returns FALSE when the handle cannot be created
        if (FALSE === $ch)
            throw new Exception('failed to initialize');

        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);

        $content = curl_exec($ch);

        // curl_exec() returns FALSE on failure; pass along cURL's own message and error number
        if (FALSE === $content)
            throw new Exception(curl_error($ch), curl_errno($ch));
        // ...process $content now
        return $content;

    } catch(Exception $e) {

        trigger_error(sprintf(
            'Curl failed with error #%d: %s',
            $e->getCode(), $e->getMessage()),
            E_USER_ERROR);
    }
}

echo get_data('http://www.zazzle.com/protoceratops_t_shirt-235065458404753105');

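If the request fails, this version reports the cURL error number and message instead of silently returning nothing.

All credit goes to: curl_exec() always returns false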
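
If no cURL error is reported but the output is still empty, two common causes are worth ruling out: a redirect that cURL is not following, and a server that rejects requests lacking a browser-like User-Agent. Whether either applies to this Zazzle URL is an assumption, not something confirmed in this thread. A minimal sketch follows; build_curl_options and fetch_page are hypothetical helper names, and the User-Agent string is a placeholder:

```php
<?php
// Hypothetical helper: collects the cURL options used for the request.
function build_curl_options(string $url, int $timeout = 5): array {
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_FOLLOWLOCATION => true, // follow 3xx redirects (for example http to https)
        CURLOPT_MAXREDIRS      => 5,    // stop after a few hops to avoid redirect loops
        // Placeholder value; some servers return nothing to clients with no User-Agent.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
    ];
}

// Hypothetical helper: returns the response body together with the final HTTP status code.
function fetch_page(string $url): array {
    $ch = curl_init();
    if ($ch === false) {
        throw new RuntimeException('failed to initialize cURL');
    }
    curl_setopt_array($ch, build_curl_options($url));

    $body = curl_exec($ch);
    if ($body === false) {
        // Read the error details before closing the handle.
        $message = curl_error($ch);
        $code    = curl_errno($ch);
        curl_close($ch);
        throw new RuntimeException($message, $code);
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return ['status' => $status, 'body' => $body];
}

// Usage (performs a real network request):
// $result = fetch_page('http://www.zazzle.com/protoceratops_t_shirt-235065458404753105');
// echo $result['status'], "\n", strlen($result['body']), " bytes\n";
```

Printing the status code alongside the body length helps separate an empty 200 response from a 3xx or 403 response, which points to different fixes.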

【Comments】: