To find all the links in an HTML document, you can use preg_match_all().
$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);
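As a small offline illustration of what that call returns, the sketch below runs the same pattern against a hard-coded HTML string (the sample markup is made up for this example). The return value is the number of matches; the captured URLs end up in $matches[1].

```php
<?php
// Hypothetical sample markup, used only for illustration.
$content = '<a href="https://example.com/">Home</a> <A HREF="/about">About</A>';

// Returns the number of full matches; captured groups go into $matches.
$count = preg_match_all("/href=\"([^\"]+)\"/i", $content, $matches);

// $matches[0] holds the full matches, $matches[1] the captured URLs.
foreach ($matches[1] as $link) {
    echo $link, "\n";
}
```

Note that this pattern only picks up href values written in double quotes; single-quoted or unquoted attributes are not matched.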
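The URL https://qc.yahoo.com/ serves its content gzip-compressed, so you have to detect that and decompress the body with gzdecode(). (That function must be available in your PHP build; it is part of the zlib extension.)
gzip compression is indicated by the Content-Encoding: gzip HTTP header. You have to check for that header, so you need curl or something similar to retrieve the headers.
(The string returned by file_get_contents() contains only the body, which here is the raw gzip-compressed data, not the HTTP headers. To tell whether the body is compressed, you need to read the headers.)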
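The detect-then-decompress step can be tried without any network access. The following is only a sketch under stated assumptions: the header block is made up, the body is produced locally with gzencode() to stand in for a compressed response, and body_is_gzipped is a hypothetical helper name, not a PHP built-in. It needs the zlib extension.

```php
<?php
// Returns true if the raw header block declares "Content-Encoding: gzip".
// (body_is_gzipped is a hypothetical helper name, not a PHP built-in.)
function body_is_gzipped(string $rawHeaders): bool
{
    // /i: header names are case-insensitive; /m: ^ matches at each line start.
    return preg_match('/^content-encoding:.*\bgzip\b/im', $rawHeaders) === 1;
}

// Simulated response: made-up headers plus a body compressed with gzencode().
$headers = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Encoding: gzip\r\n\r\n";
$body    = gzencode('<a href="https://example.com/">Home</a>');

// Decompress only when the headers say the body is gzip-encoded.
if (body_is_gzipped($headers)) {
    $body = gzdecode($body);
}
echo $body, "\n";
```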
Here is a complete example:
<?php
$url = "https://qc.yahoo.com/";
# download resource
$c = curl_init ($url);
curl_setopt ($c, CURLOPT_HEADER, true);
curl_setopt ($c, CURLOPT_RETURNTRANSFER, true);
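$content = curl_exec ($c);
# stop early if the download failed (curl_exec returns false on error)
if ($content === false)
{
    die ("download failed: " . curl_error ($c));
}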
$hsize = curl_getinfo ($c, CURLINFO_HEADER_SIZE);
curl_close ($c);
# separate headers from content
$headers = substr ($content, 0, $hsize);
$content = substr ($content, $hsize);
# check if content is compressed with gzip
$gzip = 0;
$headers = preg_split ('/\r?\n/', $headers);
foreach ($headers as $h)
{
    $pieces = preg_split ("/:/", $h, 2);
    $pieces2 = (count ($pieces) > 1);
    $enc = $pieces2 && (preg_match ("/content-encoding/i", $pieces[0]) );
    $gz = $pieces2 && (preg_match ("/gzip/i", $pieces[1]) );
    if ($enc && $gz)
    {
        $gzip = 1;
        break;
    }
}
# unzip content if gzipped
if ($gzip)
{
    $content = gzdecode ($content);
}
# find links
$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);
# output results
echo "url = " . htmlspecialchars ($url) . "<br>";
echo "links found (" . count ($matches[1]) . "):" . "<br>";
$n = 0;
foreach ($matches[1] as $link)
{
    $n++;
    echo "$n: " . htmlspecialchars ($link) . "<br>";
}
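As a possible alternative to checking the header by hand, curl can be asked to handle the decompression itself through the CURLOPT_ENCODING option. The sketch below is not part of the example above; fetch_decoded is a hypothetical helper name, and it assumes the curl extension is installed.

```php
<?php
// fetch_decoded is a hypothetical helper name used for this sketch.
// Returns the decoded response body, or false if the transfer fails.
function fetch_decoded(string $url)
{
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    // An empty string asks curl to advertise every encoding it supports
    // and to decode the response body automatically.
    curl_setopt($c, CURLOPT_ENCODING, "");
    // The handle is released when $c goes out of scope.
    return curl_exec($c);
}
```

With this option set, a call such as $content = fetch_decoded ("https://qc.yahoo.com/"); should return the body already decompressed, so the manual Content-Encoding check and the gzdecode() call would not be needed before running preg_match_all().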