How do I get the HTML code of a web page in PHP? Answers

【Question title】: How do I get the HTML code of a web page in PHP?
【Posted】: 2010-10-23 13:57:16
【Question】:

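I want to retrieve the HTML code of a link (web page) in PHP. For example, if the link is

https://stackoverflow.com/questions/ask

then I want the HTML code of the page that is served. I want to retrieve this HTML code and store it in a PHP variable.

How can I do this?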

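【Comments】:

Can you explain a bit more? I'm guessing you want to send a web request to a given URL and read the response into a variable?
Yes, exactly that. I want the entire source code returned by that web request in a variable.
You can easily scrape the HTML using this tool.
This function does not return the page's HTML even though allow_url_fopen is set to true. What else should I check?

Tags: php html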

【Solution 1】:

If your PHP server allows the URL fopen wrappers, the simplest way is:

$html = file_get_contents('https://stackoverflow.com/questions/ask');
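One thing the one-liner hides is that file_get_contents() returns false on failure and, by default, gives you no body for 4xx/5xx responses. Below is a minimal sketch of a more defensive version, assuming a 10-second timeout is acceptable. The helper names parse_status_code and fetch_with_file_get_contents are made up for illustration and are not part of PHP.

```php
<?php
// Hypothetical helper: pull the final HTTP status code out of the response
// header lines that PHP exposes after an http(s):// stream request.
function parse_status_code(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // After redirects there are several status lines; keep the last one.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical wrapper: returns the body, or null on failure or a non-2xx status.
function fetch_with_file_get_contents(string $url): ?string
{
    $context = stream_context_create([
        'http' => [                   // the 'http' key also applies to https:// URLs
            'timeout' => 10,          // seconds; an arbitrary choice for this sketch
            'ignore_errors' => true,  // return the body for 4xx/5xx so the status can be inspected
        ],
    ]);
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        return null; // DNS failure, wrapper disabled, timeout, ...
    }
    // $http_response_header is populated in the local scope by the HTTP wrapper.
    $status = parse_status_code($http_response_header ?? []);
    return ($status !== null && $status >= 200 && $status < 300) ? $html : null;
}
```

Calling fetch_with_file_get_contents('https://stackoverflow.com/questions/ask') would then give either the HTML string or null, which is easier to branch on than a warning.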

If you need more control, look at the cURL functions:

$c = curl_init('https://stackoverflow.com/questions/ask');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
//curl_setopt(... other options you want...)

$html = curl_exec($c);

if (curl_error($c))
    die(curl_error($c));

// Get the status code
$status = curl_getinfo($c, CURLINFO_HTTP_CODE);

curl_close($c);
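To act on that status code, for example to show an error message instead of the content when the page is a 404, the cURL calls can be wrapped like this. This is only a sketch: fetch_with_curl and is_usable_response are hypothetical names, and the timeout and redirect settings are assumptions rather than anything the answer prescribes.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and report body, status, and error.
function fetch_with_curl(string $url): array
{
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true); // follow redirects (assumed to be wanted)
    curl_setopt($c, CURLOPT_TIMEOUT, 10);          // seconds; arbitrary for this sketch

    $body = curl_exec($c);
    $error = ($body === false) ? curl_error($c) : null;
    $status = (int) curl_getinfo($c, CURLINFO_HTTP_CODE);
    // The handle is released when $c goes out of scope.

    return [
        'body' => ($body === false) ? null : $body,
        'status' => $status,
        'error' => $error,
    ];
}

// Hypothetical helper: true only for a transport-level success with a 2xx status.
function is_usable_response(array $result): bool
{
    return $result['error'] === null
        && $result['status'] >= 200
        && $result['status'] < 300;
}

// Usage (performs a real request, so it is left commented out here):
// $r = fetch_with_curl('https://stackoverflow.com/questions/ask');
// echo is_usable_response($r) ? $r['body'] : 'Could not load page (status ' . $r['status'] . ')';
```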

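【Comments】:

I'm worried about 404s. If the link doesn't exist, I don't want its content; I want to show an error message instead. How can we find out whether the URL returns a 404 error (that is, whether the URL is valid)?
@Prashant: I've edited the answer to add the curl_getinfo call, which will give you 200, 404, or whatever the status is.
Also, how can PHP get the HTML of the current page?
Does this work cross-domain?
This function does not return the page's HTML even though allow_url_fopen is set to true. What else should I check? I'm using PHP 7.2.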

【Solution 2】:

Also, if you want to manipulate the retrieved page in some way, you might want to try a PHP DOM parser. I find PHP Simple HTML DOM Parser very easy to use.

【Comments】:

【Solution 3】:

You may want to look at Yahoo's YQL library: http://developer.yahoo.com/yql

The task at hand is as simple as

select * from html where url = 'http://stackoverflow.com/questions/ask'

You can try it out in the console: http://developer.yahoo.com/yql/console (requires login)

Also see Chris Heilmann's screencast for some nice ideas on what else you can do: http://developer.yahoo.net/blogs/theater/archives/2009/04/screencast_collating_distributed_information.html

【Comments】:

【Solution 4】:

Simple way: use file_get_contents():

$page = file_get_contents('http://stackoverflow.com/questions/ask');

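Note that allow_url_fopen must be set to true in php.ini to use the URL-aware fopen wrappers.

More advanced way: if allow_url_fopen is disabled on your host (it is on by default in PHP, but some hosts turn it off) and you cannot change your PHP configuration, use the cURL library to connect to the desired page, provided ext/curl is installed.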
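The choice between the two approaches can be made at runtime. Here is a rough sketch of that check, which also looks at whether the https wrapper is registered, since an https:// URL will fail without it even when allow_url_fopen is on. pick_fetch_method is a hypothetical helper, and preferring file_get_contents() over cURL simply mirrors the order used in this answer.

```php
<?php
// Hypothetical helper: decide which transport this host can use.
function pick_fetch_method(bool $streamsUsable, bool $curlLoaded): ?string
{
    if ($streamsUsable) {
        return 'file_get_contents';
    }
    if ($curlLoaded) {
        return 'curl';
    }
    return null; // neither transport is available
}

// ini_get() returns a string such as "1", "On" or ""; normalise it to a bool.
$urlFopenEnabled = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
// The https wrapper is only registered when the openssl extension is loaded.
$httpsWrapper = in_array('https', stream_get_wrappers(), true);
$curlLoaded = extension_loaded('curl');

$method = pick_fetch_method($urlFopenEnabled && $httpsWrapper, $curlLoaded);
echo $method ?? 'no HTTP transport available', PHP_EOL;
```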

【Comments】:

This function does not return the page's HTML even though allow_url_fopen is set to true. What else should I check?

【Solution 5】:

Have a look at this function:

http://ru.php.net/manual/en/function.file-get-contents.php

【Comments】:

【Solution 6】:

Here are two different simple ways to get content from a URL:

1) First method

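Enable allow_url_fopen on your host (in php.ini or wherever your host exposes it). Note that readfile() prints the content and returns only the byte count, so use file_get_contents() to store the page in a variable:

<?php
// file_get_contents() returns the page as a string (false on failure)
$variableee = file_get_contents("http://example.com/");
echo $variableee;
?>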

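or

2) Second method

Enable the php_curl extension (php_imap is not needed for this, and php_openssl matters only for the https:// stream wrapper, not for cURL)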

<?php
// you can add other cURL options too
// see here - http://php.net/manual/en/function.curl-setopt.php
function get_dataa($url) {
  $ch = curl_init();
  $timeout = 5;
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)");
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  // Keep TLS certificate verification on; disabling it allows man-in-the-middle attacks
  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
  curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
  $data = curl_exec($ch);
  curl_close($ch);
  return $data;
}

$variableee = get_dataa('http://example.com');
echo $variableee;
?>

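【Comments】:

Nice, the second method worked. Thanks, it works great.

【Solution 7】:

If you want to store the source in a variable, you can use file_get_contents, although cURL is better practice.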

$url = file_get_contents('http://example.com');
echo $url;

This solution will display the web page on your site. However, cURL is a better option.

【Comments】:

【Solution 8】:

include_once('simple_html_dom.php');
$url="http://stackoverflow.com/questions/ask";
$html = file_get_html($url);

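You can use this code to get the entire HTML in parsed form (file_get_html() returns a parsed DOM object rather than a plain string). Download the "simple_html_dom.php" file here: http://sourceforge.net/projects/simplehtmldom/files/simple_html_dom.php/download

【Comments】:

【Solution 9】:

You can also use the DOMDocument approach to pull individual HTML tags into variables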

$homepage = file_get_contents('https://www.example.com/');
$doc = new DOMDocument;
$doc->loadHTML($homepage);
$titles = $doc->getElementsByTagName('h3');
echo $titles->item(0)->nodeValue;
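Real-world pages are rarely valid HTML, so loadHTML() tends to emit warnings, and item(0) is null when the tag is missing. A slightly more defensive sketch of the same idea follows. first_tag_text is a hypothetical helper name, and the sample markup stands in for a fetched page.

```php
<?php
// Hypothetical helper: text of the first element with the given tag, or null.
function first_tag_text(string $html, string $tag): ?string
{
    if ($html === '') {
        return null; // loadHTML() does not accept empty input
    }
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementsByTagName($tag)->item(0);
    return ($node === null) ? null : trim($node->textContent);
}

// Sample markup standing in for the result of file_get_contents().
$sample = '<html><body><h3> First heading </h3><h3>Second</h3></body></html>';
echo first_tag_text($sample, 'h3'), PHP_EOL; // prints "First heading"
```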

【Comments】:

【Solution 10】:

$output = file("http://www.example.com"); did not work for me until I enabled allow_url_fopen, allow_url_include, and file_uploads in php.ini for PHP 7

【Comments】:

【Solution 11】:

I tried this code and it works for me.

$html = file_get_contents('https://www.google.com'); // the scheme is required, otherwise PHP looks for a local file
$myVar = htmlspecialchars($html, ENT_QUOTES);
echo($myVar);
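The point of htmlspecialchars() here is that the browser shows the markup as text instead of rendering it. A small offline illustration, using a made-up snippet in place of a fetched page:

```php
<?php
// Stand-in for HTML that would normally come from file_get_contents().
$html = '<h1 class="title">Hello & welcome</h1>';

// Escape <, >, & and quotes so a browser displays the source instead of rendering it.
$escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');

echo $escaped, PHP_EOL;
// prints: &lt;h1 class=&quot;title&quot;&gt;Hello &amp; welcome&lt;/h1&gt;
```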

【Comments】: