Why does get_headers() return 400 Bad Request while CLI curl returns 200 OK? Answers

【Question Title】: Why get_headers() returns 400 Bad request, while CLI curl returns 200 OK?
【Published】: 2018-10-08 07:56:01
【Question】:

Here is the URL: https://www.grammarly.com

I'm trying to fetch the HTTP headers with the native get_headers() function:

$headers = get_headers('https://www.grammarly.com');

The result is:

HTTP/1.1 400 Bad Request
Date: Fri, 27 Apr 2018 12:32:34 GMT
Content-Type: text/plain; charset=UTF-8
Content-Length: 52
Connection: close

However, if I do the same thing with the curl command-line tool, the result is different:

curl -sI https://www.grammarly.com/

HTTP/1.1 200 OK
Date: Fri, 27 Apr 2018 12:54:47 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 25130
Connection: keep-alive

What causes this difference in the responses? Is it a poorly implemented security feature on Grammarly's server side, or something else?

【Discussion】:

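Looks like a "poorly implemented security feature", because setting a user agent in get_headers results in HTTP/1.1 302 Found. Try capturing your curl request headers: it's possible that some default user agent is set, and that you are getting the final response (after all redirects).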
It probably returns a 400 response because no User-Agent header is sent with the get_headers() request.

Tags: php curl get-headers

【Solution 1】:

Just use the PHP cURL functions instead:

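function getMyHeaders($url)
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return the response as a string instead of printing it
        CURLOPT_HEADER         => true,     // include the response headers in that string
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects (Location: headers)
        CURLOPT_USERAGENT      => "spider", // send a User-Agent header
        CURLOPT_AUTOREFERER    => true,     // set Referer automatically when following a redirect
        CURLOPT_SSL_VERIFYPEER => false,    // disables TLS certificate checks: insecure, avoid in production
        CURLOPT_NOBODY         => true      // headers only (this switches the request method to HEAD)
    );
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $content = curl_exec($ch);  // raw header text of every hop, or false on failure
    curl_close($ch);
    return $content;
}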
print_r(getMyHeaders('https://www.grammarly.com'));
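Because CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION are both enabled, the string returned by getMyHeaders() can contain one header block per redirect hop, each ending in a blank line. A minimal sketch of keeping only the final block, assuming CRLF or LF line endings; lastHeaderBlock() is a hypothetical helper, not part of the answer above:

```php
<?php
// Hypothetical helper: returns the status line and headers of the LAST header
// block in raw curl output, i.e. the response after all redirects were followed.
function lastHeaderBlock(string $raw): array
{
    // Each response's header block ends with an empty line (CRLF CRLF).
    $blocks = preg_split('/\r?\n\r?\n/', trim($raw));
    $lines  = preg_split('/\r?\n/', $blocks[count($blocks) - 1]);

    $status  = array_shift($lines);   // e.g. "HTTP/1.1 200 OK"
    $headers = [];
    foreach ($lines as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue;                  // skip anything that is not "Name: value"
        }
        // Header names are case-insensitive, so normalise them to lower case.
        // A repeated header (e.g. Set-Cookie) keeps only its last value here.
        $name = strtolower(trim(substr($line, 0, $pos)));
        $headers[$name] = trim(substr($line, $pos + 1));
    }
    return ['status' => $status, 'headers' => $headers];
}
```

Usage would be along the lines of lastHeaderBlock(getMyHeaders('https://www.grammarly.com')), after checking that curl_exec() did not return false.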

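【Discussion】:

Something to consider: CURLOPT_NOBODY changes the request method from GET to HEAD. A well-behaved server or web application should treat HEAD the same way as GET and simply send only the headers, but many web apps don't support this properly and return different headers for a HEAD request than for a GET request.
@Anthony Thanks for your comment, you're right. I also tried changing the default stream context as in your answer, but I used the 'ssl' option and it didn't work :)
I think the ssl context has its own set of options and doesn't "inherit" the options of the http context. So if you want to set the user agent for all http requests and set verify_peer to false for all https requests, similar to the curl example, it would look like: stream_context_set_default( array( 'http' => array( 'user_agent'=>"spider" ), 'ssl' => array( 'verify_peer'=> false ), ) ) based on php.net/manual/en/context.ssl.php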
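The HEAD point above also bears on the original comparison: curl's -I option sends HEAD, while get_headers() with the default context sends GET. If a server answers HEAD differently, one option is a normal GET that records header lines through CURLOPT_HEADERFUNCTION. A sketch, assuming the curl extension is loaded; both function names are hypothetical, and unlike the HEAD version this downloads the whole body:

```php
<?php
// Hypothetical helper: appends one raw header line to $headers and returns its
// byte length, which is what a CURLOPT_HEADERFUNCTION callback must return.
function collectHeaderLine(array &$headers, string $line): int
{
    $trimmed = trim($line);
    if ($trimmed !== '') {
        $headers[] = $trimmed;   // blank separator lines are dropped
    }
    return strlen($line);
}

// Hypothetical helper: fetches headers with a GET request instead of HEAD.
function getHeadersViaGet(string $url, string $userAgent = 'spider'): array
{
    $headers = [];
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_HTTPGET        => true,  // explicit GET (curl's default), not HEAD
        CURLOPT_RETURNTRANSFER => true,  // keep the body out of stdout; it is discarded
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HEADERFUNCTION => function ($handle, string $line) use (&$headers): int {
            return collectHeaderLine($headers, $line);
        },
    ]);
    curl_exec($ch);   // the handle is released when $ch goes out of scope
    return $headers;  // includes the header lines of every redirect hop
}
```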

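【Solution 2】:

This is because get_headers() uses the default stream context, which basically means almost no HTTP headers are sent to the URL, and most remote servers will be picky about that. The missing header most likely to cause problems is usually User-Agent. You can set it manually with stream_context_set_default before calling get_headers(). Here's an example that works for me: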

$headers = get_headers('https://www.grammarly.com');

print_r($headers);

// has [0] => HTTP/1.1 400 Bad Request

stream_context_set_default(
    array(
        'http' => array(
            'user_agent'=>"php/testing"
        ),
    )
);

$headers = get_headers('https://www.grammarly.com');

print_r($headers);

// has [0] => HTTP/1.1 200 OK
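Setting the default context affects every later stream call in the script. A narrower alternative is to pass a context to get_headers() itself, which accepts one as its third argument since PHP 7.1. Because the http wrapper follows redirects by default, the returned array can also hold several status lines (for example the 302 Found mentioned in the question comments, followed by the final response), so [0] is not always the final status. A sketch under those assumptions; makeHeaderContext() and finalStatusCode() are hypothetical names:

```php
<?php
// Hypothetical helper: a context for one call instead of the process-wide default.
function makeHeaderContext(string $userAgent, bool $verifyPeer = true)
{
    return stream_context_create([
        'http' => ['user_agent' => $userAgent],   // sent as the User-Agent header
        'ssl'  => ['verify_peer' => $verifyPeer], // TLS options belong under 'ssl'
    ]);
}

// Hypothetical helper: returns the code of the last "HTTP/x NNN" line, i.e. the
// final response after any redirects, or null if there is no status line.
function finalStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $key => $line) {
        // Status lines sit under integer keys, even in associative mode.
        if (is_int($key) && is_string($line)
            && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Usage (needs network access):
// $headers = get_headers('https://www.grammarly.com', false, makeHeaderContext('php/testing'));
// var_dump($headers === false ? null : finalStatusCode($headers));
```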

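【Discussion】:

I just found on php.net/manual/en/context.http.php that the http context has a predefined "user_agent" option for setting the User-Agent header. I've updated the answer to reflect this.
Correction: the vast majority of servers won't care, but a few will.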
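The "user_agent" option covers a single header. The http context also has a "header" option for extra request lines, and a User-Agent given there takes precedence over "user_agent". A rough sketch that approximates the two headers the curl CLI sends by default besides Host (User-Agent: curl/<version> and Accept: */*); the function name is hypothetical and the caller supplies the user-agent string:

```php
<?php
// Hypothetical helper: an http context whose request headers roughly mirror
// the curl CLI defaults. Pass it as the third argument of get_headers().
function curlLikeContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            // A User-Agent line here overrides the 'user_agent' option.
            'header' => [
                'User-Agent: ' . $userAgent,
                'Accept: */*',
            ],
        ],
    ]);
}
```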