"Lighthouse was unable to download a robots.txt file" despite the file being accessible: answers

【Question title】: "Lighthouse was unable to download a robots.txt file" despite the file being accessible
【Posted】: 2019-10-19 20:33:56
【Question】:

I have a NodeJS/NextJS app running at http://www.schandillia.com. The project has a robots.txt file that is accessible at http://www.schandillia.com/robots.txt. For now, the file is a bare-bones version for testing purposes:

User-agent: *
Allow: /
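The two-line file above is syntactically valid. As a rough way to confirm that, here is a minimal JavaScript sketch of a line-by-line check. `checkRobotsTxt` and the `KNOWN_DIRECTIVES` list are hypothetical and deliberately simpler than whatever Lighthouse's own robots.txt audit does.

```javascript
// Hypothetical helper: a rough well-formedness check for a robots.txt body.
// KNOWN_DIRECTIVES is an assumed subset of commonly recognised directives,
// not Lighthouse's actual list.
const KNOWN_DIRECTIVES = new Set(['user-agent', 'allow', 'disallow', 'sitemap', 'crawl-delay']);

function checkRobotsTxt(body) {
  const problems = [];
  body.split(/\r?\n/).forEach((rawLine, index) => {
    // Strip comments and surrounding whitespace; skip blank lines.
    const line = rawLine.replace(/#.*$/, '').trim();
    if (line === '') return;
    const colon = line.indexOf(':');
    if (colon === -1) {
      problems.push({ line: index + 1, message: 'missing ":" separator' });
      return;
    }
    // Directive names are compared case-insensitively.
    const directive = line.slice(0, colon).trim().toLowerCase();
    if (!KNOWN_DIRECTIVES.has(directive)) {
      problems.push({ line: index + 1, message: `unknown directive "${directive}"` });
    }
  });
  return problems;
}

const robotsTxt = 'User-agent: *\nAllow: /\n';
console.log(checkRobotsTxt(robotsTxt)); // []
```

An empty result means no malformed lines were found, which supports the point that the audit failure is not caused by the file's contents.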

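However, when I run a Lighthouse audit on my site, it reports a Crawling and Indexing error saying it was unable to download a robots.txt file. Again, the file is available at http://www.schandillia.com/robots.txt.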

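If you need to see the project's codebase, it is at https://github.com/amitschandillia/proost. The robots.txt file lives in proost/web/static/, but it is accessible at the root because of the following in my Nginx configuration: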

# ... the rest of your configuration
  location = /robots.txt {
    proxy_pass http://127.0.0.1:3000/static/robots.txt;
  }
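To make the exact-match semantics of that block concrete, here is a small JavaScript model of the routing decision. `upstreamFor` and its `null` fall-through are hypothetical: they stand in for nginx itself and for "the rest of your configuration", which is not shown.

```javascript
// Hypothetical model of the nginx block above, for illustration only.
// `location = /robots.txt` is an exact, case-sensitive match on the request path,
// so only that one path is proxied to the static file served by Next.js on port 3000.
const UPSTREAM_ROBOTS = 'http://127.0.0.1:3000/static/robots.txt';

function upstreamFor(pathname) {
  if (pathname === '/robots.txt') {
    return UPSTREAM_ROBOTS;
  }
  // Any other path falls through to the rest of the configuration (not modelled here).
  return null;
}

console.log(upstreamFor('/robots.txt'));  // http://127.0.0.1:3000/static/robots.txt
console.log(upstreamFor('/robots.txt/')); // null
```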

The full configuration file is available on GitHub at https://github.com/amitschandillia/proost/blob/master/.help_docs/configs/nginx.conf.

If there is anything I have overlooked, please advise.

【Question discussion】:

Tags: node.js robots.txt content-security-policy next.js lighthouse

【Solution 1】:

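TL;DR: Your robots.txt is served correctly, but Lighthouse cannot fetch it because its audit could not work with the connect-src directive of your site's Content Security Policy. This was a known limitation, ~~tracked as issue #4386~~ fixed in Chrome 92.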

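Explanation: Lighthouse tries to fetch the robots.txt file through a script that runs inside the document served from your site's root. Here is the code it uses to make this request (found in lighthouse-core):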

const response = await fetch(new URL('/robots.txt', location.href).href);
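Because the URL is resolved against `location.href`, the request always targets the site root of the page's own origin and is issued from inside the page, so the page's CSP applies to it. A small Node sketch of just the resolution step, where the hypothetical `robotsUrlFor` takes the page URL in place of `location.href`:

```javascript
// Resolving the absolute path '/robots.txt' against the audited page's URL
// discards that page's path, query and fragment, keeping only the origin.
// `robotsUrlFor` is a hypothetical wrapper, not part of Lighthouse.
function robotsUrlFor(pageHref) {
  return new URL('/robots.txt', pageHref).href;
}

console.log(robotsUrlFor('http://www.schandillia.com/'));
// http://www.schandillia.com/robots.txt
console.log(robotsUrlFor('http://www.schandillia.com/blog/post?id=7#top'));
// http://www.schandillia.com/robots.txt
```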

If you try running this code from your site, you will notice that a "Refused to connect" error is thrown.

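This error occurs because the browser enforces the Content Security Policy restrictions from the headers your site serves (split across several lines for readability):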

content-security-policy:
    default-src 'self';
    script-src 'self' *.google-analytics.com;
    img-src 'self' *.google-analytics.com;
    connect-src 'none';
    style-src 'self' 'unsafe-inline' fonts.googleapis.com;
    font-src 'self' fonts.gstatic.com;
    object-src 'self';
    media-src 'self';
    frame-src 'self'
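To pick that header apart programmatically, here is a minimal JavaScript sketch. `parseCsp` is a hypothetical helper that follows only the basic parsing rules (split on `;`, directive names are case-insensitive, the first occurrence of a directive wins), not the full CSP parsing algorithm.

```javascript
// Hypothetical helper: split a Content-Security-Policy header value into a Map of
// directive name -> list of source expressions.
function parseCsp(headerValue) {
  const directives = new Map();
  for (const part of headerValue.split(';')) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue;
    const name = tokens[0].toLowerCase();
    // Only the first occurrence of a directive is used; later duplicates are ignored.
    if (!directives.has(name)) {
      directives.set(name, tokens.slice(1));
    }
  }
  return directives;
}

const header =
  "default-src 'self'; script-src 'self' *.google-analytics.com; " +
  "img-src 'self' *.google-analytics.com; connect-src 'none'; " +
  "style-src 'self' 'unsafe-inline' fonts.googleapis.com; font-src 'self' fonts.gstatic.com; " +
  "object-src 'self'; media-src 'self'; frame-src 'self'";

console.log(parseCsp(header).get('connect-src').join(' ')); // 'none'
```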

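Note the connect-src 'none'; part. According to the CSP spec, this means that no URL can be loaded from the served document through script interfaces (fetch, XMLHttpRequest, WebSocket, EventSource and similar). In practice, every fetch is refused, even a same-origin one such as /robots.txt.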
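To illustrate why even a same-origin request fails here, below is a deliberately simplified JavaScript model of the connect-src check. `isConnectAllowed` is hypothetical and only understands `'none'`, `'self'` and host patterns such as `*.google-analytics.com`; real CSP source matching also handles schemes, ports, paths and more.

```javascript
// Simplified, hypothetical model of the browser's connect-src decision for a fetch().
function isConnectAllowed(directives, pageUrl, requestUrl) {
  // connect-src falls back to default-src when it is not present.
  const sources = directives['connect-src'] || directives['default-src'];
  if (!sources) return true; // no applicable restriction
  const page = new URL(pageUrl);
  const target = new URL(requestUrl, pageUrl);
  return sources.some((source) => {
    if (source === "'none'") return false; // matches nothing
    if (source === "'self'") return target.origin === page.origin;
    // '*.example.com' matches subdomains only, not example.com itself.
    if (source.startsWith('*.')) return target.hostname.endsWith(source.slice(1));
    return target.hostname === source; // bare host source
  });
}

const page = 'http://www.schandillia.com/';
const robots = 'http://www.schandillia.com/robots.txt';
console.log(isConnectAllowed({ 'default-src': ["'self'"], 'connect-src': ["'none'"] }, page, robots)); // false
console.log(isConnectAllowed({ 'default-src': ["'self'"], 'connect-src': ["'self'"] }, page, robots)); // true
```

Note that `default-src 'self'` does not help: once connect-src is present, it alone decides.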

This header is explicitly sent by the server layer of your Next.js application because of the way you configured the Content Security Policy middleware (from commit a6aef0e):

import csp from 'helmet-csp';

server.use(csp({
  directives: {
    defaultSrc: ["'self'"],
    scriptSrc: ["'self'", '*.google-analytics.com'],
    imgSrc: ["'self'", '*.google-analytics.com'],
    connectSrc: ["'none'"], // blocks every fetch/XHR/WebSocket from the page, including Lighthouse's robots.txt request
    styleSrc: ["'self'", "'unsafe-inline'", 'maxcdn.bootstrapcdn.com'], // Remove unsafe-inline for better security
    fontSrc: ["'self'"],
    objectSrc: ["'self'"],
    mediaSrc: ["'self'"],
    frameSrc: ["'self'"]
  }
}));
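The camelCase keys in that object become the kebab-case directives seen in the response header, which is how `connectSrc: ["'none'"]` ends up as `connect-src 'none'`. A rough sketch of that mapping; `toCspHeader` is a hypothetical re-implementation that only approximates what the middleware does, not helmet-csp's own code.

```javascript
// Hypothetical approximation of turning a directives object into a CSP header value.
function toCspHeader(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => {
      // camelCase -> kebab-case, e.g. connectSrc -> connect-src
      const kebab = name.replace(/[A-Z]/g, (c) => '-' + c.toLowerCase());
      return [kebab, ...sources].join(' ');
    })
    .join('; ');
}

const directives = {
  defaultSrc: ["'self'"],
  connectSrc: ["'none'"],
};
console.log(toCspHeader(directives)); // default-src 'self'; connect-src 'none'
```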

Solutions/workarounds: to resolve the issue in the audit report, you can:

- Wait for (or submit) a fix in Lighthouse
- Use the connect-src 'self' directive, which has the side effect of allowing HTTP requests from the browser side of your Next.js app
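For the second option, only one directive needs to change. A minimal sketch, assuming the other directives from commit a6aef0e stay as they are; `withSameOriginConnect` is a hypothetical helper whose result would be passed as `directives` to the same `csp()` middleware.

```javascript
// Hypothetical helper: copy the existing directives and override only connectSrc.
// The input object is left untouched.
function withSameOriginConnect(directives) {
  return { ...directives, connectSrc: ["'self'"] };
}

// Abbreviated version of the directives from the middleware setup above.
const currentDirectives = {
  defaultSrc: ["'self'"],
  scriptSrc: ["'self'", '*.google-analytics.com'],
  connectSrc: ["'none'"],
};

const relaxed = withSameOriginConnect(currentDirectives);
// server.use(csp({ directives: relaxed })); // same middleware call as before
console.log(relaxed.connectSrc.join(' ')); // 'self'
```

With connect-src 'self', any same-origin fetch/XHR from the page is allowed (including Lighthouse's /robots.txt request), while requests to other origins stay blocked unless they are listed explicitly.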

【Discussion】: