Why are WebSockets masked? Answers

【Question title】: Why are WebSockets masked?
【Posted】: 2016-01-19 21:54:24
【Question description】:

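I was following the guide MDN provides on Writing a WebSocket server. The guide is pretty straightforward and easy to understand...

However, while following the tutorial I ran across the frame format that WebSocket messages from the client are sent in: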


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+

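After writing some functions to properly unmask the data and the frame sent by the client, I started to wonder why the data is masked to begin with. I mean, you don't have to mask the data you send from the server...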
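For reference, a minimal sketch of such an unmasking function, following the frame layout in the diagram above. The function name `parse_client_frame` is hypothetical, and the sketch assumes the buffer already holds one complete frame. Note that in the diagram the masking key occupies 32 bits, so the code reads 4 key bytes.

```python
import struct


def parse_client_frame(buf: bytes):
    """Parse one complete client-to-server frame (hypothetical helper).

    Returns (fin, opcode, payload). Assumes buf holds the whole frame.
    """
    b0, b1 = buf[0], buf[1]
    fin = bool(b0 & 0x80)        # FIN bit
    opcode = b0 & 0x0F           # 4-bit opcode
    masked = bool(b1 & 0x80)     # MASK bit
    length = b1 & 0x7F           # 7-bit payload length
    pos = 2
    if length == 126:            # 16-bit extended payload length
        (length,) = struct.unpack_from("!H", buf, pos)
        pos += 2
    elif length == 127:          # 64-bit extended payload length
        (length,) = struct.unpack_from("!Q", buf, pos)
        pos += 8
    if not masked:
        raise ValueError("client frames must be masked")
    key = buf[pos:pos + 4]       # 4-byte masking key
    pos += 4
    data = buf[pos:pos + length]
    # Unmask: XOR each payload byte with key[i mod 4].
    payload = bytes(b ^ key[i % 4] for i, b in enumerate(data))
    return fin, opcode, payload
```

Feeding it the bytes `81 85 37 fa 21 3d 7f 9f 4d 51 58` (a single masked text frame) yields the payload `b"Hello"`.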

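If someone were getting the data for bad reasons, it could be relatively easy to unmask it, because the masking key is included with the whole message. Or, even if they didn't have the key, the masking key in the frame is only 2 bytes long. Someone could easily unmask the data since the key is so small.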

Another reason I wonder why the data is masked is that you can protect your WebSocket data far better than masking does, simply by using WSS (WebSockets Secure) over TLS/SSL and HTTPS.

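Am I missing the point of why WebSockets are masked? It seems to just add a pointless struggle to unmask the data sent by the client, while adding no security to begin with.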

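【Question discussion】:

Partial answers here: What is the mask in a webSocket frame, Is masking really necessary when sending from webSocket client, and How does websocket framing protect against cache poisoning.
@jfriend00, it does give some insight into why the standard defines it. But based on my arguments, I still don't see why clients need it.
Also: WebSockets - Why do we need to mask data from client to server?
There are now four reference articles in my previous two comments. Do you not see the justification in any of them?
In short: masking isn't there to protect the data from being read. It's there to protect servers (including proxy servers) from malicious use of WebSockets.

Tags: security web websocket masking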

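【Answer 1】:

jfriend00's comment has great links to good information...

I do want to point out the somewhat obvious, to show that masking unencrypted websocket connections is a necessary requirement rather than just a beneficial one: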

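Proxies, routers and other intermediaries (especially ISPs) often read the requests sent by a client and "correct" any issues, add headers and otherwise "optimize" network resource consumption (for example, by responding from cache).

Some headers and request types (such as Connect) are often directed at these intermediaries rather than at the endpoint server.

Since many of these devices are older and don't understand the Websockets protocol, clear text that looks like an HTTP request might be edited or acted upon.

Hence, the clear text has to be "shifted" into unrecognized bytes, so that intermediaries "pass it through" rather than "process" it.

Beyond that point, it's just a matter of using the mask in a way that makes sure hackers can't "reverse" the masking to send malicious frames.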

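As for requiring wss instead of masking - I know this was considered while the standard was being written... but until certificates are free, it would make any web standard that requires SSL/TLS a "rich man's" standard rather than an internet-wide solution.

As for "why mask wss data?" - I'm not sure about this one, but I suspect it's meant to let the parser be connection agnostic and easier to write. In clear text, unmasked frames are a protocol error and result in a disconnection initiated by the server. Having the parser behave the same regardless of the connection lets us separate the parser from the raw IO layer, making it connection agnostic and offering support for event-based programming.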

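【Discussion】:

Proxies needed to change to understand the upgrade protocol, and it would have been trivial for proxies to add "don't process the stream any further". As it is, servers are harder to implement with masking, and it's an ongoing cost forever because debugging is painful. Clients all needed to be upgraded to support this convoluted protocol. I think the real answer is that Google has a competitive advantage due to scale if web technology is harder to implement in servers. It is not a requirement at all.
@teknopaul, I understand how you feel. However, I'm pretty sure that "proxies needed to change to understand the upgrade protocol" is incorrect... many proxies have a pass-through fallback built in and work "as is" when the data is "corrupt" (not recognized as HTTP headers). Older proxies had issues because of the Connect header (they failed to forward the header), but that isn't all proxies. As far as I know, many proxies haven't been updated to this day, and a lot of them (but not all) work with Websocket upgrades.
@teknopaul - P.S., unmasking might be an annoying resource issue due to memory cache misses, but it's a 4-byte XOR operation in a simple loop... it's really easy to implement and requires no data copying in most languages (I wrote a parser for Ruby, which causes a String copy, but my C parser was easier to write and involved no copying).
I'm writing a streaming C parser, and I don't get data in chunks divisible by 4. I'm XORing byte by byte and recording the position in the 4-byte mask, and there is more code I could write to optimise it to XOR in the word length of the machine the code is running on. It's all pointless effort. WebSockets could have been just a single header, Upgrade:websockets; AFAICS there is no requirement for anything else. Proxies would have had to change, no biggie, everyone's life would be easier, and sane people could use text-based protocols as the web expects. If Google wishes to get past unpatched proxies, they can use binary protocols and XORing.
XORing provides no guarantee about the data you send. Take an example: you can use 0x0 0x0 0x0 0x0 as the mask and send the data as is. So it is incorrect to suggest that not XORing causes some problem and that masking is a way to solve it. Masking is an unnecessary annoyance, and if it isn't, then it's broken because a 0000 mask is possible.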
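The byte-by-byte approach described in the comment above (tracking the position in the 4-byte mask across chunks of arbitrary size) can be sketched as follows. This is an illustration in Python rather than the commenter's C code, and the class name `StreamUnmasker` is hypothetical.

```python
class StreamUnmasker:
    """Unmask a payload that arrives in chunks of arbitrary size (sketch)."""

    def __init__(self, key: bytes):
        if len(key) != 4:
            raise ValueError("masking key must be 4 bytes")
        self.key = key
        self.offset = 0  # payload position modulo 4, carried across chunks

    def feed(self, chunk: bytes) -> bytes:
        key, off = self.key, self.offset
        # XOR each byte with the key byte at its absolute payload position.
        out = bytes(b ^ key[(off + i) & 3] for i, b in enumerate(chunk))
        self.offset = (off + len(chunk)) & 3
        return out
```

Because only the offset modulo 4 is carried between calls, the chunk boundaries do not need to fall on multiples of 4.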
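The zero-mask observation is easy to confirm with a small sketch (the helper name `apply_mask` and the sample request bytes are made up for illustration). Note, though, that the RFC text quoted in Answer 2 requires the client to pick each key with an algorithm the application supplying the data cannot predict, so the sender of the data is not supposed to be the one choosing an all-zero key.

```python
def apply_mask(data: bytes, key: bytes) -> bytes:
    """XOR data with a 4-byte key, repeating the key as needed."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))


request = b"GET /script.js HTTP/1.1\r\n"

# An all-zero key leaves the bytes on the wire unchanged.
unchanged = apply_mask(request, b"\x00\x00\x00\x00")

# Any non-zero key byte changes the bytes at its positions.
changed = apply_mask(request, b"\x01\x00\x00\x00")
```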

【Answer 2】:

Actually, the authoritative RFC, RFC 6455 The WebSocket Protocol, has an explanation. I quote it here:

 10.3.  Attacks On Infrastructure (Masking)

   In addition to endpoints being the target of attacks via WebSockets,
   other parts of web infrastructure, such as proxies, may be the
   subject of an attack.

   As this protocol was being developed, an experiment was conducted to
   demonstrate a class of attacks on proxies that led to the poisoning
   of caching proxies deployed in the wild [TALKING].  The general form
   of the attack was to establish a connection to a server under the
   "attacker's" control, perform an UPGRADE on the HTTP connection
   similar to what the WebSocket Protocol does to establish a
   connection, and subsequently send data over that UPGRADEd connection
   that looked like a GET request for a specific known resource (which
   in an attack would likely be something like a widely deployed script
   for tracking hits or a resource on an ad-serving network).  The
   remote server would respond with something that looked like a
   response to the fake GET request, and this response would be cached
   by a nonzero percentage of deployed intermediaries, thus poisoning
   the cache.  The net effect of this attack would be that if a user
   could be convinced to visit a website the attacker controlled, the
   attacker could potentially poison the cache for that user and other
   users behind the same cache and run malicious script on other
   origins, compromising the web security model.

   To avoid such attacks on deployed intermediaries, it is not
   sufficient to prefix application-supplied data with framing that is
   not compliant with HTTP, as it is not possible to exhaustively
   discover and test that each nonconformant intermediary does not skip
   such non-HTTP framing and act incorrectly on the frame payload.
   Thus, the defense adopted is to mask all data from the client to the
   server, so that the remote script (attacker) does not have control
   over how the data being sent appears on the wire and thus cannot
   construct a message that could be misinterpreted by an intermediary
   as an HTTP request.

   Clients MUST choose a new masking key for each frame, using an
   algorithm that cannot be predicted by end applications that provide
   data.  For example, each masking could be drawn from a
   cryptographically strong random number generator.  If the same key is
   used or a decipherable pattern exists for how the next key is chosen,
   the attacker can send a message that, when masked, could appear to be
   an HTTP request (by taking the message the attacker wishes to see on
   the wire and masking it with the next masking key to be used, the
   masking key will effectively unmask the data when the client applies
   it).

   It is also necessary that once the transmission of a frame from a
   client has begun, the payload (application-supplied data) of that
   frame must not be capable of being modified by the application.
   Otherwise, an attacker could send a long frame where the initial data
   was a known value (such as all zeros), compute the masking key being
   used upon receipt of the first part of the data, and then modify the
   data that is yet to be sent in the frame to appear as an HTTP request
   when masked.  (This is essentially the same problem described in the
   previous paragraph with using a known or predictable masking key.)
   If additional data is to be sent or data to be sent is somehow
   changed, that new or changed data must be sent in a new frame and
   thus with a new masking key.  In short, once transmission of a frame
   begins, the contents must not be modifiable by the remote script
   (application).

   The threat model being protected against is one in which the client
   sends data that appears to be an HTTP request.  As such, the channel
   that needs to be masked is the data from the client to the server.
   The data from the server to the client can be made to look like a
   response, but to accomplish this request, the client must also be
   able to forge a request.  As such, it was not deemed necessary to
   mask data in both directions (the data from the server to the client
   is not masked).

   Despite the protection provided by masking, non-compliant HTTP
   proxies will still be vulnerable to poisoning attacks of this type by
   clients and servers that do not apply masking.
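The key-selection requirement in the quoted section can be illustrated with a short sketch. It assumes Python's `secrets` module as the cryptographically strong random source; the function names `mask_payload` and `craft_payload` are hypothetical. The second function shows why a predictable key defeats the defense: XORing the desired wire bytes with the predicted key gives a payload that the client's own masking turns back into those bytes.

```python
import secrets
from typing import Tuple


def apply_mask(data: bytes, key: bytes) -> bytes:
    """XOR data with a 4-byte key, repeating the key as needed."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))


def mask_payload(payload: bytes) -> Tuple[bytes, bytes]:
    """Client side: draw a fresh, unpredictable key for every frame."""
    key = secrets.token_bytes(4)
    return key, apply_mask(payload, key)


def craft_payload(desired_wire: bytes, predicted_key: bytes) -> bytes:
    """Attacker side, only possible if the next key is predictable:
    pre-mask the bytes that should appear on the wire."""
    return apply_mask(desired_wire, predicted_key)
```

With an unpredictable key the attacker cannot compute `craft_payload`, which is the property the RFC relies on.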

【Discussion】: