Why does File.ReadAllText() also recognize UTF-16 encodings?

【Question title】: Why does File.ReadAllText() also recognize UTF-16 encodings?
【Posted】: 2022-01-23 11:22:38
【Question】:

I read a file using

File.ReadAllText(..., Encoding.ASCII);

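According to the documentation [MSDN] (emphasis mine),

This method attempts to automatically detect the encoding of a file based on the presence of byte order marks. Encoding formats UTF-8 and UTF-32 (both big-endian and little-endian) can be detected.

However, in my case the ASCII file erroneously starts with 0xFE 0xFF, and the method detects UTF-16 (probably big-endian, but I did not check).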

【Comments on the question】:

Tags: c# .net file

【Answer 1】:

According to the source of File [referencesource], it uses a StreamReader:

private static String InternalReadAllText(String path, Encoding encoding, bool checkHost)
{
  ...
  using (StreamReader sr = new StreamReader(path, encoding, true, StreamReader.DefaultBufferSize, checkHost))
    return sr.ReadToEnd();
}

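That StreamReader overload with 5 parameters [MSDN] is documented to support UTF-16 as well:

It automatically recognizes UTF-8, little-endian Unicode, big-endian Unicode, little-endian UTF-32, and big-endian UTF-32 text if the file starts with the appropriate byte order marks. Otherwise, the user-provided encoding is used.

(emphasis mine)

Since File.ReadAllText() is supposed to detect Unicode BOMs, and is documented as doing so, it is probably a good thing that it detects UTF-16 as well. The documentation, however, is wrong and should be updated. I have filed issue #7515.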
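To see the behavior for yourself, and to work around it if you really need the bytes decoded as ASCII, something like the following minimal sketch should do. The class name BomDetectionDemo and the helper ReadAllTextNoDetection are made up for illustration. The sketch assumes the public StreamReader(string, Encoding, bool) constructor, whose third argument turns BOM detection off.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDetectionDemo
{
    // Hypothetical helper: reads the whole file with BOM detection disabled,
    // so the encoding passed in is used regardless of the leading bytes.
    public static string ReadAllTextNoDetection(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding, detectEncodingFromByteOrderMarks: false))
            return reader.ReadToEnd();
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // 0xFE 0xFF is the UTF-16 big-endian BOM; 0x41 0x42 are "AB" in ASCII.
            File.WriteAllBytes(path, new byte[] { 0xFE, 0xFF, 0x41, 0x42 });

            // BOM detection is on here, so Encoding.ASCII is only a fallback.
            string detected = File.ReadAllText(path, Encoding.ASCII);

            // BOM detection is off here, so every byte goes through the ASCII decoder.
            string forced = ReadAllTextNoDetection(path, Encoding.ASCII);

            Console.WriteLine("detected length: " + detected.Length);
            Console.WriteLine("forced length:   " + forced.Length);
            Console.WriteLine("forced ends with AB: " + forced.EndsWith("AB"));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With detection on, the two bytes after the BOM are read as a single UTF-16 code unit, so "AB" is lost. With detection off, the two BOM bytes are not valid ASCII and are replaced by the decoder's fallback character, but the rest of the text survives.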

【Comments】: