Is InputStream the same as InputStreamReader when a character has 8 bits?

【Question title】: Is InputStream the same as InputStreamReader when a character has 8 bits?
【Posted】: 2018-05-12 07:57:36
【Question description】:

I am reading about InputStream and InputStreamReader. Most people say that InputStream is for bytes and InputStreamReader is for text.

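So I created a simple file containing a single character, 'a'. When I read the file with InputStream and cast the result to char, it printed the letter 'a'. When I did the same thing with InputStreamReader, it gave me the same result.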

So where is the difference? I thought InputStream would not be able to give me the letter 'a'.

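Does this mean that when a character has 8 bits, there is no difference between InputStream and InputStreamReader? Is there a difference only when a character takes more than one byte?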

【Question discussion】:

Tags: java file char inputstream inputstreamreader

【Solution 1】:

No, InputStream and InputStreamReader are not the same, even for 8-bit characters.

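Look at InputStream's read() method with no arguments. It returns an int, but according to the documentation that int is either a byte value (in the range 0 to 255) or -1 for EOF. The other read methods work with byte arrays.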

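InputStreamReader inherits from Reader. Reader's read() method with no arguments also returns an int, but here the int value (in the range 0 to 65535) is interpreted as a character, or -1 for EOF. The other read methods work directly with char arrays.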
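To make the two value ranges concrete, here is a minimal sketch that feeds the same three bytes to both APIs. The class name ReadRangeSketch, its helper methods, and the choice of the euro sign encoded as UTF-8 are assumptions made for illustration only; they are not part of the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not taken from the original answer.
public class ReadRangeSketch {

  // Collects every value returned by InputStream.read() until EOF (-1).
  public static List<Integer> rawValues(byte[] data) throws IOException {
    List<Integer> values = new ArrayList<>();
    try (InputStream in = new ByteArrayInputStream(data)) {
      int b;
      while ((b = in.read()) != -1) {
        values.add(b); // always in the range 0..255
      }
    }
    return values;
  }

  // Collects every value returned by Reader.read() until EOF (-1),
  // decoding the bytes as UTF-8 (an assumption of this sketch).
  public static List<Integer> decodedValues(byte[] data) throws IOException {
    List<Integer> values = new ArrayList<>();
    try (Reader reader = new InputStreamReader(
        new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
      int c;
      while ((c = reader.read()) != -1) {
        values.add(c); // a UTF-16 code unit, range 0..65535
      }
    }
    return values;
  }

  public static void main(String[] args) throws IOException {
    // The euro sign U+20AC takes three bytes in UTF-8.
    byte[] euro = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};
    System.out.println("InputStream.read(): " + rawValues(euro));     // [226, 130, 172]
    System.out.println("Reader.read():      " + decodedValues(euro)); // [8364]
  }
}
```

The stream reports three separate byte values, while the reader reports a single value above 255, which a byte-oriented read() can never return.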

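The difference is the encoding. InputStreamReader's constructors either take an explicit encoding or fall back to the platform's default encoding. An encoding is the mapping between bytes and characters.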

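You said: "When I read the file with InputStream and cast the result to char, it printed the letter 'a'." So you read a byte and converted it to a character by hand. That conversion step is built into InputStreamReader, which uses an encoding to do the translation.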
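The manual cast only looks harmless because 'a' is plain ASCII. Casting a byte value in the range 0 to 255 to char gives the same result as decoding with ISO-8859-1, whatever encoding the file was actually written in. The sketch below illustrates this with a non-ASCII character; the class name ManualCastSketch, its helpers, and the sample bytes (the letter é encoded as UTF-8) are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the original answer.
public class ManualCastSketch {

  // What the question did: read raw bytes and cast each one to char.
  public static String castEachByte(byte[] data) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (InputStream in = new ByteArrayInputStream(data)) {
      int b;
      while ((b = in.read()) != -1) {
        sb.append((char) b);
      }
    }
    return sb.toString();
  }

  // What InputStreamReader does: decode the bytes with a given charset.
  public static String decode(byte[] data, Charset charset) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
      int c;
      while ((c = reader.read()) != -1) {
        sb.append((char) c);
      }
    }
    return sb.toString();
  }

  // Renders a string as code units so the output does not depend on the console encoding.
  public static String hex(String s) {
    StringBuilder sb = new StringBuilder();
    for (char ch : s.toCharArray()) {
      sb.append(String.format("U+%04X ", (int) ch));
    }
    return sb.toString().trim();
  }

  public static void main(String[] args) throws IOException {
    byte[] eAcute = {(byte) 0xC3, (byte) 0xA9}; // U+00E9 encoded as UTF-8
    System.out.println("cast each byte: " + hex(castEachByte(eAcute)));                        // U+00C3 U+00A9
    System.out.println("UTF-8:          " + hex(decode(eAcute, StandardCharsets.UTF_8)));      // U+00E9
    System.out.println("ISO-8859-1:     " + hex(decode(eAcute, StandardCharsets.ISO_8859_1))); // U+00C3 U+00A9
  }
}
```

With UTF-8 input the cast produces two wrong characters, exactly as an ISO-8859-1 decoder would, while the reader configured for UTF-8 recovers the single intended character.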

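There are differences even between single-byte character sets. Your example is the letter 'a', which has the hexadecimal value 61 in the Windows ANSI encoding (named "Cp1252" in Java). In the IBM-Thai encoding, however, the byte 0x61 is interpreted as '/'.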

So people are right. InputStream is for binary data, and InputStreamReader sits on top of it for text, converting between binary data and text according to an encoding.

Here is a simple example:

import java.io.*;

public class EncodingExample {

  public static void main(String[] args) throws Exception {
    // Prepare the byte buffer for character 'a' in Windows-ANSI
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    final PrintWriter writer = new PrintWriter(new OutputStreamWriter(baos, "Cp1252"));
    writer.print('a');
    writer.flush();
    final byte[] buffer = baos.toByteArray();

    readAsBytes(new ByteArrayInputStream(buffer));
    readWithEncoding(new ByteArrayInputStream(buffer), "Cp1252");
    readWithEncoding(new ByteArrayInputStream(buffer), "IBM-Thai");
  }

  /**
   * Reads and displays the InputStream's bytes as hexadecimal.
   * @param in The inputStream
   * @throws Exception
   */
  private static void readAsBytes(InputStream in) throws Exception {
    int c;
    while((c = in.read()) != -1) {
      final byte b = (byte) c;
      System.out.println(String.format("Hex: %x ", b));
    }
  }

  /**
   * Reads the InputStream with an InputStreamReader and the given encoding.
   * Prints the resulting text to the console.
   * @param in The input stream
   * @param encoding The encoding
   * @throws Exception
   */
  private static void readWithEncoding(InputStream in, String encoding) throws Exception {
    Reader reader = new InputStreamReader(in, encoding);
    int c;
    final StringBuilder sb = new StringBuilder();
    while((c = reader.read()) != -1) {
      sb.append((char) c);
    }
    System.out.println(String.format("Interpreted with encoding '%s': %s", encoding, sb.toString()));
  }
}

The output is:

Hex: 61 
Interpreted with encoding 'Cp1252': a
Interpreted with encoding 'IBM-Thai': /

【Comments】:

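You said "That conversion step is built into InputStreamReader, which uses an encoding to do the translation." Then why is the line sb.append((char) c); needed? If Reader is for text, why is a cast to char necessary? I would have expected Reader to have a method like getString, so that this cast to char would not be needed.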
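Yes, the API is a bit awkward. An int is used as the return type instead of a char because of the extra value -1 for EOF, which would not be possible with a char alone. But the point is the meaning of the return value as described in the documentation. The meaning of the return value of InputStream's read method is completely different, even though the formal type is the same. Also, an InputStreamReader is usually wrapped in a BufferedReader so that a file can be read line by line with readLine().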
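For the BufferedReader approach mentioned above, a minimal sketch might look like this. The class name ReadLinesSketch, the readLines helper, and the in-memory sample text are assumptions for illustration; real code would typically wrap a FileInputStream instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not taken from the original answer.
public class ReadLinesSketch {

  // Wraps the stream in an InputStreamReader (bytes to chars) and then in a
  // BufferedReader, whose readLine() returns whole Strings, so no cast is needed.
  public static List<String> readLines(InputStream in, Charset charset) throws IOException {
    List<String> lines = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
      String line;
      while ((line = reader.readLine()) != null) { // null signals EOF here, not -1
        lines.add(line);
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8);
    System.out.println(readLines(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    // prints [first line, second line]
  }
}
```

Because readLine() returns a String and uses null for EOF, the int return value and the cast to char disappear from the calling code.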