Java: A Second Look at IO

1. Encoding Issues

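In UTF-8, a (common) Chinese character takes 3 bytes and an English letter takes 1 byte. In GBK, a Chinese character takes 2 bytes and an English letter takes 1 byte.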
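The byte counts above can be checked with a quick sketch. It assumes the runtime includes the GBK charset (standard OpenJDK builds do, but GBK is not one of the charsets every Java platform is required to support). The helper name byteLength is illustrative only. The Chinese character is written as the Unicode escape \u4e2d (中) so the result does not depend on how the source file itself is encoded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: count how many bytes one character occupies in a given encoding.
// Assumes the runtime provides the non-mandatory "GBK" charset.
final class ByteCounts {
    static final String CHINESE = "\u4e2d"; // the character U+4E2D
    static final String LATIN = "A";
    static final Charset GBK = Charset.forName("GBK");

    // Hypothetical helper: length of the encoded form of text in charset cs
    static int byteLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        System.out.println(byteLength(CHINESE, StandardCharsets.UTF_8)); // 3
        System.out.println(byteLength(CHINESE, GBK));                    // 2
        System.out.println(byteLength(LATIN, StandardCharsets.UTF_8));   // 1
        System.out.println(byteLength(LATIN, GBK));                      // 1
    }
}
```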
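Inside Java, text is held as UTF-16: a char is a 16-bit UTF-16 code unit, so every char, whether it holds a Chinese or an English character, occupies 2 bytes (UTF-16BE is simply that representation written out in big-endian byte order). So if someone asks whether one char in a Java string can hold a Chinese character, the answer is yes. The exception is characters outside the Basic Multilingual Plane, such as some rare ideographs, which need two chars (a surrogate pair).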
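A short sketch of what "one char is 2 bytes" means in practice, using the sample string 慕课ABC spelled with Unicode escapes (\u6155 is 慕, \u8bfe is 课). U+20000 is chosen here only as an illustration of a character outside the Basic Multilingual Plane; the class and field names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Sketch: chars are UTF-16 code units, so most Chinese characters fit in one char,
// while supplementary characters need a surrogate pair (two chars).
final class CharUnits {
    // The sample string written with Unicode escapes, independent of source-file encoding
    static final String TEXT = "\u6155\u8bfeABC";
    // U+20000 lies outside the BMP, so it is stored as two chars
    static final String RARE = new String(Character.toChars(0x20000));

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(TEXT.charAt(0)));              // 6155: one char, whole character
        System.out.println(TEXT.length());                                    // 5 chars
        System.out.println(TEXT.getBytes(StandardCharsets.UTF_16BE).length);  // 10 bytes, 2 per char
        System.out.println(RARE.length());                                    // 2 chars (surrogate pair)
        System.out.println(RARE.codePointCount(0, RARE.length()));            // 1 code point
    }
}
```

When a string may contain such supplementary characters, codePointCount (or iterating with codePoints()) gives the number of characters as a reader perceives them, while length() counts UTF-16 code units.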
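If we know which encoding a byte sequence uses, we must name that encoding explicitly when turning the bytes back into a string; otherwise the result will be garbled.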
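To see the garbling in action, this sketch encodes the sample string 慕课ABC with GBK and decodes the bytes twice: once with the matching charset and once with UTF-8. The roundTrip helper is hypothetical, and the example assumes the runtime ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: decoding bytes with the wrong charset yields garbled text, not an exception.
final class DecodeDemo {
    static final String TEXT = "\u6155\u8bfeABC"; // the sample string, as Unicode escapes
    static final Charset GBK = Charset.forName("GBK"); // assumed to be available

    // Hypothetical helper: encode with one charset, decode with another
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(TEXT, GBK, GBK).equals(TEXT));                    // true
        System.out.println(roundTrip(TEXT, GBK, StandardCharsets.UTF_8).equals(TEXT)); // false: garbled
    }
}
```

Decoding with the wrong charset does not throw: new String(byte[], Charset) replaces malformed input with replacement characters, which is why the mistake surfaces as garbled text rather than an error. The ASCII letters survive because GBK and UTF-8 both encode them as single bytes with the same values.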
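A text file is just a sequence of bytes, and those bytes can be in any encoding. A text file created directly on a Chinese-language Windows machine (for example in Notepad) has traditionally been saved in the system's ANSI encoding, which on Simplified Chinese Windows is GBK.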
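Since a text file is only bytes, writing one is just encoding text and storing the result. This sketch writes the sample string 慕课ABC to a temporary file in GBK (standing in for the "ANSI" encoding of a Simplified Chinese system) and reads the raw bytes back. The class and method names are hypothetical, and GBK availability is assumed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: a text file stores only the encoded bytes, not the encoding that produced them.
final class TextFileDemo {
    static final String TEXT = "\u6155\u8bfeABC"; // the sample string, as Unicode escapes

    // Writes text to a temporary file in charset cs and returns the bytes found on disk
    static byte[] writeAndReadBytes(String text, Charset cs) {
        try {
            Path tmp = Files.createTempFile("encoding-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(cs));
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumed stand-in for "ANSI" on Simplified Chinese Windows
        byte[] onDisk = writeAndReadBytes(TEXT, gbk);
        System.out.println(onDisk.length);                        // 7: 2 + 2 + 1 + 1 + 1
        System.out.println(new String(onDisk, gbk).equals(TEXT)); // true only because we know it is GBK
    }
}
```

Nothing in those 7 bytes records that GBK was used; whoever reads the file has to know the encoding, or the text comes back garbled.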

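import java.io.UnsupportedEncodingException;

public class EncodingDemo {

    public static void main(String[] args) throws UnsupportedEncodingException {
        /*
         * In UTF-8 a Chinese character takes 3 bytes; in GBK it takes 2
         */
        String s = "慕课ABC";
        // Name the charset explicitly: getBytes() with no argument uses the platform default,
        // which is UTF-8 only if the JVM is configured that way (the default since JDK 18)
        byte[] byte1 = s.getBytes("utf-8");
        for (byte b : byte1) // & 0xff stops a negative byte from being sign-extended to ffffffxx
            System.out.print(Integer.toHexString(b & 0xff) + " "); // e6 85 95 e8 af be 41 42 43 (UTF-8)
        System.out.println();

        byte[] byte2 = s.getBytes("gbk");
        for (byte b : byte2)
            System.out.print(Integer.toHexString(b & 0xff) + " "); // c4 bd bf ce 41 42 43 (GBK)

        /*
         * A Java char is a 16-bit UTF-16 code unit, so each char of a String occupies 2 bytes.
         * Interview question: can a single Java char hold a Chinese character? Yes: every character
         * in the Basic Multilingual Plane, which includes common Chinese characters, fits in one char.
         */
        System.out.println();
        byte[] byte3 = s.getBytes("utf-16be");
        for (byte b : byte3)
            System.out.print(Integer.toHexString(b & 0xff) + " "); // 61 55(慕) 8b fe(课) 0 41(A) 0 42(B) 0 43(C)

        System.out.println();
        String s1 = new String(byte3); // decodes with the platform default charset, not UTF-16BE
        System.out.println(s1); // garbled text
        String s2 = new String(byte3, "utf-16be");
        System.out.println(s2); // 慕课ABC
    }
}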
