What exactly does String.codePointAt do? Answers

【Question title】: What exactly does String.codePointAt do?
【Posted】: 2025-11-22 11:20:07
【Question description】:

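Recently I ran into the codePointAt method of String in Java. I also found a few other codePoint methods: codePointBefore, codePointCount, and so on. They clearly have something to do with Unicode, but I don't understand them.

Now I'd like to know when and how to use codePointAt and similar methods.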

【Question comments】:

Tags: java string unicode codepoint

【Solution 1】:

Short answer: it gives you the Unicode code point that starts at the specified index in the String, i.e. the "Unicode number" of the character at that position.
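As a minimal sketch of the short answer, the following prints the code point found at a given index, plus the one found by codePointBefore. The class name `CodePointLookup` and the helpers `at` and `before` are made up for this illustration; only `codePointAt`, `codePointBefore`, and `String.format` come from the JDK.

```java
final class CodePointLookup {
    // Formats the code point that starts at the given char index as "U+XXXX".
    static String at(String s, int index) {
        return String.format("U+%04X", s.codePointAt(index));
    }

    // Formats the code point that ends just before the given char index.
    static String before(String s, int index) {
        return String.format("U+%04X", s.codePointBefore(index));
    }

    public static void main(String[] args) {
        String s = "A\uD83D\uDE00"; // 'A' followed by U+1F600 (Grinning Face)
        System.out.println(at(s, 0));              // U+0041
        System.out.println(at(s, 1));              // U+1F600
        System.out.println(before(s, s.length())); // U+1F600
        System.out.println(before(s, 1));          // U+0041
    }
}
```

Note that the indexes are still char indexes, not code point indexes: the emoji starts at index 1 and the string ends at index 3.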

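Longer answer: Java was created at a time when 16 bits (a.k.a. a char) were enough to hold any Unicode character that existed (that part is now known as the Basic Multilingual Plane, or BMP). Later, Unicode was extended to include characters with code points beyond U+FFFF (that is, 2¹⁶ and above). This means that a char can no longer hold every possible Unicode code point.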

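UTF-16 was the solution: it stores the "old" Unicode code points in 16 bits (i.e. exactly one char) and all the new ones in 32 bits (i.e. two char values). Those two 16-bit values are called a "surrogate pair". Strictly speaking, a char now holds a "UTF-16 code unit" rather than a "Unicode character" as it used to.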
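To make the surrogate pair idea concrete, here is a small sketch that splits U+1F600 into its two code units and puts them back together. The class `SurrogatePairDemo` and the method `combine` are hypothetical names for this example; the arithmetic is the standard UTF-16 decoding formula, and `Character.toCodePoint` is the JDK method that does the same thing.

```java
final class SurrogatePairDemo {
    // Manually combines a high and a low surrogate into one code point.
    static int combine(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        int grinningFace = 0x1F600;
        char[] units = Character.toChars(grinningFace); // the UTF-16 code units
        System.out.println(units.length);                        // 2
        System.out.println(Character.isHighSurrogate(units[0])); // true
        System.out.println(Character.isLowSurrogate(units[1]));  // true
        // Both the manual formula and the JDK helper recover the code point.
        System.out.println(combine(units[0], units[1]) == grinningFace);               // true
        System.out.println(Character.toCodePoint(units[0], units[1]) == grinningFace); // true
    }
}
```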

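All the "old" methods (which deal only with char) work just fine as long as you don't use any of the "new" Unicode characters (or don't really care about them). If you do care about the new characters (or simply need full Unicode support), you need to use the "code point" versions, which actually support all possible Unicode code points.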
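One common place where the code point versions matter is iteration. The sketch below walks a string one code point at a time; `CodePointIterationDemo` and `countCodePoints` are illustrative names, and the `codePoints()` stream assumes Java 8 or later.

```java
final class CodePointIterationDemo {
    // Counts code points by stepping over one or two chars per iteration.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // 1 for BMP characters, 2 for supplementary ones
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "x\uD83D\uDE00y"; // 'x', U+1F600, 'y'
        System.out.println(s.length());               // 4 chars
        System.out.println(countCodePoints(s));       // 3 code points
        System.out.println(s.codePoints().count());   // 3, same result via the stream API
    }
}
```

A plain `for (int i = 0; i < s.length(); i++)` loop over `charAt(i)` would instead visit the two halves of the emoji separately.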

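Note: A very well-known example of Unicode characters outside the BMP (i.e. that only work when using the code point variants) is emoji: even the simple Grinning Face 😀 (U+1F600) cannot be represented in a single char.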

【Comments】:

Could you provide an example where charAt() fails to give the full code point but codePointAt() succeeds?
@Zaid Khan: String s3 = "\u0041\u00DF\u6771\uD801\uDC00"; System.out.println(s3.charAt(3)); System.out.println(s3.codePointAt(3));
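The one-liner from the comment above can be expanded into a runnable sketch. The string is the commenter's; the class name `CharAtVsCodePointAt` and the hex formatting are added here so that the lone surrogate is visible instead of being printed as an unrenderable character.

```java
final class CharAtVsCodePointAt {
    // The string from the comment: 'A', 'ß', '東', then U+10400 as a surrogate pair.
    static final String S3 = "\u0041\u00DF\u6771\uD801\uDC00";

    public static void main(String[] args) {
        // charAt sees only the high surrogate at index 3.
        System.out.println(Integer.toHexString(S3.charAt(3)));      // d801
        // codePointAt combines the pair at indexes 3 and 4.
        System.out.println(Integer.toHexString(S3.codePointAt(3))); // 10400
        System.out.println(S3.length());                            // 5
        System.out.println(S3.codePointCount(0, S3.length()));      // 4
    }
}
```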

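【Solution 2】:

Code points support characters above 65535, which is Character.MAX_VALUE.

If your text contains such high characters, you have to work with code points, i.e. int values, instead of chars.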

It does this by supporting UTF-16, which can use one or two 16-bit chars, and turning them into an int.
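As a rough illustration of working with int code points instead of chars, the sketch below reverses a string code point by code point. `CodePointReverse` and `reverseByCodePoint` are hypothetical names; `codePoints()` assumes Java 8 or later, and `StringBuilder.appendCodePoint` writes one or two chars as needed.

```java
final class CodePointReverse {
    // Reverses a string while keeping each surrogate pair in its original order.
    static String reverseByCodePoint(String s) {
        int[] codePoints = s.codePoints().toArray();
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = codePoints.length - 1; i >= 0; i--) {
            sb.appendCodePoint(codePoints[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "x\uD83D\uDE00y"; // 'x', U+1F600, 'y'
        String reversed = reverseByCodePoint(s);
        System.out.println(reversed.equals("y\uD83D\uDE00x")); // true
    }
}
```

In practice `StringBuilder.reverse()` already treats surrogate pairs as single characters, so this is mainly a demonstration of the int-based API; a hand-written loop over `charAt` would swap the two surrogates and corrupt the emoji.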

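AFAIK, this is generally only needed for the Supplementary Multilingual and Supplementary Ideographic planes, which hold more recently added characters (such as non-traditional Chinese).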

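【Comments】:

Well, it's not just non-traditional Chinese: en.wikipedia.org/wiki/Plane_(Unicode). Many lesser-known scripts, some mathematical symbols, emoji, and pretty much everything recently introduced into Unicode live outside the BMP. There is also a relevant question on this.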

【Solution 3】:

The code sample below helps clarify how codePointAt is used:

    String myStr = "1😀3"; // 😀 is U+1F600, stored as a surrogate pair
    System.out.println(myStr.length()); // prints 4, because 😀 takes two chars
    System.out.println(myStr.codePointCount(0, myStr.length())); // prints 3, counting whole code points

    int result = myStr.codePointAt(0);
    System.out.println(Character.toChars(result)); // prints 1

    result = myStr.codePointAt(1);
    System.out.println(Character.toChars(result)); // prints 😀, because codePointAt reads the whole surrogate pair (high and low)

    result = myStr.codePointAt(2);
    System.out.println(Character.toChars(result)); // index 2 is only the low surrogate of 😀, so this typically shows "?"

    result = myStr.codePointAt(3);
    System.out.println(Character.toChars(result)); // prints 3

【Comments】:

【Solution 4】:

In short: rarely, as long as you are using the default charset in Java :) For a more detailed explanation, try the following posts:

Comparing a char to a code-point? http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html http://javarevisited.blogspot.com/2012/01/java-string-codepoint-get-unicode.html

Hope this helps clarify things for you :)

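【Comments】:

These methods are not (directly) related to the charset (except that I know of no non-universal charset that encodes anything outside the BMP).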