Counting the Characters in a Java String
ClOONG

Java Unicode

Basic characters (the Basic Multilingual Plane, BMP) cover the range U+0000 ~ U+FFFF.

Supplementary characters cover the range U+10000 ~ U+10FFFF.

The number that Unicode assigns to each of these characters is called its "code point".
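The two ranges above can be checked directly with the constants and predicates on `java.lang.Character`. The following is a minimal sketch; the class name `CodePointRanges` and the helper `describe` are made up for illustration and are not part of the original post.

```java
class CodePointRanges {
    /** Returns a short label for the Unicode range that codePoint falls in. */
    static String describe(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return "invalid"; // outside U+0000 ~ U+10FFFF
        }
        if (Character.isBmpCodePoint(codePoint)) {
            return "basic (BMP)"; // U+0000 ~ U+FFFF
        }
        return "supplementary"; // U+10000 ~ U+10FFFF
    }

    public static void main(String[] args) {
        // Boundaries of the supplementary range as defined by the JDK
        System.out.printf("U+%04X%n", Character.MIN_SUPPLEMENTARY_CODE_POINT); // U+10000
        System.out.printf("U+%04X%n", Character.MAX_CODE_POINT);               // U+10FFFF

        System.out.println(describe('a'));      // basic (BMP)
        System.out.println(describe(0x1F60A));  // supplementary (😊)
        System.out.println(describe(0x110000)); // invalid
    }
}
```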

The relationship between bytes, code points, and characters

[Figure: relationship diagram]
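To make the relationship concrete, the sketch below measures the same string in four ways: UTF-8 bytes, UTF-16 bytes, `char` code units, and code points. The class name `UnitsDemo` and its helper methods are assumptions for illustration only. The emoji is written with `\u` escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

class UnitsDemo {
    /** Number of bytes when the string is encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes when the string is encoded as UTF-16 (big-endian, no BOM). */
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    /** Number of char code units, which is what String.length() reports. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE0Abc"; // "😊bc"
        System.out.println(utf8Bytes(s));  // 6: 4 bytes for 😊, 1 each for b and c
        System.out.println(utf16Bytes(s)); // 8: 2 bytes per code unit
        System.out.println(codeUnits(s));  // 4: 😊 is a surrogate pair
        System.out.println(codePoints(s)); // 3
    }
}
```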

import java.util.Arrays;

public class CodePointDemo {
    public static void main(String[] args) {
        char supportCh = 'a';
        // char unSupportCh = '😊'; // compile error: Too many characters in character literal
        String supportStr1 = "abc";
        String supportStr2 = "😊bc";
        // Print the char at index 0 of each string
        System.out.println(supportStr1.charAt(0)); // prints: a
        System.out.println(supportStr2.charAt(0)); // prints a lone high surrogate, usually shown as: ?

        // Print the length of each string (number of char code units)
        System.out.println(supportStr1.length()); // prints: 3
        System.out.println(supportStr2.length()); // prints: 4

        // Print the number of characters. codePointCount returns the number of code points,
        // and each character is one code point. A supplementary character is still one
        // code point, but it occupies two char code units.
        System.out.println(supportStr1.codePointCount(0, supportStr1.length())); // prints: 3
        System.out.println(supportStr2.codePointCount(0, supportStr2.length())); // prints: 3

        // Print all code points
        System.out.println(Arrays.toString(supportStr1.codePoints().toArray())); // prints: [97, 98, 99]
        System.out.println(Arrays.toString(supportStr2.codePoints().toArray())); // prints: [128522, 98, 99]

        // Check whether a given Unicode code point is a supplementary character
        System.out.println(Character.isSupplementaryCodePoint(97)); // false
        System.out.println(Character.isSupplementaryCodePoint(128522)); // true (128522 is the code point of 😊)

        // Check whether a char is a surrogate code unit
        System.out.println(Character.isSurrogate(supportStr2.charAt(0))); // true: charAt(0) is the first code unit (high surrogate) of 😊
        System.out.println(Character.isSurrogate(supportStr2.charAt(1))); // true: charAt(1) is the second code unit (low surrogate) of 😊
        System.out.println(Character.isSurrogate(supportStr2.charAt(3))); // false: charAt(3) is the character c
    }
}
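Building on the calls shown in the listing, here is one possible way to wrap them into reusable helpers: counting characters, splitting a string so that each element is one whole code point, and computing the surrogate pair for a supplementary code point by hand. The class `CodePointText` and all three method names are hypothetical, chosen only for this sketch. The surrogate arithmetic follows the standard UTF-16 rule (subtract 0x10000, then split the 20-bit result into two 10-bit halves).

```java
import java.util.List;
import java.util.stream.Collectors;

class CodePointText {
    /** Counts code points rather than char code units. */
    static int countCharacters(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Splits a string into one String per code point, keeping surrogate pairs together. */
    static List<String> splitByCodePoint(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    /** Computes the UTF-16 surrogate pair {high, low} for a supplementary code point. */
    static char[] surrogatePair(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE0Abc"; // "😊bc"
        System.out.println(countCharacters(s));         // 3
        System.out.println(splitByCodePoint(s).size()); // 3

        char[] pair = surrogatePair(128522);            // code point of 😊 (U+1F60A)
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D83D DE0A
    }
}
```

The hand-written `surrogatePair` is only there to show where the two code units come from; in real code `Character.toChars`, `Character.highSurrogate`, and `Character.lowSurrogate` do the same job.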

Further reading: "Java 的字符编解码及CodePoint" (Java character encoding/decoding and code points) · 奔跑的锅炉 (boileryao.com)

  • Post title:Counting the Characters in a Java String (Java 获取字符串字符个数)
  • Post author:ClOONG
  • Create time:2022-04-23 21:13:52
  • Post link:https://lclfeng.github.io/2022/04/23/-Java 获取字符串字符个数/
  • Copyright Notice:All articles in this blog are licensed under BY-NC-SA unless stating additionally.