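【Question】: What is the regex to extract all the emojis from a string?
【Posted】: 2014-09-10 12:23:30
【Description】:

I have a string encoded in UTF-8. For example:

Thats a nice joke 😆😆😆 😛

I have to extract all the emojis present in the sentence. The emoji could be any.

When this sentence is viewed in a terminal with the command less text.txt, it is shown as:

Thats a nice joke <U+1F606><U+1F606><U+1F606> <U+1F61B>

These are the Unicode code points corresponding to the emojis. All the emoji codes can be found at emojitracker.

To find all the occurrences, I used the regex pattern (<U\+\w+?>), but it did not work for the UTF-8 encoded string.

Following is my code: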

    String s = "Thats a nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B"; // U+1F606 three times, then U+1F61B
    Pattern pattern = Pattern.compile("(<U\\+\\w+?>)");
    Matcher matcher = pattern.matcher(s);
    List<String> matchList = new ArrayList<String>();

    while (matcher.find()) {
        matchList.add(matcher.group());
    }

    for(int i=0;i<matchList.size();i++){
        System.out.println(matchList.get(i));

    }

This PDF says "Range: 1F300–1F5FF for Miscellaneous Symbols and Pictographs", so I want to capture any character lying in this range.
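The pattern (<U\+\w+?>) can never match because the string holds the emoji code points themselves; the <U+...> text is only how less renders them. A minimal sketch of that rendering, assuming Java 8 streams (the class and method names are made up for illustration):

```java
import java.util.stream.Collectors;

public class CodePointDump {
    // Renders every non-ASCII code point the way `less` displays it, e.g. <U+1F606>.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> cp < 0x80
                        ? String.valueOf((char) cp)
                        : String.format("<U+%04X>", cp))
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        String s = "Thats a nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        System.out.println(describe(s));        // the view that less shows
        System.out.println(s.contains("<U+"));  // the raw string has no such text
    }
}
```

Iterating with codePoints() rather than charAt() keeps each surrogate pair together as one value.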

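【Comments】:

  • The <U+1F606> strings are specific to less. Besides, your solution idea would also capture almost any other Unicode character. The only real solution is to list all the Unicode code points that correspond to emoji.
  • You'll have to find a list of all the emoji characters (code points) you want to look for; they're spread over many different Unicode blocks. This PDF has a "good sample" (according to the first link)...
  • @T.J.Crowder The PDF that you just mentioned says "Range: 1F300–1F5FF for Miscellaneous Symbols and Pictographs". So let's say I want to capture any character lying in this range. Now what do I do?
  • I came here trying to find a regex I could paste into Sublime Text to find emoji. No luck.

Tags: java regex utf-8 emoji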


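【Solution 1】:

Using emoji-java, I wrote a simple method that removes all emojis, including Fitzpatrick modifiers. It requires an external library, but it is easier to maintain than those monster regexes.

Usage: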

String input = "A string ?with a \uD83D\uDC66\uD83C\uDFFFfew ?emojis!";
String result = EmojiParser.removeAllEmojis(input);

Installing emoji-java with Maven:

<dependency>
  <groupId>com.vdurmont</groupId>
  <artifactId>emoji-java</artifactId>
  <version>3.1.3</version>
</dependency>

Gradle:

implementation 'com.vdurmont:emoji-java:3.1.3'

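Edit: the previously submitted answer was pulled into the emoji-java source code.

【Comments】:

  • I love answers like this. Works like a charm. Thanks!
  • I also use this library to remove emojis and it works great. One thing: the code snippet was outdated and did not work for me with the latest version (it threw some pattern exception). The documentation suggests using EmojiParser#removeAllEmojis(String), and that indeed ran smoothly.
  • If you are using this, here is the link to the jar: github.com/vdurmont/emoji-java/releases, and here is the link to its dependency: mvnrepository.com/artifact/org.json/json/20080701
  • @gidim, please update the dependency version to 3.1.3. The version you listed, 2.0.1, does not have EmojiParser.removeAllEmojis(String input). Other than that, thumbs up for a great library!
  • @BrunoCarrier Thanks! Updated. By the way, I'm not the author of the library; I just wrote the emoji removal function.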
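【Solution 2】:

"the pdf that you just mentioned says Range: 1F300–1F5FF for Miscellaneous Symbols and Pictographs. So let's say I want to capture any character lying in this range. Now what to do?"

Okay, but I'll just note that the emoji in your question are outside that range! :-)

The fact that these are above 0xFFFF complicates things, because Java strings store UTF-16. So we can't just use one simple character class; we're going to have surrogate pairs. (More: http://www.unicode.org/faq/utf_bom.html)

U+1F300 in UTF-16 ends up being the pair \uD83C\uDF00; U+1F5FF ends up being \uD83D\uDDFF. Note that the first character went up, so we cross at least one boundary. So we have to know which ranges of surrogate pairs we're looking for.

Not being steeped in the inner workings of UTF-16, I wrote a program to find out (source at the end; I'd double-check it if I were you, rather than trusting me). It tells me we're looking for \uD83C followed by anything in the range \uDF00-\uDFFF (inclusive), or \uD83D followed by anything in the range \uDC00-\uDDFF (inclusive).

Armed with that knowledge, in theory we could now write a pattern: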

// This is wrong, keep reading
Pattern p = Pattern.compile("(?:\uD83C[\uDF00-\uDFFF])|(?:\uD83D[\uDC00-\uDDFF])");

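That's an alternation of two non-capturing groups, the first for the pairs starting with \uD83C and the second for the pairs starting with \uD83D.

But that fails (it finds nothing). I'm fairly sure that's because we're trying to specify half of a surrogate pair in various places: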

Pattern p = Pattern.compile("(?:\uD83C[\uDF00-\uDFFF])|(?:\uD83D[\uDC00-\uDDFF])");
// Half of a pair --------------^------^------^-----------^------^------^

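We can't split up surrogate pairs like that; they're called surrogate pairs for a reason. :-)

Consequently, I don't think we can use a regex with split surrogate halves (or indeed, any string-based approach) for this. I think we have to search through char arrays.

char arrays hold UTF-16 values, so we can find those half-pairs in the data if we look for them the hard way: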

String s = new StringBuilder()
                .append("Thats a nice joke ")
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .appendCodePoint(0x1F606)
                .append(" ")
                .appendCodePoint(0x1F61B)
                .toString();
char[] chars = s.toCharArray();
int index;
char ch1;
char ch2;

index = 0;
while (index < chars.length - 1) { // -1 because we're looking for two-char-long things
    ch1 = chars[index];
    if ((int)ch1 == 0xD83C) {
        ch2 = chars[index+1];
        if ((int)ch2 >= 0xDF00 && (int)ch2 <= 0xDFFF) {
            System.out.println("Found emoji at index " + index);
            index += 2;
            continue;
        }
    }
    else if ((int)ch1 == 0xD83D) {
        ch2 = chars[index+1];
        if ((int)ch2 >= 0xDC00 && (int)ch2 <= 0xDDFF) {
            System.out.println("Found emoji at index " + index);
            index += 2;
            continue;
        }
    }
    ++index;
}

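Obviously, that's just debug-level code, but it does the job. (In your given string, with its emoji, it won't find anything, of course, because they're outside the range. But if you change the upper bound on the second pair to 0xDEFF instead of 0xDDFF, it will. No idea whether that would also include non-emoji, though.)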
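For comparison: java.util.regex matches by code point rather than by UTF-16 char, so a range can also be written as whole code points with the \x{...} notation instead of split surrogate halves. A minimal sketch, assuming Java 7 or later for that notation; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodePointRangeRegex {
    // U+1F300 to U+1F5FF (Miscellaneous Symbols and Pictographs) as code points.
    static final Pattern PICTOGRAPHS = Pattern.compile("[\\x{1F300}-\\x{1F5FF}]");

    // Collects every match of the pattern in the input.
    static List<String> findAll(Pattern p, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // U+1F300 lies inside the range; U+1F606 (from the question) lies outside it.
        String s = "in \uD83C\uDF00 out \uD83D\uDE06";
        System.out.println(findAll(PICTOGRAPHS, s).size());
    }
}
```

Widening the upper bound (for example to \x{1F64F}) would bring the question's emoji into range, with the same caveat about possibly picking up non-emoji.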


Source of my program to find out what the surrogate ranges are:

public class FindRanges {

    public static void main(String[] args) {
        char last0 = '\0';
        char last1 = '\0';
        for (int x = 0x1F300; x <= 0x1F5FF; ++x) {
            char[] chars = new StringBuilder().appendCodePoint(x).toString().toCharArray();
            if (chars[0] != last0) {
                if (last0 != '\0') {
                    System.out.println("-\\u" + Integer.toHexString((int)last1).toUpperCase());
                }
                System.out.print("\\u" + Integer.toHexString((int)chars[0]).toUpperCase() + " \\u" + Integer.toHexString((int)chars[1]).toUpperCase());
                last0 = chars[0];
            }
            last1 = chars[1];
        }
        if (last0 != '\0') {
            System.out.println("-\\u" + Integer.toHexString((int)last1).toUpperCase());
        }
    }
}

Output:

\uD83C \uDF00-\uDFFF
\uD83D \uDC00-\uDDFF
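The endpoints of those ranges can also be cross-checked without a loop, since Character.highSurrogate and Character.lowSurrogate (Java 7 and later) return the two halves of a supplementary code point directly. A small sketch; the class and method names are illustrative only:

```java
public class SurrogateBounds {
    // Formats the two UTF-16 halves of a supplementary code point as escape text.
    static String pair(int codePoint) {
        return String.format("\\u%04X \\u%04X",
                (int) Character.highSurrogate(codePoint),
                (int) Character.lowSurrogate(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(pair(0x1F300)); // first code point of the block
        System.out.println(pair(0x1F5FF)); // last code point of the block
    }
}
```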

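【Comments】:

  • @purrrminator: See the note about ranges. The above is just an example handling one specific range, but I warned the OP that there are others.
【Solution 3】:

I ran into a similar problem. The following served me well, and it matches surrogate pairs: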

public class SplitByUnicode {
    public static void main(String[] argv) throws Exception {
        String string = "Thats a nice joke ??? ?";
        System.out.println("Original String:"+string);
        String regexPattern = "[\uD83C-\uDBFF\uDC00-\uDFFF]+";
        byte[] utf8 = string.getBytes("UTF-8");

        String string1 = new String(utf8, "UTF-8");

        Pattern pattern = Pattern.compile(regexPattern);
        Matcher matcher = pattern.matcher(string1);
        List<String> matchList = new ArrayList<String>();

        while (matcher.find()) {
            matchList.add(matcher.group());
        }

        for(int i=0;i<matchList.size();i++){
            System.out.println(i+":"+matchList.get(i));

        }
    }
}

The output is:


Original String:Thats a nice joke ??? ?
0:???
1:?

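The regex was found at https://stackoverflow.com/a/24071599/915972

【Comments】:

  • This seems to work fine, and it's quite simple too, if you take out the example Java boilerplate.
  • The boilerplate is just for completeness, in case any Java newbie wants to test it :)
  • I tried using [\uD83C-\uDBFF\uDC00-\uDFFF]+ to remove emojis, and it also removed the following - character. I ended up using [\uD800\uDC00-\uDBFF\uDFFF]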
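The class from the last comment, [\uD800\uDC00-\uDBFF\uDFFF], spans the supplementary code points U+10000 to U+10FFFF. A sketch of the same range written with code point escapes, which leaves a hyphen next to the emoji alone (class and method names are illustrative; note that this removes every supplementary character, emoji or not, and leaves emoji in the Basic Multilingual Plane untouched):

```java
public class SupplementaryStrip {
    // Removes runs of code points outside the Basic Multilingual Plane.
    static String strip(String s) {
        return s.replaceAll("[\\x{10000}-\\x{10FFFF}]+", "");
    }

    public static void main(String[] args) {
        // The hyphen survives; only the two supplementary code points are removed.
        System.out.println(strip("well-known \uD83D\uDE06\uD83D\uDE1B!"));
    }
}
```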
【Solution 4】:

Just use a regex to solve it:

s = s.replaceAll("\\p{So}+", "");

You can find it documented at:

http://www.regular-expressions.info/unicode.html

https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#OTHER_SYMBOL
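\p{So} is the Unicode general category "Other Symbol", which Java exposes as Character.OTHER_SYMBOL. A short sketch of what the replacement touches, with illustrative class and method names. The category also covers non-emoji symbols such as the copyright sign, and it does not cover characters in other categories, for example the digits used in keycap sequences:

```java
public class OtherSymbolDemo {
    // Removes every run of characters whose general category is So.
    static String stripOtherSymbols(String s) {
        return s.replaceAll("\\p{So}+", "");
    }

    // True when the code point's general category is So.
    static boolean isOtherSymbol(int codePoint) {
        return Character.getType(codePoint) == Character.OTHER_SYMBOL;
    }

    public static void main(String[] args) {
        System.out.println(stripOtherSymbols("joke \uD83D\uDE06!"));
        System.out.println(isOtherSymbol(0x1F606)); // an emoticon
        System.out.println(isOtherSymbol(0x00A9));  // the copyright sign
        System.out.println(isOtherSymbol('7'));     // a digit
    }
}
```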


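【Comments】:

  • This did not find the emoji for me.
  • @Gandalf458 I updated my answer, adding a screenshot of the example.
  • It seems to work in Java, but not in C#. I guess C# does not treat emoji as Other_Symbol.
  • Worked for me on Presto ([prestodb]), which generally accepts Java pattern syntax.
【Solution 5】:

This worked for me in Java 8: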

public static String mysqlSafe(String input) {
  if (input == null) return null;
    StringBuilder sb = new StringBuilder();

    for (int i = 0; i < input.length(); i++) {
      if (i < (input.length() - 1)) { // Emojis are two characters long in java, e.g. a rocket emoji is "\uD83D\uDE80";
        if (Character.isSurrogatePair(input.charAt(i), input.charAt(i + 1))) {
          i += 1; //also skip the second character of the emoji
          continue;
        }
      }
      sb.append(input.charAt(i));
    }

  return sb.toString();
}

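【Comments】:

  • Thanks a lot! Pointed me in the right direction for what I needed.
  • This logic simply skips code points outside the BMP. That may look fine in some cases, but it will not always work properly. First, it does not filter out emoji in the Dingbats block, and second, it even filters out some rare letters.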
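One way to address both points in the last comment is to filter by Unicode block instead of by "is a surrogate pair". The sketch below is an assumption-laden illustration rather than a complete emoji filter: the block list is a small hand-picked sample, whole blocks are removed even though they contain some non-emoji symbols, and the class and method names are made up.

```java
import java.lang.Character.UnicodeBlock;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BlockFilter {
    // Illustrative, incomplete selection of blocks that contain emoji.
    static final Set<UnicodeBlock> EMOJI_BLOCKS = new HashSet<>(Arrays.asList(
            UnicodeBlock.EMOTICONS,
            UnicodeBlock.MISCELLANEOUS_SYMBOLS_AND_PICTOGRAPHS,
            UnicodeBlock.TRANSPORT_AND_MAP_SYMBOLS,
            UnicodeBlock.DINGBATS));

    // Keeps every code point whose block is not in EMOJI_BLOCKS.
    static String stripBlocks(String input) {
        StringBuilder sb = new StringBuilder();
        input.codePoints()
             .filter(cp -> !EMOJI_BLOCKS.contains(UnicodeBlock.of(cp)))
             .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Rocket (U+1F680), airplane (U+2708), and a rare CJK letter (U+20000).
        String in = "a\uD83D\uDE80b\u2708c\uD840\uDC00";
        System.out.println(stripBlocks(in).length());
    }
}
```

Here the BMP airplane is removed along with the rocket, while the supplementary CJK letter is kept.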
【Solution 6】:

You can do it like this:

    String s = "Thats a nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B"; // U+1F606 three times, then U+1F61B
    Pattern pattern = Pattern.compile("[\ud83c\udc00-\ud83c\udfff]|[\ud83d\udc00-\ud83d\udfff]|[\u2600-\u27ff]",
                                      Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(s);
    List<String> matchList = new ArrayList<String>();

    while (matcher.find()) {
        matchList.add(matcher.group());
    }

    for(int i=0;i<matchList.size();i++){
        System.out.println(matchList.get(i));
    }

【Comments】:

【Solution 7】:

The best regex for extracting all emojis is this:

    (?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|[\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|[\ud83c[\ude32-\ude3a]|[\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])
    

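It recognizes many single-character emojis that the other answers do not account for. For more on how this regex works, see this article: https://medium.com/@thekevinscott/emojis-in-javascript-f693d0eb79fb#.enomgcu63

【Comments】:

  • When feeding this into the Pattern.compile() method, I get the error "Unclosed character class near index 657".

【Solution 8】:

Assuming you are asking about the standard Unicode emoji ranges (vendors use different blocks), you could consider the following three ranges:

  • 0x20a0 - 0x32ff
  • 0x1f000 - 0x1ffff
  • 0xfe4e5 - 0xfe4ee

Beyond all the thoughtful explanation T.J. Crowder shared with us, it should be said that since Java 7 it is easy to match UTF-16 encoded surrogate pairs.

See the documentation:

http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

A Unicode character can also be represented in a regular expression by using its hexadecimal notation (hexadecimal code point value) directly, as described in the construct \x{...}. For example, the supplementary character U+2011F can be specified as \x{2011F} instead of the two consecutive Unicode escape sequences of the surrogate pair \uD840\uDD1F.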
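As a sketch of how the quoted notation might be applied to the three ranges listed above (the ranges are taken from this answer as given, not verified against the emoji specification; class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexNotationDemo {
    // The three ranges from the answer, written as code points.
    static final Pattern RANGES = Pattern.compile(
            "[\\x{20A0}-\\x{32FF}\\x{1F000}-\\x{1FFFF}\\x{FE4E5}-\\x{FE4EE}]");

    // Counts how many code points of the input fall into the ranges.
    static int count(String s) {
        Matcher m = RANGES.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // True when the whole input matches the given regex.
    static boolean matchesWhole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        String s = "Thats a nice joke \uD83D\uDE06\uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B";
        System.out.println(count(s));
        // The documentation's example: one code point escape matches the surrogate pair.
        System.out.println(matchesWhole("\\x{2011F}", "\uD840\uDD1F"));
    }
}
```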

Still, if you cannot switch to Java 7, you can extend the invaluable UnicodeEscaper that Guava provides.

Here is a sample implementation:

    public class SimpleEscaper extends UnicodeEscaper
    {
        @Override
        protected char[] escape(int codePoint)
        {
            if (0x1f000 <= codePoint && codePoint <= 0x1ffff) // inside the range 0x1f000 - 0x1ffff
            {
                return Integer.toHexString(codePoint).toCharArray();
            }
    
            return Character.toChars(codePoint);
        }
    }
    

【Comments】:

【Solution 9】:

Emoji regex

      public static final String sEmojiRegex = "(?:[\\u2700-\\u27bf]|" +
      
              "(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|" +
              "[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?" +
      
              "(?:\\u200d(?:[^\\ud800-\\udfff]|" +
      
              "(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|" +
              "[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?)*|" +
      
              "[\\u0023-\\u0039]\\ufe0f?\\u20e3|\\u3299|\\u3297|\\u303d|\\u3030|\\u24c2|[\\ud83c\\udd70-\\ud83c\\udd71]|[\\ud83c\\udd7e-\\ud83c\\udd7f]|\\ud83c\\udd8e|[\\ud83c\\udd91-\\ud83c\\udd9a]|[\\ud83c\\udde6-\\ud83c\\uddff]|[\\ud83c\\ude01-\\ud83c\\ude02]|\\ud83c\\ude1a|\\ud83c\\ude2f|[\\ud83c\\ude32-\\ud83c\\ude3a]|[\\ud83c\\ude50-\\ud83c\\ude51]|\\u203c|\\u2049|[\\u25aa-\\u25ab]|\\u25b6|\\u25c0|[\\u25fb-\\u25fe]|\\u00a9|\\u00ae|\\u2122|\\u2139|\\ud83c\\udc04|[\\u2600-\\u26FF]|\\u2b05|\\u2b06|\\u2b07|\\u2b1b|\\u2b1c|\\u2b50|\\u2b55|\\u231a|\\u231b|\\u2328|\\u23cf|[\\u23e9-\\u23f3]|[\\u23f8-\\u23fa]|\\ud83c\\udccf|\\u2934|\\u2935|[\\u2190-\\u21ff]";
      

Some emojis (1627)

      // count = 1627
      public static final String sEmojiTest = "????????☺️????????????????????????????☹️????????????????????????????????????????????☠️?????????????????????✊???✌️??????☝️✋???????✍️?????????????????????‍♀?????‍♀??‍♀??‍♀??‍♀??️‍♀️??‍⚕?‍⚕?‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍??‍✈?‍✈?‍??‍??‍⚖?‍⚖?????????‍♀???‍♂??‍♂??‍♂??‍♂?‍♀?‍♂?‍♀?‍♂??‍♂??‍♂??‍♂??‍♂?????‍♂?‍♀??‍♀??????‍❤️‍??‍❤️‍???‍❤️‍?‍??‍❤️‍?‍???‍?‍??‍?‍?‍??‍?‍?‍??‍?‍?‍??‍?‍??‍?‍??‍?‍?‍??‍?‍?‍??‍?‍?‍??‍?‍??‍?‍??‍?‍?‍??‍?‍?‍??‍?‍?‍??‍??‍??‍?‍??‍?‍??‍?‍??‍??‍??‍?‍??‍?‍??‍?‍?????????????????⛑????????☂️???????????????????????????????????????????????????????????????????????????????????????????????☘️??????????????????????????????????⭐️?✨⚡️??☄☀️?⛅️???☁️?⛈??☃️⛄️❄️???????☔️????????????????????????????????????????????????????????????????????????☕️?????????????⚽️??⚾️??????????⛳️????⛸?⛷??️‍♀️???‍♀?‍♂?‍♀?‍♂⛹️‍♀️⛹?‍♀?‍♂?️‍♀️??‍♀??‍♀??‍♀?‍♂?‍♀???‍♀??‍♀??????????????‍♀?‍♂????????????????????????????????????????????????????????✈️??????⛵️???⛴?⚓️?⛽️??????⛲️???????⛱??⛰?????⛺️????????????????????⛪️???⛩???????????????⌚️???⌨️??????????????????☎️???????⏱⏲⏰?⌛️⏳????????????????⚖️??⚒?⛏?⚙️⛓????⚔️??⚰️⚱️????⚗️??????????????????????????????✉️????????????????????????????????????????????????????✂️??✒️???✏️??????❤️??????❣️????????☮️✝️☪️?☸️✡️??☯️☦️?⛎♈️♉️♊️♋️♌️♍️♎️♏️♐️♑️♒️♓️?⚛️?☢️☣️????️???️✴️???㊙️㊗️?????️?️???️?❌⭕️?⛔️????♨️???????❗️❕❓❔‼️⁉️??〽️⚠️??⚜️?♻️✅?️?❇️✳️❎??Ⓜ️????♿️?️??️?????????????ℹ️?????????0️⃣1️⃣2️⃣3️⃣4️⃣5️⃣6️⃣7️⃣8️⃣9️⃣??#️⃣*️⃣▶️⏸⏯⏹⏺⏭⏮⏩⏪⏫⏬◀️??➡️⬅️⬆️⬇️↗️↘️↙️↖️↕️↔️↪️↩️⤴️⤵️???????➕➖➗✖️??™️©️®️〰️➰➿?????✔️☑️?⚪️⚫️??????????▪️▫️◾️◽️◼️◻️⬛️⬜️?????????‍????♠️♣️♥️♦️???️?????????????????????????️????️‍????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????⚽️??⚾️??????????⛳️????⛸?⛷??️‍♀️??‍♀️??‍♀️??‍♀️??‍♀️??‍♀️?️????????????‍♀️?‍♂️?‍♀️??‍♀️??‍♀️??‍♀️??‍♀️??‍♀️?‍♂️??‍♂️??‍♂️??‍♂️??‍♂️??‍♂️⛹️‍♀️⛹?‍♀️⛹?‍♀️⛹?‍♀️⛹?‍♀️⛹?‍♀️⛹️⛹?⛹?⛹?⛹?⛹??‍♀️??‍♀️??‍♀️??‍♀️??‍♀️??‍♀️?‍♂️??‍♂️??‍♂️??‍♂️??‍♂️??‍♂️?️‍♀️??‍♀️??‍♀️??‍♀️??‍♀️??‍♀️?️???????????‍♀️??‍♀️??‍♀️??‍♀️??‍♀️??‍♀️????????????‍♀️??‍♀️??‍♀️??‍♀️??‍♀️??‍♀️????????????‍♀️??‍♀️??‍♀️??‍♀️??‍♀️??‍♀️?‍♂️??‍♂️??‍♂️??‍♂️??‍♂️??‍♂️?‍♀️??‍♀️??‍♀️??‍♀️??‍♀️??‍♀️???????????????????????‍♀️??‍♀️??‍♀️??‍♀️??‍♀️??‍♀️????????????‍♀️??‍♀️??‍♀️??‍♀️??‍♀️??‍♀️????????????????????????‍♀️?‍♂️?????????????????";
      

Emoji test function

      public void checkMatchingEmojis() {
      
          final Pattern pattern = Pattern.compile(sEmojiRegex);
          final Matcher matcher = pattern.matcher(sEmojiTest);
          int foundEmojiCount = 0;
          while (matcher.find()) {
              System.out.println("Full match: " + matcher.group(0));
              foundEmojiCount++;
          }
          System.out.println("*******************************************");
          System.out.println("Input Emoji count = 1627");
          System.out.println("Captured Emoji count = " + foundEmojiCount);
          System.out.println("*******************************************");
      
      }
      

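Here is the gist, tested on all Unicode 10 emojis.

Thanks to Kevin Scott for writing a good example.

【Comments】:

【Solution 10】:

There are two ways to solve this tricky problem.

The first is to use a third-party library such as emoji-java or emoji4j, both mentioned in other answers. You can easily use methods such as containsEmoji, removesEmoji, and so on. In your own application, you then need to keep these libraries up to date.

As for me, I wanted a simple solution to this problem.

After a whole day of searching, I found a magic regex: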

        "(?:[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83E\uDD00-\uD83E\uDDFF]|[\uD83D\uDE00-\uD83D\uDE4F]|[\uD83D\uDE80-\uD83D\uDEFF]|[\u2600-\u26FF]\uFE0F?|[\u2700-\u27BF]\uFE0F?|\u24C2\uFE0F?|[\uD83C\uDDE6-\uD83C\uDDFF]{1,2}|[\uD83C\uDD70\uD83C\uDD71\uD83C\uDD7E\uD83C\uDD7F\uD83C\uDD8E\uD83C\uDD91-\uD83C\uDD9A]\uFE0F?|[\u0023\u002A\u0030-\u0039]\uFE0F?\u20E3|[\u2194-\u2199\u21A9-\u21AA]\uFE0F?|[\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55]\uFE0F?|[\u2934\u2935]\uFE0F?|[\u3030\u303D]\uFE0F?|[\u3297\u3299]\uFE0F?|[\uD83C\uDE01\uD83C\uDE02\uD83C\uDE1A\uD83C\uDE2F\uD83C\uDE32-\uD83C\uDE3A\uD83C\uDE50\uD83C\uDE51]\uFE0F?|[\u203C\u2049]\uFE0F?|[\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE]\uFE0F?|[\u00A9\u00AE]\uFE0F?|[\u2122\u2139]\uFE0F?|\uD83C\uDC04\uFE0F?|\uD83C\uDCCF\uFE0F?|[\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA]\uFE0F?)"

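I have tested it in Java. It solved my problem perfectly.

You can check it out on the GitHub page:

https://github.com/zly394/EmojiRegex

Note:

The answer provided by @Eric Nakagawa contains some errors and does not work properly.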

【Comments】:

  • This captures more than just emojis. If you run it against the Big List of Naughty Strings, you get a lot of non-emoji matches.
【Solution 11】:

You can also use the emoji4j library.

        String emojiText = "A ?, ? and a ? became friends. For ?'s birthday party, they all had ?s, ?s, ?s and ?.";
        
        EmojiUtils.removeAllEmojis(emojiText); // returns "A ,  and a  became friends. For 's birthday party, they all had s, s, s and ."
        

【Comments】:

【Solution 12】:

This is what I use to remove emojis. So far it has let all other letters through.

          private static String remove_Emojis(String name)
          {  
          
              //we will store all the letters in this array
              ArrayList<Character> nonEmoji = new ArrayList<>();
          
               // and when we rebuild the name we will put it in here
              String newName = "";
          
          
              // we are going to loop through checking each character to see if its an emoji or not
              for (int i = 0; i < name.length(); i++) 
               {
          
                  if (Character.isLetterOrDigit(name.charAt(i)))
                  {
                      nonEmoji.add(name.charAt(i));
                  } 
          
                   else 
                    {
                       // this is just a 2nd check in case the other method didn't allow some letter
                      if (Build.VERSION.SDK_INT > 18)
                      {
                          if (Character.isAlphabetic(name.charAt(i))) 
                          {
                              nonEmoji.add(name.charAt(i));
                          }
                      }
                  }
          
          
                  if (name.charAt(i) == ' ') // may want to consider adding '-' or '\''
                  {
                      nonEmoji.add(' '); // add the space itself, not the loop index
                  }
          
                  if (name.charAt(i) == '@' && !name.contains(" "))// I put this in for email addresses
                  {
                      nonEmoji.add('@');
                  }
              }
          
              // finally just loop through building it back out
              for (int i = 0; i < nonEmoji.size(); i++) {
          
                  newName += nonEmoji.get(i);
              }
          
              return newName;
          }
          

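【Comments】:

【Solution 13】:

Here is a simpler approach, and a regex that correctly parses (as of May 2021) all 3,521 emojis.

It is a simple, programmatically built alternation that matches the longest emojis first. This avoids a problem that many of the proposed patterns have: partial matches inside compound emojis. (A compound emoji is several emojis glued together with zero-width joiners (U+200D), so you need to match the longer sequence rather than partially matching its components.)

To keep the pattern short enough to paste here, literal emojis are used, but Unicode escapes work just as well (see the links at the bottom for a demo and the source):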

            import java.util.regex.Matcher;
            import java.util.regex.Pattern;
            public class MyClass {
                public static void main(String args[]) {
                  String line = "Adds ? word-relevant ? emojis ? ❤ to ? text ✨ with ? sometimes ?? hilarious ? ? results ?. Read more ?? about??‍❤️‍?‍?? ma?tching compound emojis";
                  String pattern = "(?:??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|??‍❤️‍?‍??|???????|???????|???????|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍?‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️
‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|??‍❤️‍??|?‍❤️‍?‍?|?‍❤️‍?‍?|?‍❤️‍?‍?|?‍?‍?‍?|?‍?‍?‍?|?‍?‍?‍?|?‍?‍?‍?|?‍?‍?‍?|?‍?‍?‍?|?‍?‍?‍?|?‍?‍?‍?|?‍?‍?‍?|?‍?‍?|?‍❤️‍?|?‍❤️‍?|?‍❤️‍?|?‍?‍?|?‍?‍?|?‍?‍?|?‍?‍?|?‍?‍?|?‍?‍?|?‍?‍?|?‍?‍?|?‍?‍?|?‍?‍?|?‍?‍?|?‍?‍?|?️‍?️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍⚕️|??‍⚕️|??‍⚕️|??‍⚕️|??‍⚕️|??‍⚕️|??‍⚕️|??‍⚕️|??‍⚕️|??‍⚕️|??‍⚕️|??‍⚕️|??‍⚕️|??‍⚕️|??‍⚕️|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍
?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍⚖️|??‍⚖️|??‍⚖️|??‍⚖️|??‍⚖️|??‍⚖️|??‍⚖️|??‍⚖️|??‍⚖️|??‍⚖️|??‍⚖️|??‍⚖️|??‍⚖️|??‍⚖️|??‍⚖️|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍✈️|??‍✈️|??‍✈️|??‍✈️|??‍✈️|??‍✈️|??‍✈️|??‍✈️|??‍✈️|??‍✈️|??‍✈️|??‍✈️|??‍✈️|??‍✈️|??‍✈️|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??
‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍?|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♂️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|??‍♀️|?‍?️|?️‍♂️|?️‍♀️|?️‍♂️|?️‍♀️|?️‍♂️|?️‍♀️|?️‍?|?️‍⚧️|⛹?‍♂️|⛹?‍♂️|⛹?‍♂️|⛹?‍♂️|⛹?‍♂️|⛹?‍♀️|⛹?‍♀️|⛹?‍♀️|⛹?‍♀️|⛹?‍♀️|?‍?|?‍?|❤️‍?|❤️‍?|?‍♂️|?‍♀️|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍♀️|?‍♂️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍⚕️|?‍⚕️|?‍⚕️|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍⚖️|?‍⚖️|?‍⚖️|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍✈️|?‍✈️|?‍✈️|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️
|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍?|?‍?|?‍?|?‍?|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍?|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|⛹️‍♂️|⛹️‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍♂️|?‍♀️|?‍?|?‍?|?‍?|?‍?|?‍?|?‍❄️|?‍☠️|?‍⬛|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|?
?|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|??|#️⃣|0️⃣|1️⃣|2️⃣|3️⃣|4️⃣|5️⃣|6️⃣|7️⃣|8️⃣|9️⃣|✋?|✋?|✋?|✋?|✋?|✌?|✌?|✌?|✌?|✌?|☝?|☝?|☝?|☝?|☝?|✊?|✊?|✊?|✊?|✊?|✍?|✍?|✍?|✍?|✍?|⛹?|⛹?|⛹?|⛹?|⛹?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|
?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|?|☺|☹|☠|❣|❤|✋|✌|☝|✊|✍|⛷|⛹|☘|☕|⛰|⛪|⛩|⛲|⛺|♨|⛽|⚓|⛵|⛴|✈|⌛|⏳|⌚|⏰|⏱|⏲|☀|⭐|☁|⛅|⛈|☂|☔|⛱|⚡|❄|☃|⛄|☄|✨|⚽|⚾|⛳|⛸|♠|♥|♦|♣|♟|⛑|☎|⌨|✉|✏|✒|✂|⛏|⚒|⚔|⚙|⚖|⛓|⚗|⚰|⚱|♿|⚠|⛔|☢|☣|⬆|↗|➡|↘|⬇|↙|⬅|↖|↕|↔|↩|↪|⤴|⤵|⚛|✡|☸|☯|✝|☦|☪|☮|♈|♉|♊|♋|♌|♍|♎|♏|♐|♑|♒|♓|⛎|▶|⏩|⏭|⏯|◀|⏪|⏮|⏫|⏬|⏸|⏹|⏺|⏏|♀|♂|⚧|✖|➕|➖|➗|♾|‼|⁉|❓|❔|❕|❗|〰|⚕|♻|⚜|⭕|✅|☑|✔|❌|❎|➰|➿|〽|✳|✴|❇|©|®|™|ℹ|Ⓜ|㊗|㊙|⚫|⚪|⬛|⬜|◼|◻|◾|◽|▪|▫)";
                  var i = 0;
                  Pattern r = Pattern.compile(pattern);
                  Matcher m = r.matcher(line);
                  while(m.find( )) {
                     i++;
                     System.out.println("Found value: " + m.group(0) );
                  }
                  System.out.println("Found " + i + " emojis." );
                }
            }
            

            More information:

            https://github.com/sweaver2112/Regex-combined-emojis

            Regex 101 Demo (compact, unsafe literal emoji version)

            Regex 101 Demo (long, safe unicode escape version)

            【Discussion】:

              【Solution 14】:

              You can generate your own regex whenever the spec changes,
              using this tool (screenshot here).

              For UTF-8/32 mode (as a string), expanded form:

              "     # Use the 'Mega-Conversion' tool to change into other syntaxes"
              "     # -------------------------------------------------------------"
              "     "
              "     [#*0-9] \\x{FE0F} \\x{20E3}"
              "  |  [\\x{A9}\\x{AE}\\x{203C}\\x{2049}\\x{2122}\\x{2139}\\x{2194}-\\x{2199}\\x{21A9}\\x{21AA}\\x{231A}\\x{231B}\\x{2328}\\x{23CF}\\x{23E9}-\\x{23F3}\\x{23F8}-\\x{23FA}\\x{24C2}\\x{25AA}\\x{25AB}\\x{25B6}\\x{25C0}\\x{25FB}-\\x{25FE}\\x{2600}-\\x{2604}\\x{260E}\\x{2611}\\x{2614}\\x{2615}\\x{2618}]"
              "  |  \\x{261D} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{2620}\\x{2622}\\x{2623}\\x{2626}\\x{262A}\\x{262E}\\x{262F}\\x{2638}-\\x{263A}\\x{2640}\\x{2642}\\x{2648}-\\x{2653}\\x{265F}\\x{2660}\\x{2663}\\x{2665}\\x{2666}\\x{2668}\\x{267B}\\x{267E}\\x{267F}\\x{2692}-\\x{2697}\\x{2699}\\x{269B}\\x{269C}\\x{26A0}\\x{26A1}\\x{26AA}\\x{26AB}\\x{26B0}\\x{26B1}\\x{26BD}\\x{26BE}\\x{26C4}\\x{26C5}\\x{26C8}\\x{26CE}\\x{26CF}\\x{26D1}\\x{26D3}\\x{26D4}\\x{26E9}\\x{26EA}\\x{26F0}-\\x{26F5}\\x{26F7}\\x{26F8}]"
              "  |  \\x{26F9}"
              "     (?:"
              "          \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{26FA}\\x{26FD}\\x{2702}\\x{2705}\\x{2708}\\x{2709}]"
              "  |  [\\x{270A}-\\x{270D}] [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{270F}\\x{2712}\\x{2714}\\x{2716}\\x{271D}\\x{2721}\\x{2728}\\x{2733}\\x{2734}\\x{2744}\\x{2747}\\x{274C}\\x{274E}\\x{2753}-\\x{2755}\\x{2757}\\x{2763}\\x{2764}\\x{2795}-\\x{2797}\\x{27A1}\\x{27B0}\\x{27BF}\\x{2934}\\x{2935}\\x{2B05}-\\x{2B07}\\x{2B1B}\\x{2B1C}\\x{2B50}\\x{2B55}\\x{3030}\\x{303D}\\x{3297}\\x{3299}\\x{1F004}\\x{1F0CF}\\x{1F170}\\x{1F171}\\x{1F17E}\\x{1F17F}\\x{1F18E}\\x{1F191}-\\x{1F19A}]"
              "  |  \\x{1F1E6} [\\x{1F1E8}-\\x{1F1EC}\\x{1F1EE}\\x{1F1F1}\\x{1F1F2}\\x{1F1F4}\\x{1F1F6}-\\x{1F1FA}\\x{1F1FC}\\x{1F1FD}\\x{1F1FF}]"
              "  |  \\x{1F1E7} [\\x{1F1E6}\\x{1F1E7}\\x{1F1E9}-\\x{1F1EF}\\x{1F1F1}-\\x{1F1F4}\\x{1F1F6}-\\x{1F1F9}\\x{1F1FB}\\x{1F1FC}\\x{1F1FE}\\x{1F1FF}]"
              "  |  \\x{1F1E8} [\\x{1F1E6}\\x{1F1E8}\\x{1F1E9}\\x{1F1EB}-\\x{1F1EE}\\x{1F1F0}-\\x{1F1F5}\\x{1F1F7}\\x{1F1FA}-\\x{1F1FF}]"
              "  |  \\x{1F1E9} [\\x{1F1EA}\\x{1F1EC}\\x{1F1EF}\\x{1F1F0}\\x{1F1F2}\\x{1F1F4}\\x{1F1FF}]"
              "  |  \\x{1F1EA} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}\\x{1F1EC}\\x{1F1ED}\\x{1F1F7}-\\x{1F1FA}]"
              "  |  \\x{1F1EB} [\\x{1F1EE}-\\x{1F1F0}\\x{1F1F2}\\x{1F1F4}\\x{1F1F7}]"
              "  |  \\x{1F1EC} [\\x{1F1E6}\\x{1F1E7}\\x{1F1E9}-\\x{1F1EE}\\x{1F1F1}-\\x{1F1F3}\\x{1F1F5}-\\x{1F1FA}\\x{1F1FC}\\x{1F1FE}]"
              "  |  \\x{1F1ED} [\\x{1F1F0}\\x{1F1F2}\\x{1F1F3}\\x{1F1F7}\\x{1F1F9}\\x{1F1FA}]"
              "  |  \\x{1F1EE} [\\x{1F1E8}-\\x{1F1EA}\\x{1F1F1}-\\x{1F1F4}\\x{1F1F6}-\\x{1F1F9}]"
              "  |  \\x{1F1EF} [\\x{1F1EA}\\x{1F1F2}\\x{1F1F4}\\x{1F1F5}]"
              "  |  \\x{1F1F0} [\\x{1F1EA}\\x{1F1EC}-\\x{1F1EE}\\x{1F1F2}\\x{1F1F3}\\x{1F1F5}\\x{1F1F7}\\x{1F1FC}\\x{1F1FE}\\x{1F1FF}]"
              "  |  \\x{1F1F1} [\\x{1F1E6}-\\x{1F1E8}\\x{1F1EE}\\x{1F1F0}\\x{1F1F7}-\\x{1F1FB}\\x{1F1FE}]"
              "  |  \\x{1F1F2} [\\x{1F1E6}\\x{1F1E8}-\\x{1F1ED}\\x{1F1F0}-\\x{1F1FF}]"
              "  |  \\x{1F1F3} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}-\\x{1F1EC}\\x{1F1EE}\\x{1F1F1}\\x{1F1F4}\\x{1F1F5}\\x{1F1F7}\\x{1F1FA}\\x{1F1FF}]"
              "  |  \\x{1F1F4} \\x{1F1F2}"
              "  |  \\x{1F1F5} [\\x{1F1E6}\\x{1F1EA}-\\x{1F1ED}\\x{1F1F0}-\\x{1F1F3}\\x{1F1F7}-\\x{1F1F9}\\x{1F1FC}\\x{1F1FE}]"
              "  |  \\x{1F1F6} \\x{1F1E6}"
              "  |  \\x{1F1F7} [\\x{1F1EA}\\x{1F1F4}\\x{1F1F8}\\x{1F1FA}\\x{1F1FC}]"
              "  |  \\x{1F1F8} [\\x{1F1E6}-\\x{1F1EA}\\x{1F1EC}-\\x{1F1F4}\\x{1F1F7}-\\x{1F1F9}\\x{1F1FB}\\x{1F1FD}-\\x{1F1FF}]"
              "  |  \\x{1F1F9} [\\x{1F1E6}\\x{1F1E8}\\x{1F1E9}\\x{1F1EB}-\\x{1F1ED}\\x{1F1EF}-\\x{1F1F4}\\x{1F1F7}\\x{1F1F9}\\x{1F1FB}\\x{1F1FC}\\x{1F1FF}]"
              "  |  \\x{1F1FA} [\\x{1F1E6}\\x{1F1EC}\\x{1F1F2}\\x{1F1F3}\\x{1F1F8}\\x{1F1FE}\\x{1F1FF}]"
              "  |  \\x{1F1FB} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}\\x{1F1EC}\\x{1F1EE}\\x{1F1F3}\\x{1F1FA}]"
              "  |  \\x{1F1FC} [\\x{1F1EB}\\x{1F1F8}]"
              "  |  \\x{1F1FD} \\x{1F1F0}"
              "  |  \\x{1F1FE} [\\x{1F1EA}\\x{1F1F9}]"
              "  |  \\x{1F1FF} [\\x{1F1E6}\\x{1F1F2}\\x{1F1FC}]"
              "  |  [\\x{1F201}\\x{1F202}\\x{1F21A}\\x{1F22F}\\x{1F232}-\\x{1F23A}\\x{1F250}\\x{1F251}\\x{1F300}-\\x{1F321}\\x{1F324}-\\x{1F384}]"
              "  |  \\x{1F385} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F386}-\\x{1F393}\\x{1F396}\\x{1F397}\\x{1F399}-\\x{1F39B}\\x{1F39E}-\\x{1F3C1}]"
              "  |  \\x{1F3C2} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F3C3}\\x{1F3C4}]"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F3C5}\\x{1F3C6}]"
              "  |  \\x{1F3C7} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F3C8}\\x{1F3C9}]"
              "  |  \\x{1F3CA}"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F3CB}\\x{1F3CC}]"
              "     (?:"
              "          \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F3CD}-\\x{1F3F0}]"
              "  |  \\x{1F3F3}"
              "     (?: \\x{FE0F} \\x{200D} \\x{1F308} )?"
              "  |  \\x{1F3F4}"
              "     (?:"
              "          \\x{200D} \\x{2620} \\x{FE0F}"
              "       |  \\x{E0067} \\x{E0062}"
              "          (?:"
              "               \\x{E0065} \\x{E006E} \\x{E0067}"
              "            |  \\x{E0073} \\x{E0063} \\x{E0074}"
              "            |  \\x{E0077} \\x{E006C} \\x{E0073}"
              "          )"
              "          \\x{E007F}"
              "     )?"
              "  |  [\\x{1F3F5}\\x{1F3F7}-\\x{1F440}]"
              "  |  \\x{1F441}"
              "     (?: \\x{FE0F} \\x{200D} \\x{1F5E8} \\x{FE0F} )?"
              "  |  [\\x{1F442}\\x{1F443}] [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F444}\\x{1F445}]"
              "  |  [\\x{1F446}-\\x{1F450}] [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F451}-\\x{1F465}]"
              "  |  [\\x{1F466}\\x{1F467}] [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  \\x{1F468}"
              "     (?:"
              "          \\x{200D}"
              "          (?:"
              "               [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
              "            |  \\x{2764} \\x{FE0F} \\x{200D}"
              "               (?: \\x{1F48B} \\x{200D} )?"
              "               \\x{1F468}"
              "            |  [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}]"
              "            |  \\x{1F466}"
              "               (?: \\x{200D} \\x{1F466} )?"
              "            |  \\x{1F467}"
              "               (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
              "            |  [\\x{1F468}\\x{1F469}] \\x{200D}"
              "               (?:"
              "                    \\x{1F466}"
              "                    (?: \\x{200D} \\x{1F466} )?"
              "                 |  \\x{1F467}"
              "                    (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
              "               )"
              "            |  [\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
              "          )"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?:"
              "               \\x{200D}"
              "               (?:"
              "                    [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
              "                 |  [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
              "               )"
              "          )?"
              "     )?"
              "  |  \\x{1F469}"
              "     (?:"
              "          \\x{200D}"
              "          (?:"
              "               [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
              "            |  \\x{2764} \\x{FE0F} \\x{200D}"
              "               (?: \\x{1F48B} \\x{200D} )?"
              "               [\\x{1F468}\\x{1F469}]"
              "            |  [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}]"
              "            |  \\x{1F466}"
              "               (?: \\x{200D} \\x{1F466} )?"
              "            |  \\x{1F467}"
              "               (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
              "            |  \\x{1F469} \\x{200D}"
              "               (?:"
              "                    \\x{1F466}"
              "                    (?: \\x{200D} \\x{1F466} )?"
              "                 |  \\x{1F467}"
              "                    (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
              "               )"
              "            |  [\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
              "          )"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?:"
              "               \\x{200D}"
              "               (?:"
              "                    [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
              "                 |  [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
              "               )"
              "          )?"
              "     )?"
              "  |  [\\x{1F46A}-\\x{1F46D}]"
              "  |  \\x{1F46E}"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  \\x{1F46F}"
              "     (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "  |  \\x{1F470} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  \\x{1F471}"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  \\x{1F472} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  \\x{1F473}"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F474}-\\x{1F476}] [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  \\x{1F477}"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  \\x{1F478} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F479}-\\x{1F47B}]"
              "  |  \\x{1F47C} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F47D}-\\x{1F480}]"
              "  |  [\\x{1F481}\\x{1F482}]"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  \\x{1F483} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  \\x{1F484}"
              "  |  \\x{1F485} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F486}\\x{1F487}]"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F488}-\\x{1F4A9}]"
              "  |  \\x{1F4AA} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F4AB}-\\x{1F4FD}\\x{1F4FF}-\\x{1F53D}\\x{1F549}-\\x{1F54E}\\x{1F550}-\\x{1F567}\\x{1F56F}\\x{1F570}\\x{1F573}]"
              "  |  \\x{1F574} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  \\x{1F575}"
              "     (?:"
              "          \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F576}-\\x{1F579}]"
              "  |  \\x{1F57A} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F587}\\x{1F58A}-\\x{1F58D}]"
              "  |  [\\x{1F590}\\x{1F595}\\x{1F596}] [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F5A4}\\x{1F5A5}\\x{1F5A8}\\x{1F5B1}\\x{1F5B2}\\x{1F5BC}\\x{1F5C2}-\\x{1F5C4}\\x{1F5D1}-\\x{1F5D3}\\x{1F5DC}-\\x{1F5DE}\\x{1F5E1}\\x{1F5E3}\\x{1F5E8}\\x{1F5EF}\\x{1F5F3}\\x{1F5FA}-\\x{1F644}]"
              "  |  [\\x{1F645}-\\x{1F647}]"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F648}-\\x{1F64A}]"
              "  |  \\x{1F64B}"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  \\x{1F64C} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F64D}\\x{1F64E}]"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  \\x{1F64F} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F680}-\\x{1F6A2}]"
              "  |  \\x{1F6A3}"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F6A4}-\\x{1F6B3}]"
              "  |  [\\x{1F6B4}-\\x{1F6B6}]"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F6B7}-\\x{1F6BF}]"
              "  |  \\x{1F6C0} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F6C1}-\\x{1F6C5}\\x{1F6CB}]"
              "  |  \\x{1F6CC} [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F6CD}-\\x{1F6D2}\\x{1F6E0}-\\x{1F6E5}\\x{1F6E9}\\x{1F6EB}\\x{1F6EC}\\x{1F6F0}\\x{1F6F3}-\\x{1F6F9}\\x{1F910}-\\x{1F917}]"
              "  |  [\\x{1F918}-\\x{1F91C}] [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  \\x{1F91D}"
              "  |  [\\x{1F91E}\\x{1F91F}] [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  [\\x{1F920}-\\x{1F925}]"
              "  |  \\x{1F926}"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F927}-\\x{1F92F}]"
              "  |  [\\x{1F930}-\\x{1F936}] [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  \\x{1F937}"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F938}\\x{1F939}]"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  \\x{1F93A}"
              "  |  \\x{1F93C}"
              "     (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "  |  [\\x{1F93D}\\x{1F93E}]"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F940}-\\x{1F945}\\x{1F947}-\\x{1F970}\\x{1F973}-\\x{1F976}\\x{1F97A}\\x{1F97C}-\\x{1F9A2}\\x{1F9B0}-\\x{1F9B4}]"
              "  |  [\\x{1F9B5}\\x{1F9B6}] [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  \\x{1F9B7}"
              "  |  [\\x{1F9B8}\\x{1F9B9}]"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F9C0}-\\x{1F9C2}\\x{1F9D0}]"
              "  |  [\\x{1F9D1}-\\x{1F9D5}] [\\x{1F3FB}-\\x{1F3FF}]?"
              "  |  \\x{1F9D6}"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F9D7}-\\x{1F9DD}]"
              "     (?:"
              "          \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
              "       |  [\\x{1F3FB}-\\x{1F3FF}]"
              "          (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "     )?"
              "  |  [\\x{1F9DE}\\x{1F9DF}]"
              "     (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
              "  |  [\\x{1F9E0}-\\x{1F9FF}]"
              
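The expanded form relies on free-spacing syntax: whitespace between tokens is ignored and `#` starts a comment. A minimal sketch of how a small subset of it could be compiled in Java follows, assuming `Pattern.COMMENTS` and the `\x{...}` code point escape (Java 7+). The class and method names are hypothetical, and only two branches of the full pattern are reproduced. Each fragment here ends in `\n` so that a comment stops at the end of its own line; a literal `#`, as in the keycap branch, would need to be escaped as `\\#` in this mode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpandedEmojiPattern {
    // Illustrative subset only: two branches taken from the full pattern.
    static final Pattern SUBSET = Pattern.compile(
          "  # hand gestures, optionally followed by a skin tone modifier\n"
        + "     [\\x{1F446}-\\x{1F450}] [\\x{1F3FB}-\\x{1F3FF}]?\n"
        + "  # part of the last range of the large pictograph class\n"
        + "  |  [\\x{1F600}-\\x{1F644}]\n",
        Pattern.COMMENTS);

    // Collects every match of the subset pattern, in order of appearance.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = SUBSET.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // A thumbs-up with a skin tone modifier, then a grinning face.
        String text = "ok \uD83D\uDC4D\uD83C\uDFFD and \uD83D\uDE06";
        System.out.println(findAll(text).size() + " matches");
    }
}
```

The base character and its modifier come back as a single match because the optional modifier class sits in the same branch as the base class.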

              For UTF-16 mode (as a string), compressed form:

              "[#*0-9]\\uFE0F\\u20E3|[\\u00A9\\u00AE\\u203C\\u2049\\u2122\\u2139\\u2"
              "194-\\u2199\\u21A9\\u21AA\\u231A\\u231B\\u2328\\u23CF\\u23E9-\\u23F3\\"
              "u23F8-\\u23FA\\u24C2\\u25AA\\u25AB\\u25B6\\u25C0\\u25FB-\\u25FE\\u260"
              "0-\\u2604\\u260E\\u2611\\u2614\\u2615\\u2618]|\\u261D(?:\\uD83C[\\uDF"
              "FB-\\uDFFF])?|[\\u2620\\u2622\\u2623\\u2626\\u262A\\u262E\\u262F\\u26"
              "38-\\u263A\\u2640\\u2642\\u2648-\\u2653\\u265F\\u2660\\u2663\\u2665\\u"
              "2666\\u2668\\u267B\\u267E\\u267F\\u2692-\\u2697\\u2699\\u269B\\u269C\\"
              "u26A0\\u26A1\\u26AA\\u26AB\\u26B0\\u26B1\\u26BD\\u26BE\\u26C4\\u26C5\\"
              "u26C8\\u26CE\\u26CF\\u26D1\\u26D3\\u26D4\\u26E9\\u26EA\\u26F0-\\u26F5"
              "\\u26F7\\u26F8]|\\u26F9(?:\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640"
              "\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uFE0F)?|[\\u26FA\\u"
              "26FD\\u2702\\u2705\\u2708\\u2709]|[\\u270A-\\u270D](?:\\uD83C[\\uDFF"
              "B-\\uDFFF])?|[\\u270F\\u2712\\u2714\\u2716\\u271D\\u2721\\u2728\\u273"
              "3\\u2734\\u2744\\u2747\\u274C\\u274E\\u2753-\\u2755\\u2757\\u2763\\u27"
              "64\\u2795-\\u2797\\u27A1\\u27B0\\u27BF\\u2934\\u2935\\u2B05-\\u2B07\\u"
              "2B1B\\u2B1C\\u2B50\\u2B55\\u3030\\u303D\\u3297\\u3299]|\\uD83C(?:[\\u"
              "DC04\\uDCCF\\uDD70\\uDD71\\uDD7E\\uDD7F\\uDD8E\\uDD91-\\uDD9A]|\\uDDE"
              "6\\uD83C[\\uDDE8-\\uDDEC\\uDDEE\\uDDF1\\uDDF2\\uDDF4\\uDDF6-\\uDDFA\\u"
              "DDFC\\uDDFD\\uDDFF]|\\uDDE7\\uD83C[\\uDDE6\\uDDE7\\uDDE9-\\uDDEF\\uDD"
              "F1-\\uDDF4\\uDDF6-\\uDDF9\\uDDFB\\uDDFC\\uDDFE\\uDDFF]|\\uDDE8\\uD83C"
              "[\\uDDE6\\uDDE8\\uDDE9\\uDDEB-\\uDDEE\\uDDF0-\\uDDF5\\uDDF7\\uDDFA-\\u"
              "DDFF]|\\uDDE9\\uD83C[\\uDDEA\\uDDEC\\uDDEF\\uDDF0\\uDDF2\\uDDF4\\uDDF"
              "F]|\\uDDEA\\uD83C[\\uDDE6\\uDDE8\\uDDEA\\uDDEC\\uDDED\\uDDF7-\\uDDFA]"
              "|\\uDDEB\\uD83C[\\uDDEE-\\uDDF0\\uDDF2\\uDDF4\\uDDF7]|\\uDDEC\\uD83C["
              "\\uDDE6\\uDDE7\\uDDE9-\\uDDEE\\uDDF1-\\uDDF3\\uDDF5-\\uDDFA\\uDDFC\\uD"
              "DFE]|\\uDDED\\uD83C[\\uDDF0\\uDDF2\\uDDF3\\uDDF7\\uDDF9\\uDDFA]|\\uDD"
              "EE\\uD83C[\\uDDE8-\\uDDEA\\uDDF1-\\uDDF4\\uDDF6-\\uDDF9]|\\uDDEF\\uD8"
              "3C[\\uDDEA\\uDDF2\\uDDF4\\uDDF5]|\\uDDF0\\uD83C[\\uDDEA\\uDDEC-\\uDDE"
              "E\\uDDF2\\uDDF3\\uDDF5\\uDDF7\\uDDFC\\uDDFE\\uDDFF]|\\uDDF1\\uD83C[\\u"
              "DDE6-\\uDDE8\\uDDEE\\uDDF0\\uDDF7-\\uDDFB\\uDDFE]|\\uDDF2\\uD83C[\\uD"
              "DE6\\uDDE8-\\uDDED\\uDDF0-\\uDDFF]|\\uDDF3\\uD83C[\\uDDE6\\uDDE8\\uDD"
              "EA-\\uDDEC\\uDDEE\\uDDF1\\uDDF4\\uDDF5\\uDDF7\\uDDFA\\uDDFF]|\\uDDF4\\"
              "uD83C\\uDDF2|\\uDDF5\\uD83C[\\uDDE6\\uDDEA-\\uDDED\\uDDF0-\\uDDF3\\uD"
              "DF7-\\uDDF9\\uDDFC\\uDDFE]|\\uDDF6\\uD83C\\uDDE6|\\uDDF7\\uD83C[\\uDD"
              "EA\\uDDF4\\uDDF8\\uDDFA\\uDDFC]|\\uDDF8\\uD83C[\\uDDE6-\\uDDEA\\uDDEC"
              "-\\uDDF4\\uDDF7-\\uDDF9\\uDDFB\\uDDFD-\\uDDFF]|\\uDDF9\\uD83C[\\uDDE6"
              "\\uDDE8\\uDDE9\\uDDEB-\\uDDED\\uDDEF-\\uDDF4\\uDDF7\\uDDF9\\uDDFB\\uDD"
              "FC\\uDDFF]|\\uDDFA\\uD83C[\\uDDE6\\uDDEC\\uDDF2\\uDDF3\\uDDF8\\uDDFE\\"
              "uDDFF]|\\uDDFB\\uD83C[\\uDDE6\\uDDE8\\uDDEA\\uDDEC\\uDDEE\\uDDF3\\uDD"
              "FA]|\\uDDFC\\uD83C[\\uDDEB\\uDDF8]|\\uDDFD\\uD83C\\uDDF0|\\uDDFE\\uD8"
              "3C[\\uDDEA\\uDDF9]|\\uDDFF\\uD83C[\\uDDE6\\uDDF2\\uDDFC]|[\\uDE01\\uD"
              "E02\\uDE1A\\uDE2F\\uDE32-\\uDE3A\\uDE50\\uDE51\\uDF00-\\uDF21\\uDF24-"
              "\\uDF84]|\\uDF85(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDF86-\\uDF93\\uDF9"
              "6\\uDF97\\uDF99-\\uDF9B\\uDF9E-\\uDFC1]|\\uDFC2(?:\\uD83C[\\uDFFB-\\u"
              "DFFF])?|[\\uDFC3\\uDFC4](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\"
              "uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDFC5\\uDFC6"
              "]|\\uDFC7(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDFC8\\uDFC9]|\\uDFCA(?:\\"
              "u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2"
              "640\\u2642]\\uFE0F)?)?|[\\uDFCB\\uDFCC](?:\\uD83C[\\uDFFB-\\uDFFF]("
              "?:\\u200D[\\u2640\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uF"
              "E0F)?|[\\uDFCD-\\uDFF0]|\\uDFF3(?:\\uFE0F\\u200D\\uD83C\\uDF08)?|\\u"
              "DFF4(?:\\u200D\\u2620\\uFE0F|\\uDB40\\uDC67\\uDB40\\uDC62\\uDB40(?:\\"
              "uDC65\\uDB40\\uDC6E\\uDB40\\uDC67|\\uDC73\\uDB40\\uDC63\\uDB40\\uDC74"
              "|\\uDC77\\uDB40\\uDC6C\\uDB40\\uDC73)\\uDB40\\uDC7F)?|[\\uDFF5\\uDFF7"
              "-\\uDFFF])|\\uD83D(?:[\\uDC00-\\uDC40]|\\uDC41(?:\\uFE0F\\u200D\\uD8"
              "3D\\uDDE8\\uFE0F)?|[\\uDC42\\uDC43](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\"
              "uDC44\\uDC45]|[\\uDC46-\\uDC50](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDC"
              "51-\\uDC65]|[\\uDC66\\uDC67](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC68(?"
              ":\\u200D(?:[\\u2695\\u2696\\u2708]\\uFE0F|\\u2764\\uFE0F\\u200D\\uD83"
              "D(?:\\uDC8B\\u200D\\uD83D)?\\uDC68|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDF"
              "A4\\uDFA8\\uDFEB\\uDFED]|\\uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?"
              "|\\uDC67(?:\\u200D\\uD83D[\\uDC66\\uDC67])?|[\\uDC68\\uDC69]\\u200D\\"
              "uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?|\\uDC67(?:\\u200D\\uD83D["
              "\\uDC66\\uDC67])?)|[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92])|\\uD"
              "83E[\\uDDB0-\\uDDB3])|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D(?:[\\u2695"
              "\\u2696\\u2708]\\uFE0F|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uD"
              "FEB\\uDFED]|\\uD83D[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92]|\\uD8"
              "3E[\\uDDB0-\\uDDB3]))?)?|\\uDC69(?:\\u200D(?:[\\u2695\\u2696\\u2708"
              "]\\uFE0F|\\u2764\\uFE0F\\u200D\\uD83D(?:\\uDC8B\\u200D\\uD83D)?[\\uDC"
              "68\\uDC69]|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]"
              "|\\uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?|\\uDC67(?:\\u200D\\uD83"
              "D[\\uDC66\\uDC67])?|\\uDC69\\u200D\\uD83D(?:\\uDC66(?:\\u200D\\uD83D"
              "\\uDC66)?|\\uDC67(?:\\u200D\\uD83D[\\uDC66\\uDC67])?)|[\\uDCBB\\uDCB"
              "C\\uDD27\\uDD2C\\uDE80\\uDE92])|\\uD83E[\\uDDB0-\\uDDB3])|\\uD83C[\\u"
              "DFFB-\\uDFFF](?:\\u200D(?:[\\u2695\\u2696\\u2708]\\uFE0F|\\uD83C[\\u"
              "DF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]|\\uD83D[\\uDCBB\\uDCB"
              "C\\uDD27\\uDD2C\\uDE80\\uDE92]|\\uD83E[\\uDDB0-\\uDDB3]))?)?|[\\uDC6"
              "A-\\uDC6D]|\\uDC6E(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-"
              "\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDC6F(?:\\u200D[\\u2"
              "640\\u2642]\\uFE0F)?|\\uDC70(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC71(?"
              ":\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\"
              "u2640\\u2642]\\uFE0F)?)?|\\uDC72(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC"
              "73(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u20"
              "0D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDC74-\\uDC76](?:\\uD83C[\\uDFFB-\\"
              "uDFFF])?|\\uDC77(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\"
              "uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDC78(?:\\uD83C[\\uDF"
              "FB-\\uDFFF])?|[\\uDC79-\\uDC7B]|\\uDC7C(?:\\uD83C[\\uDFFB-\\uDFFF])"
              "?|[\\uDC7D-\\uDC80]|[\\uDC81\\uDC82](?:\\u200D[\\u2640\\u2642]\\uFE0"
              "F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uD"
              "C83(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC84|\\uDC85(?:\\uD83C[\\uDFFB-"
              "\\uDFFF])?|[\\uDC86\\uDC87](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C"
              "[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDC88-\\uD"
              "CA9]|\\uDCAA(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDCAB-\\uDCFD\\uDCFF-\\"
              "uDD3D\\uDD49-\\uDD4E\\uDD50-\\uDD67\\uDD6F\\uDD70\\uDD73]|\\uDD74(?:"
              "\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD75(?:\\uD83C[\\uDFFB-\\uDFFF](?:\\u2"
              "00D[\\u2640\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uFE0F)?"
              "|[\\uDD76-\\uDD79]|\\uDD7A(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDD87\\uD"
              "D8A-\\uDD8D]|[\\uDD90\\uDD95\\uDD96](?:\\uD83C[\\uDFFB-\\uDFFF])?|["
              "\\uDDA4\\uDDA5\\uDDA8\\uDDB1\\uDDB2\\uDDBC\\uDDC2-\\uDDC4\\uDDD1-\\uDD"
              "D3\\uDDDC-\\uDDDE\\uDDE1\\uDDE3\\uDDE8\\uDDEF\\uDDF3\\uDDFA-\\uDE44]|"
              "[\\uDE45-\\uDE47](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\"
              "uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDE48-\\uDE4A]|\\uDE"
              "4B(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u20"
              "0D[\\u2640\\u2642]\\uFE0F)?)?|\\uDE4C(?:\\uD83C[\\uDFFB-\\uDFFF])?|"
              "[\\uDE4D\\uDE4E](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\u"
              "DFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDE4F(?:\\uD83C[\\uDFF"
              "B-\\uDFFF])?|[\\uDE80-\\uDEA2]|\\uDEA3(?:\\u200D[\\u2640\\u2642]\\uF"
              "E0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|["
              "\\uDEA4-\\uDEB3]|[\\uDEB4-\\uDEB6](?:\\u200D[\\u2640\\u2642]\\uFE0F|"
              "\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDE"
              "B7-\\uDEBF]|\\uDEC0(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDEC1-\\uDEC5\\u"
              "DECB]|\\uDECC(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDECD-\\uDED2\\uDEE0-"
              "\\uDEE5\\uDEE9\\uDEEB\\uDEEC\\uDEF0\\uDEF3-\\uDEF9])|\\uD83E(?:[\\uDD"
              "10-\\uDD17]|[\\uDD18-\\uDD1C](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD1D|"
              "[\\uDD1E\\uDD1F](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDD20-\\uDD25]|\\uD"
              "D26(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u2"
              "00D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDD27-\\uDD2F]|[\\uDD30-\\uDD36]("
              "?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD37(?:\\u200D[\\u2640\\u2642]\\uFE0"
              "F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\u"
              "DD38\\uDD39](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFF"
              "F](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDD3A|\\uDD3C(?:\\u200D[\\"
              "u2640\\u2642]\\uFE0F)?|[\\uDD3D\\uDD3E](?:\\u200D[\\u2640\\u2642]\\u"
              "FE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|"
              "[\\uDD40-\\uDD45\\uDD47-\\uDD70\\uDD73-\\uDD76\\uDD7A\\uDD7C-\\uDDA2\\"
              "uDDB0-\\uDDB4]|[\\uDDB5\\uDDB6](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDDB"
              "7|[\\uDDB8\\uDDB9](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-"
              "\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDDC0-\\uDDC2\\uDDD"
              "0]|[\\uDDD1-\\uDDD5](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDDD6(?:\\u200D"
              "[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u"
              "2642]\\uFE0F)?)?|[\\uDDD7-\\uDDDD](?:\\u200D[\\u2640\\u2642]\\uFE0F"
              "|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uD"
              "DDE\\uDDDF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?|[\\uDDE0-\\uDDFF])"
              
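The compressed form writes every code point above U+FFFF as a UTF-16 surrogate pair, which is why the skin tone range appears there as D83C followed by a DFFB-DFFF class. A small sketch of that mapping, using the standard `Character.toChars` method (the class and method names are hypothetical):

```java
class SurrogatePairs {
    // Renders a code point as the UTF-16 escape text used in the compressed pattern:
    // one 4-digit unit for BMP characters, two units (a surrogate pair) otherwise.
    static String toUtf16Escapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUtf16Escapes(0x1F3FB)); // first skin tone modifier
        System.out.println(toUtf16Escapes(0x1F9B0)); // first hair component
        System.out.println(toUtf16Escapes(0x2640));  // female sign, a BMP character
    }
}
```

Because all code points that share a high surrogate sit next to each other, the compressed pattern can factor out D83C, D83D and D83E once and alternate over the low surrogates.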

              【Discussion】:

                【Solution 15】:

                Regex is too slow, and the emoji set is updated frequently.

                Try this project: simple-emoji-4j

                It is compatible with Emoji 12.0 (2018.10.15).

                Simple usage:

                EmojiUtils.containsEmoji(str)
                

                【Discussion】:

                  【Solution 16】:

                  \p{Cs} works well for matching emoji with the PCRE regex flavor. You can test it at https://regex101.com/r/o69vJJ/1.

                  The Unicode character category is "Other, Surrogate".
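Where input is seen as UTF-16 code units, the surrogate category amounts to "any character outside the Basic Multilingual Plane". Java's `java.util.regex` processes input as code points, so the sketch below applies the equivalent check directly with `Character.isSupplementaryCodePoint` rather than relying on `\p{Cs}`. That is an assumption about what this answer intends, and the class and method names are hypothetical. Note that the test is broad: it also selects non-emoji supplementary characters and misses BMP emoji such as U+263A.

```java
import java.util.ArrayList;
import java.util.List;

class SupplementaryScan {
    // Returns every character that needs a surrogate pair in UTF-16,
    // i.e. every code point above U+FFFF, in order of appearance.
    static List<String> supplementaryChars(String text) {
        List<String> found = new ArrayList<>();
        text.codePoints()
            .filter(Character::isSupplementaryCodePoint)
            .forEach(cp -> found.add(new String(Character.toChars(cp))));
        return found;
    }

    public static void main(String[] args) {
        // Two U+1F606 faces are reported; the BMP smiley U+263A is not.
        String text = "Thats a nice joke \uD83D\uDE06\uD83D\uDE06 \u263A";
        System.out.println(supplementaryChars(text).size() + " supplementary characters");
    }
}
```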

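                  【Discussion】:

                    【Solution 17】:

                    Some PCRE implementations do not recognize \p. Many do not allow character ranges that span more than 2 bytes, such as \udde6-\ud83c.

                    A trick that worked for me is to encode the text so that the characters are forced into escaped form, for example as JSON.

                    Once encoded as JSON, the characters become literal \ud000-style sequences, which a standard regex can find: \\\\ud[0-9a-f]{3}\\\\u[0-9a-f]{4,6}

                    After the escaped string has been filtered, the data can be decoded again, now without the emoji.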
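The escape, filter, decode sequence can be sketched in Java as follows. This is only an illustration under stated assumptions: `escapeNonAscii` and `unescape` are hypothetical helpers standing in for an ASCII-only JSON encoder and decoder, and the filter is a tightened variant of the regex above that requires a high surrogate followed by a low surrogate. Like the original trick, it removes only characters outside the BMP.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeThenStrip {
    // An escaped high surrogate (d800-dbff) followed by an escaped low surrogate (dc00-dfff).
    static final Pattern ESCAPED_PAIR =
        Pattern.compile("\\\\ud[89ab][0-9a-f]{2}\\\\ud[c-f][0-9a-f]{2}");
    // Either an escaped backslash or a 4-digit escape produced by escapeNonAscii.
    static final Pattern ESCAPE = Pattern.compile("\\\\(?:\\\\|u([0-9a-f]{4}))");

    // Hypothetical stand-in for an ASCII-only JSON encoder: backslashes are doubled
    // and every non-ASCII UTF-16 unit becomes a literal lowercase hex escape.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                sb.append("\\\\");
            } else if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Reverses escapeNonAscii.
    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String repl = m.group(1) == null
                ? "\\"
                : String.valueOf((char) Integer.parseInt(m.group(1), 16));
            m.appendReplacement(out, Matcher.quoteReplacement(repl));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Escape, drop every escaped surrogate pair, then decode what is left.
    static String stripSupplementary(String s) {
        String escaped = escapeNonAscii(s);
        return unescape(ESCAPED_PAIR.matcher(escaped).replaceAll(""));
    }

    public static void main(String[] args) {
        System.out.println(stripSupplementary("Thats a nice joke \uD83D\uDE06\uD83D\uDE06 \uD83D\uDE1B"));
    }
}
```

Doubling the backslashes before filtering keeps text that already contains a literal backslash sequence from being mistaken for an escaped emoji.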

                    【Discussion】:
