Split a string and replace Discord emoji with [name]

【Question title】: Split string and replace Discord emoji with [name]
【Posted】: 2021-06-09 11:43:13
【Question】:

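I have incoming messages such as <a:GG:123456789> <:1Copy:12345678><:14:1256678>:eyes:Hello friend!:eyes:, and I want the output to be [GG] [1Copy][14][eyes]Hello friend![eyes]

The code below is what I currently have, and it only partly works. For the example above it outputs [GG] [1Copy] [14] [eyes]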

import re  # needed for re.findall below; the := operator requires Python 3.8+

def shorten_emojis(content):
    seperators = ("<a:", "<:")

    output = []

    for chunk in content.split():
        if any(match in chunk for match in seperators):
            parsed_chunk = []

            new_chunk = chunk.replace("<", ";<").replace(">", ">;")

            for emo in new_chunk.split(";"):
                if emo.startswith(seperators):
                    emo = f"[{splits[1]}]" if len(splits := emo.split(":")) == 3 else emo

                parsed_chunk.append(emo)

            chunk = "".join(parsed_chunk)

        output.append(chunk)

    output = " ".join(output)

    # Replace the remaining :name: shortcodes with [name]
    for e in re.findall(":.+?:", content):
        output = output.replace(e, f"[{e.replace(':', '')}]")

    return output

Test #1

Input: <a:GG:123456789> <:1Copy:12345678><:14:1256678>:eyes:Hello friend!:eyes:

Output: [GG] [1Copy] [14] :eyes:Hello friend!:eyes:

Expected: [GG] [1Copy][14][eyes]Hello friend![eyes]

Test #2

Input: <a:cryLaptop:738450655395446814><:1Copy:817543814481707030><:14:817543815401439232> <:thoonk:621279654711656448><:coolbutdepressed:621279653675532290><:KL1Heart:585547199480332318>Nice<:dogwonder:621251869058269185> OK:eyes:

Output: [cryLaptop] [1Copy] [14] [thoonk] [coolbutdepressed] [KL1Heart] Nice [dogwonder] OK:eyes:

Expected: [cryLaptop] [GG] [1Copy] [14] [thoonk] [coolbutdepressed] [KL1Heart] Nice [dogwonder] OK[eyes]

Edit

I have edited my code block, and it now works correctly.

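【Comments on the question】:

Tags: python string discord

【Solution 1】:

You can do this with regular expressions. The re module is part of Python's standard library.

I modified the code a little to make it more compact, but I think it is just as easy to follow.

The most important part is detecting three kinds of tokens. (<.*?>) selects <words>, (:.*?:) selects :word:, and (.*?) picks up the remaining text.

Then each token is formatted as required and appended to the output.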

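import re

def shorten_emojis(content):
    # Tokenize into <...> tags, :name: shortcodes, and leftover text.
    # tag[0] is the outer group, i.e. the whole token. On Python 3.7+ the
    # lazy (.*?) yields leftover text one character at a time, plus empty matches.
    tags = re.findall(r'((<.*?>)|(:.*?:)|(.*?))', content)
    output = ""
    for tag in tags:
        if re.findall(r"<.*?>", tag[0]):
            # Custom emoji: keep only the name between the first two colons.
            valor = re.search(r':.*?:', tag[0])
            output += f"[{valor.group()[1:-1]}]" if valor else tag[0]
        elif re.match(r":.*?:", tag[0]):
            # Shortcode: strip the surrounding colons.
            output += f"[{tag[0][1:-1]}]"
        else:
            # Plain text passes through unchanged.
            output += tag[0]

    return output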


print(shorten_emojis("<a:GG:123456789> <:1Copy:12345678><:14:1256678>:eyes:Hello friend!:eyes:"))
print(shorten_emojis("<a:cryLaptop:738450655395446814><:1Copy:817543814481707030><:14:817543815401439232> <:thoonk:621279654711656448><:coolbutdepressed:621279653675532290><:KL1Heart:585547199480332318>Nice<:dogwonder:621251869058269185> OK:eyes:"))

Result:

[GG] [1Copy][14][eyes]Hello friend![eyes]
[cryLaptop][1Copy][14] [thoonk][coolbutdepressed][KL1Heart]Nice[dogwonder] OK[eyes]
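
To see what the three-token pattern from this answer produces before any formatting, the following is a minimal sketch. The helper name tokenize is hypothetical and not part of the answer; it assumes Python 3.7 or later, where a non-empty match may start right after an empty one.

```python
import re

# The tokenizing pattern described in this answer: tags, shortcodes, leftover text.
TOKEN = re.compile(r"((<.*?>)|(:.*?:)|(.*?))")

def tokenize(content):
    """Return the non-empty tokens in order (hypothetical helper)."""
    # findall yields one tuple per match; index 0 is the outer group (whole token).
    # The lazy (.*?) also produces empty matches, which are dropped here.
    return [groups[0] for groups in TOKEN.findall(content) if groups[0]]

print(tokenize("<:14:1256678>:eyes:Hi"))
```

Tags and shortcodes come back as whole tokens, while leftover text arrives one character at a time, which is why the answer simply concatenates everything that is not an emoji.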

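【Comments】:

【Solution 2】:

You can use a single pattern with an alternation | to match both variants. Then, in the callback passed to sub, you can check whether group 1 matched.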

<a?:([^:<>]+)[^<>]*>|:([^:]+):

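The pattern matches

<a?: Match <, an optional a, and :
([^:<>]+) Capture in group 1 one or more characters other than :, < and >
[^<>]*> Optionally match any characters other than < and >, then match >
| Or
:([^:]+): Capture in group 2 one or more characters other than : between two colons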
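
The breakdown above can be checked by listing which group is set for each match. This is only an illustrative sketch: the pattern is the one from this answer written with re.VERBOSE, and describe_matches is a hypothetical helper name.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE for readability.
EMOJI = re.compile(
    r"""
    <a?:            # "<", an optional "a", then ":"
    ([^:<>]+)       # group 1: the emoji name
    [^<>]*>         # the rest of the tag up to the closing ">"
    |               # or
    :([^:]+):       # group 2: a shortcode name between two colons
    """,
    re.VERBOSE,
)

def describe_matches(content):
    """Return (matched_text, group1, group2) for every match (hypothetical helper)."""
    return [(m.group(0), m.group(1), m.group(2)) for m in EMOJI.finditer(content)]

for row in describe_matches("<a:GG:123456789> <:14:1256678>:eyes:Hi"):
    print(row)
```

Exactly one of the two groups is set per match; the other is None, which is what the callback in the answer relies on.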

See the regex demo and the Python demo.

For example

import re

pattern = r"<a?:([^:<>]+)[^<>]*>|:([^:]+):"

def shorten_emojis(content):
    # Group 1 is set for <:name:id> and <a:name:id>; otherwise group 2 holds the :name: shortcode.
    return re.sub(
        pattern,
        lambda x: f"[{x.group(1)}]" if x.group(1) else f"[{x.group(2)}]",
        content,
    )

print(shorten_emojis("<a:GG:123456789> <:1Copy:12345678><:14:1256678>:eyes:Hello friend!:eyes:"))
print(shorten_emojis("<a:cryLaptop:738450655395446814><:1Copy:817543814481707030><:14:817543815401439232> <:thoonk:621279654711656448><:coolbutdepressed:621279653675532290><:KL1Heart:585547199480332318>Nice<:dogwonder:621251869058269185> OK:eyes:"))

Output

[GG] [1Copy][14][eyes]Hello friend![eyes]
[cryLaptop][1Copy][14] [thoonk][coolbutdepressed][KL1Heart]Nice[dogwonder] OK[eyes]
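
One caveat: :([^:]+): matches any text between two colons, so ordinary prose such as "Note: see this: ok" would also be rewritten. A stricter variant might look like the sketch below. It rests on assumptions that are not part of the answer: that custom emoji always carry a numeric id, and that shortcode names contain only letters, digits, "_", "+" and "-". The name shorten_emojis_strict is hypothetical.

```python
import re

# Assumption: custom emoji look like <:name:id> or <a:name:id> with a numeric id,
# and shortcode names use only letters, digits, "_", "+" and "-".
STRICT = re.compile(r"<a?:(\w+):\d+>|:([\w+-]+):")

def shorten_emojis_strict(content):
    # Exactly one of the two groups is set per match, so "or" picks the name.
    return STRICT.sub(lambda m: f"[{m.group(1) or m.group(2)}]", content)

print(shorten_emojis_strict("<a:GG:123456789> <:1Copy:12345678><:14:1256678>:eyes:Hello friend!:eyes:"))
print(shorten_emojis_strict("Note: see this: ok"))
```

This is still a heuristic: a time such as 12:30:45 contains :30: and would be rewritten as well.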

【Comments】:

This is a really good solution, shorter and more consistent than mine. Thanks for the explanation too!