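【Posted】: 2021-10-03 09:36:31
【Question】:
I have a dataset of tweets, each containing at least one emoji and sometimes several. The emojis can appear in the middle of a sentence, at the beginning, or at the end, so every tweet is different. I am struggling to split apart only the emojis within a sentence. If I iterate over the words, a run of consecutive emojis is treated as a single word.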
She is too hot for Congress. Vote her out! #sarcasm ????????????????????????
Expected output: She is too hot for Congress. Vote her out! #sarcasm ???? ???? ???? ???? ???? ????
The Struggle is Real ???????????? #struggle #struggleisreal #struggles #funny #humor #saying #sarcasm #lifestruggles #sarcastic #funnysaying #sayings #thestruggleisreal
Expected output: The Struggle is Real ???? ???? ???? #struggle #struggleisreal #struggles #funny #humor #saying #sarcasm #lifestruggles #sarcastic #funnysaying #sayings #thestruggleisreal
???????????? For More Funny Post Follow
Expected output: ???? ???? ???? For More Funny Post Follow
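The answers to the post above give me a list of tokenized words for each tweet in the dataset, which I don't want and which doesn't solve my problem: there are no spaces between my emojis, and I only want to insert spaces between them.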
【Discussion】:
Tags: python nlp sentiment-analysis
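One way to approach the question above, sketched with only the standard library: use a regular expression that matches the zero-width position between two adjacent emoji characters and substitute a space there, leaving the rest of the tweet untouched. The code-point ranges below are an assumption (a rough approximation of "emoji", not an exhaustive list), and `space_out_emojis` is a hypothetical helper name. Because skin-tone modifiers fall inside the same range, this sketch would also separate a modifier from its base emoji; a dedicated emoji library may be a better fit if that matters for your data.

```python
import re

# Rough emoji ranges (an assumption, not a complete definition of "emoji").
EMOJI = (
    "[\U0001F300-\U0001FAFF"   # pictographs, emoticons, transport, newer symbols
    "\u2600-\u27BF]"           # miscellaneous symbols and dingbats
)

# Zero-width position that has an emoji on both sides.
BETWEEN_EMOJI = re.compile(f"(?<={EMOJI})(?={EMOJI})")


def space_out_emojis(text: str) -> str:
    """Insert a single space between directly adjacent emoji characters."""
    return BETWEEN_EMOJI.sub(" ", text)


if __name__ == "__main__":
    tweet = "Vote her out! #sarcasm \U0001F602\U0001F602\U0001F602"
    print(space_out_emojis(tweet))
```

The function returns a string rather than a token list, so it could be applied per tweet (for example with `map` or a pandas `Series.apply`) before any later tokenization step.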