Sentiment analysis Python tokenization: answers

【Question title】: Sentiment analysis Python tokenization
【Posted】: 2022-01-15 04:04:29
【Question】:

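My problem is the following: I want to run a sentiment analysis on Italian tweets, and I would like to tokenize and lemmatize my Italian text in order to find new dimensions of analysis for my thesis. The problem is that I would like to tokenize my hashtags and also split the compound ones. For example, if I have #nogreenpass, I would like its words without the # symbol, because the sentiment of the phrase is easier to understand when all the words of the text are available. How can I do this? I tried spaCy, but got no results. I wrote a function to clean my text, but I can't get the hashtags into the form I want. I am using this code: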

import re
import spacy
from spacy.tokenizer import Tokenizer

nlp = spacy.load('it_core_news_lg')

# Clean_text function
def clean_text(text):
    text = str(text).lower()
    doc = nlp(text)
    text = re.sub(r'#[a-z0-9]+', str(' '.join(t in nlp(doc))), str(text))
    text = re.sub(r'\n', ' ', str(text)) # Remove /n
    text = re.sub(r'@[A-Za-z0-9]+', '<user>', str(text)) # Remove and replace @mention
    text = re.sub(r'RT[\s]+', '', str(text)) # Remove RT
    text = re.sub(r'https?:\/\/\S+', '<url>', str(text)) # Remove and replace links
    return text

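For example, I don't know how to add the brackets that should replace the # symbol, and the tokenization step does not work properly. Thank you for your time and patience. I hope to get better at Jupyter analysis and Python coding so that I can help with your questions too. Thanks, everyone!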

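【Comments】:

What you have here has nothing to do with spaCy; it is a regular expression issue. Could you provide a sample string and the expected output?
Please check ideone.com/pxZqeK - does it work as expected?
@WiktorStribiżew Thank you for your answer. It does not work the way I would like. For example, with this string: "@Marcorossi hanno ragione I #novax asfag.com", I get this output: " hanno ragione I ". I thought of spaCy because I want the compound hashtag to be split and placed between two brackets, for example. Thanks for your time.
Then add < and > to the replacement: re.sub(r'#(\w+)', r'<\1>', text). See ideone.com/uG0YCW
@WiktorStribiżew There is only one thing I still can't do: split the word novax. This way the hashtag stays a single token, but I would like it split into its words, which is why I thought of spaCy.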

Tags: python nlp spacy tokenize

【Solution 1】:

You can adjust your current clean_text function:

import re

def clean_text(text):
    text = str(text).lower()
    text = re.sub(r'#(\w+)', r'<\1>', text) # Wrap each hashtag in <...>, dropping the #
    text = re.sub(r'\n', ' ', text) # Replace newlines with spaces
    text = re.sub(r'@[A-Za-z0-9]+', '<user>', text) # Replace @mentions with <user>
    text = re.sub(r'\brt\s+', '', text) # Remove the retweet marker (text is already lowercased)
    text = re.sub(r'https?://\S+\b/?', '<url>', text) # Replace links with <url>
    return text

See the Python demo online.

The following line of code:

print(clean_text("@Marcorossi hanno ragione I #novax htt"+"p://www.asfag.com/"))

will produce

<user> hanno ragione i <novax> <url>

Note that there is no easy way to split a glued string into its constituent words. See How to split text without spaces into list of words for ways to do that.
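As a rough illustration of what such a split could look like, here is a minimal sketch of dictionary-based segmentation using dynamic programming. It is not the method from the linked question and not part of spaCy. The names VOCAB, segment and expand_hashtags are hypothetical, and the tiny vocabulary is an assumption made only for this example; a real run would need a proper Italian word list, and ambiguous hashtags may still be split incorrectly.

```python
import re

# Hypothetical toy vocabulary (assumption); replace with a real word list.
VOCAB = {"no", "vax", "green", "pass", "io", "resto", "a", "casa"}

def segment(glued, vocab=VOCAB):
    """Split `glued` into vocabulary words, or return None if that is impossible.

    best[i] holds the segmentation of glued[:i] with the fewest words found so far.
    """
    best = [None] * (len(glued) + 1)
    best[0] = []
    for end in range(1, len(glued) + 1):
        for start in range(end):
            piece = glued[start:end]
            if best[start] is not None and piece in vocab:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[-1]

def expand_hashtags(text, vocab=VOCAB):
    """Replace each #hashtag with <word word ...>, falling back to <hashtag>."""
    def repl(match):
        tag = match.group(1).lower()
        words = segment(tag, vocab)
        # Keep the hashtag as one token when it cannot be segmented.
        return "<" + " ".join(words or [tag]) + ">"
    return re.sub(r"#(\w+)", repl, text)

print(expand_hashtags("hanno ragione i #novax e i #nogreenpass"))
# hanno ragione i <no vax> e i <no green pass>
```

In clean_text, the hashtag line could then call such a helper instead of the plain r'<\1>' replacement. Preferring the segmentation with the fewest words is only one possible tie-breaking rule; weighting words by frequency is a common alternative.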

【Comments】: