【Posted】: 2025-12-28 07:20:09
【Question】:
The following code matches the positive and negative words in a text and counts them. Consider, for example,
sentences<-c("You are not perfect!",
"However, let us not forget what happened across the Atlantic.",
"And I can't support you.",
"No abnormal energy readings",
"So with gratitude, the universe is abundant forever.")
We first import the positive and negative words
pos = readLines("positive-words.txt")
neg = readLines("negative-words.txt")
from txt files. In these files we find:
abundant
gratitude
perfect
support
for positive-words.txt and
abnormal
for negative-words.txt. The following commands:
sentence = gsub("[[:punct:]]", "", sentence)
sentence = gsub("[[:cntrl:]]", "", sentence)
sentence = gsub('\\d+', '', sentence)
remove punctuation, control characters and digits. We then split the sentence into words with str_split (stringr package)
word.list = str_split(sentence, "\\s+")
words = unlist(word.list)
and compare the words against the dictionaries of positive and negative terms
pos.matches = match(words, pos)
neg.matches = match(words, neg)
pos.matches = !is.na(pos.matches)
neg.matches = !is.na(neg.matches)
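The steps above can be put together as one runnable sketch. Two things here are assumptions made for illustration only: the dictionaries are inlined instead of being read from positive-words.txt and negative-words.txt, and base R's strsplit stands in for str_split so that no package is needed.

```r
# Dictionaries inlined for illustration (assumption: the real ones come from the txt files)
pos <- c("abundant", "gratitude", "perfect", "support")
neg <- c("abnormal")

sentence <- "So with gratitude, the universe is abundant forever."

# Cleaning steps from the question: punctuation, control characters, digits
sentence <- gsub("[[:punct:]]", "", sentence)
sentence <- gsub("[[:cntrl:]]", "", sentence)
sentence <- gsub("\\d+", "", sentence)

# Base-R stand-in for str_split(sentence, "\\s+") on a single string
words <- unlist(strsplit(sentence, "\\s+"))

# TRUE where a word appears in the corresponding dictionary
pos.matches <- !is.na(match(words, pos))
neg.matches <- !is.na(match(words, neg))

sum(pos.matches)  # count of positive words
sum(neg.matches)  # count of negative words
```

Summing the logical vectors gives the counts mentioned at the start of the question.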
The variable sentence can be sentences[1], sentences[2], sentences[3], sentences[4] or sentences[5]. For example, if sentence=sentences[5], this code correctly returns two positive words; the result is:
> pos.matches
[1] FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
The same holds for all the other sentences. For example, if sentence=sentences[4]:
> neg.matches
[1] FALSE TRUE FALSE FALSE
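However, I would like to modify this code to handle the cases found in sentences[1], sentences[3] and sentences[4]. Specifically: perfect in sentences[1] is a positive word, but it is preceded by not, so I would like to treat the two words as a single (negative) word; support in sentences[3] is a positive word, but it is preceded by cant, so I would like to treat the two words as a single negative word; abnormal in sentences[4] is a negative word, but it is preceded by no, so I would like to treat the two words as a single positive word. For example, the desired result for sentence=sentences[4] is: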
> pos.matches
[1] TRUE FALSE FALSE
Instead, with this code I get:
> pos.matches
[1] FALSE FALSE FALSE FALSE
I then thought of defining a variable with the negatives and negations:
NegativesNegations <- paste0("\\b(", paste(c("no","not","couldnt","cant"), collapse = "|"), ")\\b")
But I don't know how to proceed.
【Discussion】:
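One possible way to get the behaviour described in the question is to flag every dictionary word that directly follows a negation, flip its polarity, and drop the negation token so that the pair counts as one word. The following is only a minimal sketch under stated assumptions: score_with_negation and the negations vector are hypothetical names introduced here, the dictionaries are inlined rather than read from the txt files, words are lower-cased so that "No" matches "no" (the original code does not do this), and only a negation immediately before a dictionary word is handled.

```r
pos <- c("abundant", "gratitude", "perfect", "support")
neg <- c("abnormal")
negations <- c("no", "not", "couldnt", "cant")  # same words as in NegativesNegations

# Hypothetical helper: merges "negation + dictionary word" and flips its polarity
score_with_negation <- function(sentence, pos, neg, negations) {
  sentence <- gsub("[[:punct:]]", "", sentence)
  sentence <- gsub("[[:cntrl:]]", "", sentence)
  sentence <- gsub("\\d+", "", sentence)
  # Lower-casing is an added assumption so that "No" matches "no"
  words <- tolower(unlist(strsplit(sentence, "\\s+")))
  words <- words[nzchar(words)]
  n <- length(words)
  if (n == 0) {
    return(list(words = character(0), pos.matches = logical(0), neg.matches = logical(0)))
  }

  is.pos <- words %in% pos
  is.neg <- words %in% neg
  is.negation <- words %in% negations

  # TRUE where the previous word is a negation
  prev.negation <- c(FALSE, is.negation[-n])

  # A negated positive word counts as negative, and vice versa
  pos.flag <- (is.pos & !prev.negation) | (is.neg & prev.negation)
  neg.flag <- (is.neg & !prev.negation) | (is.pos & prev.negation)

  # Dictionary words that absorb the negation in front of them
  hit <- (is.pos | is.neg) & prev.negation
  absorbed <- c(hit[-1], FALSE)  # the negation tokens being merged away

  tokens <- words
  tokens[hit] <- paste(words[which(hit) - 1], words[hit])
  keep <- !absorbed

  list(words = tokens[keep],
       pos.matches = pos.flag[keep],
       neg.matches = neg.flag[keep])
}

res <- score_with_negation("No abnormal energy readings", pos, neg, negations)
res$words        # "no abnormal" "energy" "readings"
res$pos.matches  # TRUE FALSE FALSE
```

With "You are not perfect!" the same call marks the merged token "not perfect" in neg.matches instead. A negation followed by a word that is in neither dictionary, such as "not forget" in sentences[2], is left unchanged.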