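【Question title】:Sentiment score for sentence in R
【Posted】:2018-09-24 15:00:34
【Question】:

I came across a very nice R script that computes a sentiment score for each sentence, available at: sentiment.R. I would like to know how to replace this part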

# split into words. str_split is in the stringr package
word.list = str_split(sentence, '\\s+')
# sometimes a list() is one level of hierarchy too much
words = unlist(word.list)

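so that multi-word phrases in a sentence are matched against pos and neg dictionaries that also contain multi-word entries. An example follows.

I have the following data.frame: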

sent <- data.frame(words = c("just right size", "love this quality", 
                         "good quality", "very good quality", "i hate this notebook",
                         "great improvement", "notebook is not good","notebook was"), user = c(1,2,3,4,5,6,7,8))

                 words user
1      just right size    1
2    love this quality    2
3         good quality    3
4    very good quality    4
5 i hate this notebook    5
6    great improvement    6
7 notebook is not good    7
8         notebook was    8

Then I have dictionaries of positive and negative words:

posWords <- c("great","improvement","love","great improvement","very good","good","right","very")
negWords <- c("hate","bad","not good","horrible")

The desired output is:

                 words user  SentimentScore
1      just right size    1               1
2    love this quality    2               1
3         good quality    3               1
4    very good quality    4               1
5 i hate this notebook    5              -1
6    great improvement    6               1
7 notebook is not good    7              -1
8         notebook was    8               0

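How should I rewrite the code from GitHub to get the desired output? If I use the source code from GitHub as is, then in row 4, for example, the SentimentScore column contains 2 instead of 1.

Does anyone have a suggestion or a similar solution? I would appreciate any help. Thank you very much.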
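
The row 4 problem can be reproduced with a minimal word-level scorer. This is only a sketch of the splitting approach quoted above, not the GitHub script itself: `scoreByWord` is a hypothetical name, base `strsplit` stands in for `str_split` from stringr, and the dictionaries are the ones from the question.

```r
# Hypothetical word-level scorer: split on whitespace, then count dictionary hits.
posWords <- c("great", "improvement", "love", "great improvement",
              "very good", "good", "right", "very")
negWords <- c("hate", "bad", "not good", "horrible")

scoreByWord <- function(sentence) {
  # After splitting, every token is a single word, so a two-word
  # dictionary entry such as "very good" or "not good" can never match.
  words <- unlist(strsplit(sentence, "\\s+"))
  sum(words %in% posWords) - sum(words %in% negWords)
}

scoreByWord("very good quality")     # "very" and "good" are counted separately
scoreByWord("notebook is not good")  # only "good" is seen, "not good" is missed
```

The first call returns 2 because "very" and "good" each count once. The second returns 1 rather than -1, because only the single word "good" matches.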

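【Question discussion】:

  • OK, this really is a perfect solution :-) Sorry, but I have updated the task... What if there is no match in row eight and the resulting SentimentScore should be zero?
  • If there is no match, SentimentScore will be zero.

Tags: r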


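【Solution 1】:

I have not looked at the library you mentioned, but this may be what you want. I created a data frame holding the positive and negative words and assigned each one a value of +1 or -1. I then added a length column and sorted on it, so that the longest words/phrases are tried first.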

sent <- data.frame(words = c("just right size", "love this quality",
                             "good quality", "very good quality", "i hate this notebook",
                             "great improvement", "notebook is not good"), user = c(1,2,3,4,5,6,7),
                   stringsAsFactors = FALSE)

posWords <- c("great","improvement","love","great improvement","very good","good","right","very")
negWords <- c("hate","bad","not good","horrible")

# One dictionary: +1 for positive entries, -1 for negative entries
wordsDF <- data.frame(words = posWords, value = 1, stringsAsFactors = FALSE)
wordsDF <- rbind(wordsDF, data.frame(words = negWords, value = -1, stringsAsFactors = FALSE))
# Sort longest entry first so that phrases are tried before the words they contain
wordsDF$lengths <- nchar(wordsDF$words)
wordsDF <- wordsDF[order(-wordsDF$lengths), ]


scoreSentence <- function(sentence){
  score <- 0
  for(x in seq_len(nrow(wordsDF))){
    # fixed = TRUE matches the entry literally rather than as a regular expression
    if(grepl(wordsDF$words[x], sentence, fixed = TRUE)){
      score <- score + wordsDF$value[x]
      # Remove the matched text so that shorter entries cannot match it again
      sentence <- sub(wordsDF$words[x], '', sentence, fixed = TRUE)
    }
  }
  score
}

SentimentScore <- unlist(lapply(sent$words, scoreSentence))
cbind(sent, SentimentScore)

Output

                 words user SentimentScore
1      just right size    1              1
2    love this quality    2              1
3         good quality    3              1
4    very good quality    4              1
5 i hate this notebook    5             -1
6    great improvement    6              1
7 notebook is not good    7             -1

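【Discussion】:

  • Thanks for the suggestion, but in rows 4 and 6 you have 2 instead of 1 in the SentimentScore column.
  • Of course, I need to avoid strsplit(sentence, '\\s+'), because it splits the text into single words, which makes multi-word matching impossible.
  • I used grep to match the word list against the sentence. I'm not sure what the rule for multiple matches is. If a match is found, I think you may need to remove the matched word/words?
  • I mean that, for example, in sentence 4 ("very good quality") the score should come only from the multi-word term "very good", not from "very" and "good" separately. So SentimentScore would be 1 instead of 2.
  • I really don't know how it should work... I just need, for example, "not good" to give SentimentScore = -1 instead of +1 from the match on "good", which is a positive word :-)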
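
The rule that emerges from these comments (a multi-word term counts once, and the words inside it are not counted again) can also be sketched with a single regular expression. This is an alternative illustration, not the answerer's method: `dict`, `pattern` and `scoreByPhrase` are hypothetical names, the word-boundary anchors `\\b` are an added assumption so that "bad" would not match inside a longer word, and the sketch assumes the dictionary entries contain no regex metacharacters.

```r
sent <- data.frame(words = c("just right size", "love this quality",
                             "good quality", "very good quality",
                             "i hate this notebook", "great improvement",
                             "notebook is not good", "notebook was"),
                   user = 1:8, stringsAsFactors = FALSE)

posWords <- c("great", "improvement", "love", "great improvement",
              "very good", "good", "right", "very")
negWords <- c("hate", "bad", "not good", "horrible")

# Named lookup vector: dictionary entry -> +1 or -1
dict <- c(setNames(rep(1, length(posWords)), posWords),
          setNames(rep(-1, length(negWords)), negWords))

# Longest entries first: at a given position the regex engine tries the
# alternatives in order, so "very good" is tried before "very" and "good".
terms <- names(dict)[order(-nchar(names(dict)))]
pattern <- paste0("\\b(", paste(terms, collapse = "|"), ")\\b")

scoreByPhrase <- function(sentence) {
  # Non-overlapping matches: text consumed by a phrase is not matched again
  hits <- regmatches(sentence, gregexpr(pattern, sentence, perl = TRUE))[[1]]
  sum(dict[hits])  # the sum over zero hits is 0
}

sent$SentimentScore <- vapply(sent$words, scoreByPhrase, numeric(1),
                              USE.NAMES = FALSE)
sent
```

With the eight sentences from the question this gives 1, 1, 1, 1, -1, 1, -1, 0, which is the desired output including the zero for "notebook was". One difference from the loop above: the regex scans left to right, so an earlier short match can win over a longer phrase that starts later in the sentence.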