【Posted】: 2014-05-15 11:03:21
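【Question】:
I am referring to a previously asked question: I want to run sentiment analysis on German tweets and have been using the code below, taken from the Stack Overflow thread I mentioned. However, I would like the analysis to produce actual sentiment scores, not just a count of TRUE/FALSE matches indicating whether a word is positive or negative. Is there a simple way to do this?
You can also find the word lists in the previous thread.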
library(plyr)
library(stringr)
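# Read a SentiWS file and flatten it into a lower-case vector of base forms
# and inflections. Note that the "|POS<TAB>weight" part is discarded here.
readAndflattenSentiWS <- function(filename) {
  words = readLines(filename, encoding="UTF-8")
  words <- sub("\\|[A-Z]+\t[0-9.-]+\t?", ",", words)
  words <- unlist(strsplit(words, ","))
  words <- tolower(words)
  return(words)
}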
pos.words <- c(scan("Post3/positive-words.txt", what='character', comment.char=';', quiet=TRUE),
               readAndflattenSentiWS("Post3/SentiWS_v1.8c_Positive.txt"))
neg.words <- c(scan("Post3/negative-words.txt", what='character', comment.char=';', quiet=TRUE),
               readAndflattenSentiWS("Post3/SentiWS_v1.8c_Negative.txt"))
score.sentiment = function(sentences, pos.words, neg.words, .progress='none') {
  require(plyr)
  require(stringr)
  scores = laply(sentences, function(sentence, pos.words, neg.words)
  {
    # clean up sentences with R's regex-driven global substitute, gsub():
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    # and convert to lower case:
    sentence = tolower(sentence)
    # split into words. str_split is in the stringr package
    word.list = str_split(sentence, '\\s+')
    # sometimes a list() is one level of hierarchy too much
    words = unlist(word.list)
    # compare our words to the dictionaries of positive & negative terms
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)
    # match() returns the position of the matched term or NA
    # I don't just want a TRUE/FALSE! How can I do this?
    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score = sum(pos.matches) - sum(neg.matches)
    return(score)
  },
  pos.words, neg.words, .progress=.progress )
  scores.df = data.frame(score=scores, text=sentences)
  return(scores.df)
}
sample <- c("ich liebe dich. du bist wunderbar",
            "Ich hasse dich, geh sterben!",
            "i love you. you are wonderful.",
            "i hate you, die.")
(test.sample <- score.sentiment(sample,
                                pos.words,
                                neg.words))
【Comments】:
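- Does your code run correctly? I would guess laply should be lapply, but the post you reference writes it the same way...
- Yes, it runs correctly. I actually tried changing laply to lapply, and then it stopped working. I am still quite new to these functions, so I don't know why...
- Ah, laply is part of plyr! Glad I did not edit the post to "fix" it just now :-)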
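A likely reason the swap to lapply stopped working (an assumption, since the thread does not show the error): base lapply forwards every extra argument to the function, including .progress, which the anonymous function does not accept. The sketch below reproduces that with a toy function; count.chars is a hypothetical stand-in, not part of the original code.

```r
# Hypothetical stand-in with the same signature as the anonymous function
# inside score.sentiment(); it simply counts characters.
count.chars <- function(sentence, pos.words, neg.words) nchar(sentence)

sentences <- c("ich liebe dich", "i hate you")

# lapply() passes .progress on to count.chars(), which has no such
# parameter, so the call raises an error that we capture as text.
res <- tryCatch(
  lapply(sentences, count.chars, "a", "b", .progress = "none"),
  error = function(e) conditionMessage(e)
)
print(res)

# Without .progress, lapply() runs but returns a list, whereas vapply()
# (like plyr::laply for scalar results) returns a plain vector.
list.result <- lapply(sentences, count.chars, "a", "b")
vec.result <- vapply(sentences, count.chars, integer(1), "a", "b",
                     USE.NAMES = FALSE)
print(vec.result)
```

The vector result is what data.frame(score=scores, text=sentences) expects, which is why laply fits the original function without further changes.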
Tags: r sentiment-analysis
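One possible way to get weighted scores instead of TRUE/FALSE counts is to keep the SentiWS weights while reading the file and sum the weights of the matched words. This is a minimal sketch, not a tested solution for the full lexicon. It assumes the SentiWS line layout "word|POS<TAB>weight<TAB>inflections" that the regex in readAndflattenSentiWS implies. The helper names readSentiWSWeights and weighted.score are hypothetical, and the sample lines use made-up weights.

```r
# Made-up lines in the assumed SentiWS layout; the weights are
# illustrative only and are not taken from the real lexicon.
sentiws.lines <- c("liebe|NN\t0.5\tlieben,liebst",
                   "wunderbar|ADJX\t0.7\twunderbare",
                   "hassen|VVINF\t-0.6\thasse,hasst")

# Hypothetical helper: build a named numeric vector (word -> weight),
# giving every inflection the weight of its base form.
readSentiWSWeights <- function(lines) {
  parts <- strsplit(lines, "\t", fixed = TRUE)
  weights <- lapply(parts, function(p) {
    base <- sub("\\|.*$", "", p[1])
    forms <- if (length(p) >= 3) unlist(strsplit(p[3], ",", fixed = TRUE)) else character(0)
    words <- tolower(c(base, forms))
    setNames(rep(as.numeric(p[2]), length(words)), words)
  })
  unlist(weights)
}

# Hypothetical helper: clean the sentence as in score.sentiment(), then
# sum the weights of all words found in the lexicon (unmatched words add 0).
weighted.score <- function(sentence, weights) {
  sentence <- tolower(gsub("[[:punct:][:cntrl:][:digit:]]", "", sentence))
  words <- unlist(strsplit(sentence, "\\s+"))
  idx <- match(words, names(weights))
  sum(weights[idx], na.rm = TRUE)
}

weights <- readSentiWSWeights(sentiws.lines)
scores <- vapply(c("ich liebe dich. du bist wunderbar",
                   "Ich hasse dich, geh sterben!"),
                 weighted.score, numeric(1), weights = weights,
                 USE.NAMES = FALSE)
print(scores)
```

With real data, the lines would come from readLines() on the positive and negative SentiWS files, combined into one weights vector. The English word lists carry no weights, so they would need an assumed value such as +1 and -1.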