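Sentiment analysis covers a wide range of methods for measuring positive and negative sentiment in text, so there is no single simple answer to this question. But here is a simple one: apply a dictionary to your document-term matrix, then combine the dictionary's positive and negative key categories into a sentiment measure.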
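The lookup-and-combine idea does not depend on any particular package. Here is a minimal base-R sketch using a made-up document-term matrix and dictionary (all document names, terms and counts are hypothetical). A trailing `*` in a pattern matches any word ending, in the same spirit as the quanteda dictionaries in the example below:

```r
# Toy document-term matrix: rows are documents, columns are terms (hypothetical data)
dtm <- matrix(c(2, 0, 1, 0,
                0, 1, 0, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("doc1", "doc2"),
                              c("good", "bad", "excellent", "terrible")))

# Dictionary: each key maps to glob-style patterns ("*" = any suffix)
dict <- list(negative = c("bad*", "terrib*"),
             positive = c("good", "excellent"))

# Sum, per document, the counts of all terms matching any pattern of one key
lookup_key <- function(dtm, patterns) {
  regex <- vapply(patterns, utils::glob2rx, character(1))
  hit <- Reduce(`|`, lapply(regex, grepl, x = colnames(dtm)))
  rowSums(dtm[, hit, drop = FALSE])
}

# One column per dictionary key, one row per document
key_counts <- sapply(dict, lookup_key, dtm = dtm)
key_counts

# Net sentiment: positive count minus negative count (doc1: 3, doc2: -4)
sentiment <- key_counts[, "positive"] - key_counts[, "negative"]
sentiment
```

`glob2rx()` from the utils package turns the `*` wildcard into a regular expression, and exact patterns such as `good` are anchored at both ends, so they do not match `goodness`.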
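I suggest trying this with the text analysis package quanteda, which handles a variety of existing dictionary formats and lets you build very flexible custom dictionaries.
For example: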
library(quanteda)
# written against an early quanteda release; current versions use
# data_corpus_inaugural, corpus_subset(), tokens() and dfm_lookup() instead
mycorpus <- subset(inaugCorpus, Year > 1980)
mydict <- dictionary(list(negative = c("detriment*", "bad*", "awful*", "terrib*", "horribl*"),
                          positive = c("good", "great", "super*", "excellent")))
myDfm <- dfm(mycorpus, dictionary = mydict)
## Creating a dfm from a corpus ...
## ... lowercasing
## ... tokenizing
## ... indexing documents: 9 documents
## ... indexing features: 3,113 feature types
## ... applying a dictionary consisting of 2 keys
## ... created a 9 x 2 sparse dfm
## ... complete.
## Elapsed time: 0.057 seconds.
myDfm
## Document-feature matrix of: 9 documents, 2 features.
## 9 x 2 sparse Matrix of class "dfmSparse"
##               features
## docs           negative positive
##   1981-Reagan         0        6
##   1985-Reagan         0        6
##   1989-Bush           0       18
##   1993-Clinton        1        2
##   1997-Clinton        2        8
##   2001-Bush           1        6
##   2005-Bush           0        8
##   2009-Obama          2        3
##   2013-Obama          1        3
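One way to turn the two key counts into a single score, sketched in base R with the counts printed above copied into a plain matrix. The net difference and the relative difference (positive minus negative, divided by their total) are just two common choices, not the only ones, and neither adjusts for document length here:

```r
# Counts copied from the dfm printed above: (negative, positive) per speech
counts <- matrix(c(0, 6,  0, 6,  0, 18,  1, 2,  2, 8,
                   1, 6,  0, 8,  2, 3,   1, 3),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("1981-Reagan", "1985-Reagan", "1989-Bush",
                                   "1993-Clinton", "1997-Clinton", "2001-Bush",
                                   "2005-Bush", "2009-Obama", "2013-Obama"),
                                 c("negative", "positive")))

# Net score: raw difference, which tends to grow with document length
net <- counts[, "positive"] - counts[, "negative"]
net

# Relative score in [-1, 1]; NA for documents with no dictionary hits at all
total <- rowSums(counts)
relative <- ifelse(total > 0,
                   (counts[, "positive"] - counts[, "negative"]) / total,
                   NA)
round(relative, 2)
```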
# use a LIWC dictionary (requires your own copy of the LIWC dictionary file)
liwcdict <- dictionary(file = "LIWC2001_English.dic", format = "LIWC")
myDfmLIWC <- dfm(mycorpus, dictionary = liwcdict)
## Creating a dfm from a corpus ...
## ... lowercasing
## ... tokenizing
## ... indexing documents: 9 documents
## ... indexing features: 3,113 feature types
## ... applying a dictionary consisting of 68 keys
## ... created a 9 x 68 sparse dfm
## ... complete.
## Elapsed time: 1.844 seconds.
# keep the categories whose names start with "Pos" or "Neg"; note that this also
# matches Negate (negations), not only Negemo (negative emotion)
# (features() was later renamed featnames() in quanteda)
myDfmLIWC[, grep("^Pos|^Neg", features(myDfmLIWC))]
## Document-feature matrix of: 9 documents, 4 features.
## 9 x 4 sparse Matrix of class "dfmSparse"
##               features
## docs           Negate Posemo Posfeel Negemo
##   1981-Reagan      46     89       5     24
##   1985-Reagan      28    104       7     33
##   1989-Bush        40    102      10      8
##   1993-Clinton     25     51       3     23
##   1997-Clinton     27     64       5     22
##   2001-Bush        40     80       6     27
##   2005-Bush        25    117       5     31
##   2009-Obama       40     83       5     46
##   2013-Obama       42     80      13     22
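For your own corpus, assuming you have loaded it into a data.frame called data, you can create a quanteda corpus with:
# columns 1-2 of data become document variables; the Content column holds the text
mycorpus <- corpus(data$Content, docvars = data[, 1:2])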
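See also ?textfile for loading content from files with a single command (in later quanteda versions this moved to the separate readtext package as readtext()). It works with .csv files, for instance, although you will run into trouble with your file because its Content field contains text with embedded commas.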
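Embedded commas are only a problem when the text fields are not quoted. A minimal base-R sketch with a hypothetical inline CSV (the Date and Author columns and their values are made up for illustration) shows that read.csv() keeps comma-containing text in one field as long as each value is wrapped in double quotes:

```r
# Hypothetical CSV: two metadata columns, then a Content column containing commas.
# Each Content value is wrapped in double quotes, so its commas are not delimiters.
csv_text <- 'Date,Author,Content
2015-01-01,alice,"Great product, would buy again"
2015-01-02,bob,"Awful, terrible, horrible service"'

# read.csv() treats double quotes as field quotes by default
data <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(data)
```

If the file's text fields are not quoted, no CSV reader can reliably tell embedded commas from delimiters, so the file would need to be re-exported with quoting or with a different delimiter.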
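Of course, there are many other ways to measure sentiment, but if you are new to sentiment mining and to R, this should get you started. You can read more about sentiment mining methods here (apologies if you have already come across them):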