【Question title】: R: Counting frequency of words from own dictionary
【Posted】: 2022-07-16 04:18:20
【Question】:
I have analyzed some Instagram posts and have already counted the number of words in each post (each row is one post), as shown below:
Data
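What I want to do now is count all the green/sustainable words in each post and add these green words as an extra column. I created a lexicon myself in which all green words have a polarity of 1 and all non-green words have a polarity of 0.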
Lexicon
How can I do this?
【Comments】:
Tags:
r
dictionary
frequency
lexicon
【Solution 1】:
str_count() from stringr can help with this (and with many other string-based tasks; see this R4DS chapter).
library(stringr)
# Create a reproducible example
dat <- data.frame(Post = c(
  "This is a sample post without any target words",
  "Whilst this is green!",
  "And this is eco-friendly",
  "This is green AND eco-friendly!"))
lexicon <- data.frame(Word = c("green", "eco-friendly", "neutral"),
                      Polarity = c(1, 1, 0))
# Extract relevant words from lexicon
green_words <- lexicon$Word[lexicon$Polarity == 1]
# Create new variable
dat$n_green_words <- str_count(dat$Post, paste(green_words, collapse = "|"))
dat
Output:
#>                                             Post n_green_words
#> 1 This is a sample post without any target words             0
#> 2                          Whilst this is green!             1
#> 3                       And this is eco-friendly             1
#> 4                This is green AND eco-friendly!             2
Created on 2022-07-15 by the reprex package (v2.0.1)
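One thing to watch with the plain alternation pattern is that it is case-sensitive and also matches inside longer words, so "greenhouse" would count as a hit for "green". If that is not wanted, a possible variant wraps the pattern in word boundaries and ignores case. This is only a sketch: the `posts` vector below is made up for illustration, and it assumes the lexicon entries contain no regex metacharacters.

```r
library(stringr)

# Hypothetical posts, chosen to show the substring and case effects
posts <- c("Evergreen forests and a greenhouse",
           "GREEN is good, green is eco-friendly")
green_words <- c("green", "eco-friendly")

# Plain alternation, as in the answer above:
# case-sensitive and matches inside longer words
plain <- paste(green_words, collapse = "|")

# Whole words only, ignoring case
# (assumes the lexicon entries contain no regex metacharacters)
bounded <- regex(
  paste0("\\b(", paste(green_words, collapse = "|"), ")\\b"),
  ignore_case = TRUE
)

n_plain   <- str_count(posts, plain)    # 2 2
n_bounded <- str_count(posts, bounded)  # 0 3
```

In the first post the plain pattern counts "Evergreen" and "greenhouse", while the bounded pattern counts neither. In the second post the bounded pattern also picks up the upper-case "GREEN". Which behaviour is right depends on how the lexicon is meant to be applied.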