Using str_ to locate the frequency of numerous words in another data frame: answers

【Question title】: Using str_ to locate the frequency of numerous words in another data frame
【Posted】: 2021-12-26 12:07:35
【Question】:

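I have a data frame with a single column (7,234 rows) of YouTube video titles. I also have a separate list of 71 keywords.

I want to find out how often each keyword appears across all 7,234 rows.

Using str_detect I can find the frequency of each individual keyword.

When I call summary on the result, I get a summary of the logical vector: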

   Mode   FALSE    TRUE
logical    1462    5772
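
For illustration, a single-keyword check of this kind might look like the sketch below. The data frame `videos`, its column `title`, the three sample titles, and the keyword "apple" are all assumptions made for the example; none of them come from the original post.

```r
library(stringr)

# Hypothetical stand-in for the real data frame of video titles
videos <- data.frame(
  title = c("Apple pie recipe", "Best pear tart", "Why I like apple juice"),
  stringsAsFactors = FALSE
)

# TRUE for every title that contains the keyword, ignoring case
hits <- str_detect(videos$title, regex("apple", ignore_case = TRUE))

# Number of FALSE and TRUE values in the logical vector
summary(hits)
```

Because `str_detect` returns one logical value per row, `sum(hits)` gives the number of matching titles directly.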

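I'm not sure how to do this for all keywords with a for loop, or how to put the results into a new data frame with the colnames: Video title, Frequency True, Frequency False.

Thanks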

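【Comments】:

Welcome to StackOverflow! Please read about how to ask a good question and how to provide a reproducible example. This will make it easier for others to help you.
Hi. Please note that I have changed my proposed solution so that it should now reflect more accurately what you are after.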

Tags: r dataframe for-loop

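【Solution 1】:

You don't need a for loop. Just split the sentences into individual words, count them, and filter the counts down to the keywords:

Toy data: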

words <- c("apple", "pear", "grape")
sentences <- c("I have an apple and a pear", 
               "Grape is my favorite but I also like apple", 
               "I don't like pear and I don't like apple or applepie",
               "She hates fruit")

library(dplyr)
library(tidyr)
data.frame(sentences) %>%
  # separate sentences into single words:
  separate_rows(sentences, sep = "\\s") %>%
  # convert to lower-case:
  mutate(sentences = tolower(sentences)) %>%
  group_by(sentences) %>%
  # count:
  summarise(freq = n()) %>%
  filter(sentences %in% words)
# A tibble: 3 x 2
  sentences  freq
* <chr>     <int>
1 apple         3
2 grape         1
3 pear          2

【Comments】:

Thanks - I appreciate this approach. I specifically wanted to use a for loop, just to learn how to actually do it.
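
For anyone who does want the for loop, here is a minimal sketch that reuses the toy data from the answer. The result column names `keyword`, `freq_true` and `freq_false` are assumptions chosen for the example, and matching is done case-insensitively with `str_detect`, as in the question.

```r
library(stringr)

# Toy data reused from the answer above
words <- c("apple", "pear", "grape")
sentences <- c("I have an apple and a pear",
               "Grape is my favorite but I also like apple",
               "I don't like pear and I don't like apple or applepie",
               "She hates fruit")

# Pre-allocate one row per keyword (column names are illustrative)
result <- data.frame(keyword    = words,
                     freq_true  = integer(length(words)),
                     freq_false = integer(length(words)),
                     stringsAsFactors = FALSE)

for (i in seq_along(words)) {
  # TRUE for every sentence that contains the keyword, ignoring case
  hits <- str_detect(sentences, regex(words[i], ignore_case = TRUE))
  result$freq_true[i]  <- sum(hits)   # sentences containing the keyword
  result$freq_false[i] <- sum(!hits)  # sentences not containing it
}

result
```

Note that this counts the sentences in which a keyword appears at least once, not the total number of occurrences, and that it matches substrings, so "applepie" also counts as a hit for "apple". If only whole words should match, one option is to wrap the pattern in word boundaries, for example `regex(paste0("\\b", words[i], "\\b"), ignore_case = TRUE)`.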