Replace words in a tweet with the numeric value of their frequency: answers

【Question title】: Replace words in a tweet with numeric value of their frequency
【Posted】: 2021-07-24 14:46:59
【Question description】:

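I am having trouble replacing the words in a tweet with the numeric value of their frequency.

I have already built a data frame that lists the words ordered by frequency.

Now I want to replace each word in the tweets with that word's frequency rank.

I have attached a snippet of my data frame.

Tweets and word frequency data:

My goal is for a tweet to look like this:

[1] [3] [7] [11] [18] [12] [10] [5] [3] [44] [23] [46] [2] [90]

Here [1] means the word is the most frequent word in the data set.

Any help is appreciated! :)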

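【Question discussion】:

Tags: r

【Solution 1】:

I think stringr::str_replace_all is an efficient way to do this: pass it a named vector built from your word frequencies, where the names are the words to match and the values are their replacements.

See the reprex below. The first few lines only generate random data; your frequency table should look like the df generated here.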

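# Example sentence standing in for a tweet
sentence <- "the quick brown fox jumps over the lazy dog"

# Unique words of the sentence
sentence_split <- unique(as.character(stringr::str_split(string = sentence, pattern = " ", simplify = TRUE)))

# Random names; these are not needed for the replacement itself
names(sentence_split) <- sample(x = 1:1000, size = length(sentence_split))

# Stand-in frequency table with random counts (yours would hold the real counts)
df <- data.frame(word = sentence_split,
                 n = sample(x = 1:1000, size = length(sentence_split)))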

df
#>    word   n
#>    the    740
#>    quick  192
#>    brown  145
#>    fox    809
#>    jumps  700
#>    over   910
#>    lazy   352
#>    dog    256

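# Named vector: the values are the replacements, the names are the words to match
replace_vector <- paste0("[", df$n, "]")
names(replace_vector) <- df$word

# Each name is treated as a pattern and replaced by its value
stringr::str_replace_all(string = sentence, pattern = replace_vector)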
#> [1] "[740] [192] [145] [809] [700] [910] [740] [352] [256]"

Created on 2021-07-24 by the reprex package (v2.0.0)
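The reprex above inserts the raw count n, whereas the question asks for the frequency rank ([1] for the most frequent word). A minimal sketch of that variant follows, assuming a frequency table with columns word and n like the df above. The table freq, its counts, and the example tweet are made up for illustration. The names of a replacement vector are treated as regular expressions, so the sketch also wraps each word in \\b word boundaries to avoid matching "the" inside "other"; it assumes the words contain no regex metacharacters.

```r
library(stringr)

# Hypothetical frequency table; in practice this would be your own counts.
freq <- data.frame(
  word = c("the", "fox", "dog", "quick"),
  n    = c(120, 15, 40, 15),
  stringsAsFactors = FALSE
)

# Rank 1 = most frequent word; tied counts share the smallest rank.
freq$rank <- rank(-freq$n, ties.method = "min")

# Names are the patterns (whole words only), values are the replacements.
replace_vector <- setNames(
  paste0("[", freq$rank, "]"),
  paste0("\\b", freq$word, "\\b")
)

tweet <- "the quick fox and the other dog"
result <- str_replace_all(tweet, replace_vector)
result
#> [1] "[1] [3] [3] and [1] other [2]"
```

Words missing from the frequency table ("and", "other") are left unchanged; whether they should get a placeholder instead depends on what the ranks are used for afterwards.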

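【Discussion】:

Hey, sorry for my late reply, and thank you very much for your help. I tried your suggestion and now face two small problems: 1. I can only replace the words in one column, not in the whole data frame. 2. I only see the words replaced by their frequency values in the output; I cannot replace them "permanently" in the data frame. Do you know what I am doing wrong? Any help is appreciated, and thanks in advance.
If you have a data frame with a text column and want to assign the result back after creating replace_vector, you can do this: df$replaced <- stringr::str_replace_all(string = df$column_with_text, pattern = replace_vector). You can repeat this for more columns or, if there are many of them, use something like purrr::map_dfc(df_with_text_columns, function(x) stringr::str_replace_all(x, pattern = replace_vector)).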
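To make the replacement "permanent", the result has to be assigned back into the data frame, as the reply above describes. The following is a minimal sketch using base R's lapply instead of purrr; the data frame tweets, its column names text and reply, and the contents of replace_vector are hypothetical stand-ins.

```r
library(stringr)

# Hypothetical replacement vector: names are patterns, values are replacements.
replace_vector <- c("\\bthe\\b" = "[1]", "\\bdog\\b" = "[2]", "\\bfox\\b" = "[3]")

# Hypothetical data frame with two text columns.
tweets <- data.frame(
  text  = c("the fox", "the dog"),
  reply = c("dog and fox", "the end"),
  stringsAsFactors = FALSE
)

# Overwrite the selected columns in place by assigning the results back.
text_cols <- c("text", "reply")
tweets[text_cols] <- lapply(
  tweets[text_cols],
  function(x) str_replace_all(x, pattern = replace_vector)
)

tweets
```

Assigning to tweets[text_cols] keeps all other columns untouched; writing to new column names instead would preserve the original text alongside the replaced version.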