Add a factor column to a data frame based on an existing column: answers

【Question title】: Add factor column to dataframe based on existing column
【Posted】: 2016-11-29 12:33:05
【Question】:

Suppose I have a data frame:

word <- c("good", "great", "bad", "poor", "eh")
userid <- c(1, 2, 3, 4, 5)
d <- data.frame(userid, word)

I want to add a data frame column sentiment, which is a factor whose value depends on what word is:

words_pos <- c("good", "great")
words_neg <- c("bad", "poor")
calculate_sentiment <- function(x) {
     if (x %in% words_pos) {
         return("pos")
     } else if (x %in% words_neg) {
         return("neg")
     }
     return(NA)
}
d$sentiment <- apply(d, 1, function(x) calculate_sentiment(x['word']))

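However, d$sentiment is now of type "character". How can I make it a factor with the correct levels: pos, neg, NA? I am not even sure whether NA should be a factor level, since I am still learning R.

Thanks!

【Comments】:

Try: d$sentiment
Don't use apply if you only need a single column. It is both dangerous (because of the matrix conversion) and very inefficient. Also, I think you are looking for addNA rather than factor. Something like addNA(sapply(word, calculate_sentiment)). Not to mention that you could probably vectorize this quite easily too.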
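
The comment above mentions vectorizing and addNA without showing either. A minimal sketch of one way to do it, assuming nested ifelse calls as the vectorized classifier (the commenter did not say which approach they had in mind); the column name sentiment_na is hypothetical and only used for illustration:

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps `word` as character
word <- c("good", "great", "bad", "poor", "eh")
userid <- c(1, 2, 3, 4, 5)
d <- data.frame(userid, word, stringsAsFactors = FALSE)

words_pos <- c("good", "great")
words_neg <- c("bad", "poor")

# Vectorized classification: %in% tests the whole column at once,
# so no apply/sapply loop over rows is needed
sentiment <- ifelse(d$word %in% words_pos, "pos",
             ifelse(d$word %in% words_neg, "neg", NA_character_))

# factor() treats NA as a missing value, not as a level
d$sentiment <- factor(sentiment, levels = c("pos", "neg"))
levels(d$sentiment)

# addNA() turns NA into an explicit extra level (hypothetical column name)
d$sentiment_na <- addNA(d$sentiment)
levels(d$sentiment_na)
```

Whether NA should be a level depends on the use: as a plain missing value it is dropped by most summaries, while as an explicit level it shows up as its own category in tables and plots.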

Tags: r

【Solution 1】:

You can wrap the last line of your code in as.factor. This gives a factor with the levels pos and neg. By the way, NA is not a factor level.

d$sentiment <- as.factor(apply(d, 1, function(x) calculate_sentiment(x['word'])))
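
To check what as.factor actually produces, the result can be inspected as in the sketch below. It assumes sapply over the single word column in place of apply over the whole data frame (following the advice in the comments, not this answer's exact code) and reuses the question's calculate_sentiment unchanged:

```r
word <- c("good", "great", "bad", "poor", "eh")
userid <- c(1, 2, 3, 4, 5)
d <- data.frame(userid, word, stringsAsFactors = FALSE)

words_pos <- c("good", "great")
words_neg <- c("bad", "poor")

# Same function as in the question
calculate_sentiment <- function(x) {
  if (x %in% words_pos) {
    return("pos")
  } else if (x %in% words_neg) {
    return("neg")
  }
  return(NA)
}

# Call the function once per word, then convert the character result to a factor
d$sentiment <- as.factor(sapply(d$word, calculate_sentiment, USE.NAMES = FALSE))

class(d$sentiment)                  # the column is now a factor
levels(d$sentiment)                 # as.factor sorts levels alphabetically
is.na(d$sentiment)                  # "eh" is missing, not a level
table(d$sentiment, useNA = "ifany") # counts per level, plus the missing value
```

Note that as.factor orders the levels alphabetically (neg before pos); if a specific order matters, factor() with an explicit levels argument is one option.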

【Comments】:

【Solution 2】:

This may not be the simplest way, but it is a very readable one (preferable, in my opinion, to an abstract function). It uses dplyr's mutate and case_when:

library(dplyr)
d2 <- mutate(d, sentiment = factor(case_when(word %in% words_pos ~ "pos",
                                             word %in% words_neg ~ "neg",
                                             TRUE                ~ NA_character_)))

glimpse(d2)
#> Observations: 5
#> Variables: 3
#> $ userid    <dbl> 1, 2, 3, 4, 5
#> $ word      <fctr> good, great, bad, poor, eh
#> $ sentiment <fctr> pos, pos, neg, neg, NA

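I have spaced the code out a bit to make it clearer. It does the following:

take the data.frame d, then
mutate (change a column) so that "sentiment" equals a factor, defined by
a case statement with logical tests on the LHS that yield the results on the RHS (NA_character_ is needed so that all results have the same type).

The output confirms that this is a factor column with the desired values.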
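
The steps above can also be written as a pipeline. The sketch below is a variation on this answer, not the answer's own code: it assumes the dplyr package is installed, and it adds an explicit levels argument (an assumption, the original relies on the default alphabetical order) so that pos comes before neg:

```r
library(dplyr)

d <- data.frame(userid = c(1, 2, 3, 4, 5),
                word = c("good", "great", "bad", "poor", "eh"),
                stringsAsFactors = FALSE)
words_pos <- c("good", "great")
words_neg <- c("bad", "poor")

d2 <- d %>%
  mutate(sentiment = factor(
    # Conditions are checked in order; the first TRUE one decides the value
    case_when(word %in% words_pos ~ "pos",
              word %in% words_neg ~ "neg",
              TRUE                ~ NA_character_),
    # Explicit level order (an addition for illustration)
    levels = c("pos", "neg")))

is.factor(d2$sentiment)
levels(d2$sentiment)
```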

【Comments】: