Coerce and order character vector to factor, with factor levels ordered by another vector: answers

【Question title】: Coerce and order character vector to factor, with factor levels ordered by another vector
【Posted】: 2018-03-29 21:45:56
【Question description】:

Imagine a dataset like this:

# creating data for test
set.seed(1839)
id <- as.character(1:10)
frequency <- sample(c("n", "r", "s", "o", "a"), 10, TRUE)
frequency_value <- sapply(
  frequency, switch, "n" = -2, "r" = -1, "s" = 0, "o" = 1, "a" = 2
)
(test <- data.frame(id, frequency, frequency_value))

It looks like this:

   id frequency frequency_value
1   1         a               2
2   2         o               1
3   3         r              -1
4   4         o               1
5   5         o               1
6   6         s               0
7   7         n              -2
8   8         n              -2
9   9         r              -1
10 10         n              -2

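The variable frequency holds the responses I am interested in. It runs from never to rarely to sometimes to often to always, and each label is just the first letter of the word. The ordering is given by frequency_value.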

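What I want to do is make frequency a factor whose levels are in the order n, r, s, o, a. However, I want that order to be derived from the values in frequency_value: the levels should follow the ordering stored in frequency_value rather than being hard-coded (as in factor(frequency, levels = c("n", "r", "s", "o", "a"))).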
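To make the goal concrete, here is a small sketch of the two behaviours the question contrasts. The response vector is copied by hand from the printed data above (an assumption made so the result does not depend on the random number generator); the variable names default_levels and hardcoded_levels are illustrative only.

```r
# Responses copied from the example data shown above
frequency <- c("a", "o", "r", "o", "o", "s", "n", "n", "r", "n")

# Default behaviour: factor() uses the sorted unique values as levels,
# which for these labels is alphabetical and not the intended scale order
default_levels <- levels(factor(frequency))

# Hard-coded behaviour: the order is right, but it is typed by hand
hardcoded_levels <- levels(
  factor(frequency, levels = c("n", "r", "s", "o", "a"))
)

print(default_levels)
print(hardcoded_levels)
```

The question asks for the second result without typing the level order by hand.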

I considered using this tidyverse solution:

library(dplyr)

# one row per distinct (label, value) pair, sorted by value, labels extracted
levels <- test[, c("frequency", "frequency_value")] %>%
  unique() %>%
  arrange(as.numeric(frequency_value)) %>%
  pull(frequency) %>%
  as.character()
test$frequency <- factor(test$frequency, levels)

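However, this seems computationally inefficient when I do it on a large dataset with several variables that I want to treat this way. Is there a more efficient solution?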

【Question discussion】:

Tags: r

【Solution 1】:

Use order on the unique combinations (which you are already computing), inside with:

test$frequency <- factor(test$frequency, 
                         with(unique(test[, -1]), frequency[order(frequency_value)]))

[1] a o r o o s n n r n
Levels: n r s o a
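Since the question mentions several variables, the same idea could be wrapped in a small helper and applied column by column. This is only a sketch under stated assumptions: levels_by_value is a hypothetical name, each label is assumed to map to exactly one value, each response column is assumed to have a partner column named "<name>_value", and the data are copied by hand from the printed example.

```r
# Hypothetical helper (not from the answer above): build a factor from `x`
# whose levels follow the numeric order of the paired vector `value`.
# Assumes each distinct label in `x` always carries the same value.
levels_by_value <- function(x, value) {
  x <- as.character(x)
  keep <- !duplicated(x)  # first occurrence of each label
  factor(x, levels = x[keep][order(value[keep])])
}

# Example data copied from the printed data frame above
test <- data.frame(
  id = as.character(1:10),
  frequency = c("a", "o", "r", "o", "o", "s", "n", "n", "r", "n"),
  frequency_value = c(2, 1, -1, 1, 1, 0, -2, -2, -1, -2),
  stringsAsFactors = FALSE
)

# Assumed naming convention: "<name>" is paired with "<name>_value"
vars <- "frequency"
for (v in vars) {
  test[[v]] <- levels_by_value(test[[v]], test[[paste0(v, "_value")]])
}

print(levels(test$frequency))
```

Using !duplicated() on the label vector avoids building an intermediate data frame with unique(), which may help on large data, although no timings are claimed here.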

【Discussion】:

【Solution 2】:

One option is to use only dplyr. Note that arrange also reorders the rows of test:

library(dplyr)
test <- test %>% arrange(frequency_value) %>% 
  mutate(frequency = factor(frequency, levels = unique(frequency))) 

test

#    id frequency frequency_value
# 1   7         n              -2
# 2   8         n              -2
# 3  10         n              -2
# 4   3         r              -1
# 5   9         r              -1
# 6   6         s               0
# 7   2         o               1
# 8   4         o               1
# 9   5         o               1
# 10  1         a               2

str(test)
#'data.frame':  10 obs. of  3 variables:
# $ id             : Factor w/ 10 levels "1","10","2","3",..: 8 9 2 4 10 7 3 5 6 1
# $ frequency      : Factor w/ 5 levels "n","r","s","o",..: 1 1 1 2 2 3 4 4 4 5
# $ frequency_value: num  -2 -2 -2 -1 -1 0 1 1 1 2
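If sorting the rows is not wanted, one possible alternative (not part of the answer above) is reorder() from the stats package, which orders the levels of its first argument by a summary, the mean by default, of the second argument within each level. The sketch below uses vectors copied by hand from the example data.

```r
# Vectors copied from the example data; row order is left untouched
frequency <- c("a", "o", "r", "o", "o", "s", "n", "n", "r", "n")
frequency_value <- c(2, 1, -1, 1, 1, 0, -2, -2, -1, -2)

# reorder() computes the mean of frequency_value per label and
# sorts the levels by that score in increasing order
f <- reorder(frequency, frequency_value)

print(levels(f))
print(as.character(f))
```

The returned factor also carries a "scores" attribute holding the per-level means, which can be ignored or dropped if it is not needed.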

【Discussion】: