【Question title】: Coerce and order character vector to factor, with factor levels ordered by another vector
【Posted】: 2018-03-29 21:45:56
【Question】:

Imagine a data set like this:

# creating data for test
set.seed(1839)
id <- as.character(1:10)
frequency <- sample(c("n", "r", "s", "o", "a"), 10, TRUE)
frequency_value <- sapply(
  frequency, switch, "n" = -2, "r" = -1, "s" = 0, "o" = 1, "a" = 2
)
(test <- data.frame(id, frequency, frequency_value))

which looks like:

   id frequency frequency_value
1   1         a               2
2   2         o               1
3   3         r              -1
4   4         o               1
5   5         o               1
6   6         s               0
7   7         n              -2
8   8         n              -2
9   9         r              -1
10 10         n              -2

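The variable frequency holds the responses I am interested in. It runs from never to rarely to sometimes to often to always. The labels are just the first letter of each word. The order is given by frequency_value.

What I want to do is make frequency a factor whose levels are ordered n, r, s, o, a. However, I want this to depend on the values in frequency_value. The levels should follow the order held in frequency_value, and not simply be hard-coded (as they would be with factor(frequency, levels = c("n", "r", "s", "o", "a"))).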

I considered using this tidyverse solution:

library(dplyr)

# unique label/value pairs, sorted by value, give the level order
levels <- test[, c("frequency", "frequency_value")] %>%
  unique() %>%
  arrange(as.numeric(frequency_value)) %>%
  pull(frequency) %>%
  as.character()
test$frequency <- factor(test$frequency, levels)

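However, this seems computationally inefficient when I do it on a large data set with several variables that I want to convert to factors. Is there a more efficient solution?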

【Comments】:

    Tags: r


    【Solution 1】:

    Use order inside with, applied to the unique combinations (which you already use):

    # test[, -1] drops the id column, leaving the label/value pairs
    test$frequency <- factor(test$frequency,
                             with(unique(test[, -1]), frequency[order(frequency_value)]))
    test$frequency

    # [1] a o r o o s n n r n
    # Levels: n r s o a
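The question mentions several variables that need the same treatment, so the idea above can be wrapped in a small function. The following is only a sketch: factor_by_value is a hypothetical helper name, not part of the original answer, and it assumes every label always maps to the same value. It uses !duplicated() on the labels instead of unique() on the data frame, and the example data is typed in from the table in the question.

```r
# Hypothetical helper (not from the original answer): build a factor from
# `labels`, with levels sorted by the matching entries of `values`.
# Assumption: each label is always paired with the same value.
factor_by_value <- function(labels, values) {
  labels <- as.character(labels)
  first <- !duplicated(labels)                 # first occurrence of each label
  lev <- labels[first][order(values[first])]   # labels sorted by their value
  factor(labels, levels = lev)
}

# Data copied from the printed table in the question
test <- data.frame(
  id = as.character(1:10),
  frequency = c("a", "o", "r", "o", "o", "s", "n", "n", "r", "n"),
  frequency_value = c(2, 1, -1, 1, 1, 0, -2, -2, -1, -2),
  stringsAsFactors = FALSE
)

test$frequency <- factor_by_value(test$frequency, test$frequency_value)
levels(test$frequency)
```

For several label/value column pairs, the same helper could be called once per pair, for example in a loop over the column names.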
    

    【Comments】:

      【Solution 2】:

      One option is to use just dplyr:

      library(dplyr)
      test <- test %>% arrange(frequency_value) %>% 
        mutate(frequency = factor(frequency, levels = unique(frequency))) 
      
      test
      
      #    id frequency frequency_value
      # 1   7         n              -2
      # 2   8         n              -2
      # 3  10         n              -2
      # 4   3         r              -1
      # 5   9         r              -1
      # 6   6         s               0
      # 7   2         o               1
      # 8   4         o               1
      # 9   5         o               1
      # 10  1         a               2
      
      str(test)
      #'data.frame':  10 obs. of  3 variables:
      # $ id             : Factor w/ 10 levels "1","10","2","3",..: 8 9 2 4 10 7 3 5 6 1
      # $ frequency      : Factor w/ 5 levels "n","r","s","o",..: 1 1 1 2 2 3 4 4 4 5
      # $ frequency_value: num  -2 -2 -2 -1 -1 0 1 1 1 2
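Note that arrange() in the code above also sorts the rows of test. If the original row order should be kept, one possible alternative, offered here as a sketch and not as part of the original answer, is reorder() from the stats package. It orders the levels by a per-level summary of a second vector (the mean by default); since each label is assumed to map to a single value, that mean is just the value itself. The example data is typed in from the table in the question.

```r
# Data copied from the printed table in the question
test <- data.frame(
  id = as.character(1:10),
  frequency = c("a", "o", "r", "o", "o", "s", "n", "n", "r", "n"),
  frequency_value = c(2, 1, -1, 1, 1, 0, -2, -2, -1, -2),
  stringsAsFactors = FALSE
)

# reorder() sorts the levels of its first argument by the mean of the
# second argument within each level; the rows themselves are not moved.
test$frequency <- reorder(test$frequency, test$frequency_value)

levels(test$frequency)
test$id  # row order is unchanged
```

The returned factor also carries a "scores" attribute holding the per-level summaries, which can be dropped with attr(test$frequency, "scores") <- NULL if it gets in the way.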
      

      【Comments】:
