【Question title】: Rank values column-wise
【Posted】: 2017-08-08 08:06:23
【Question】:

I want to rank values column-wise.

I have the following data frame:

dput(test)
structure(list(Name = c("A", "B", "C", "D"), Margin = c(744, 
3196.4722, 0, 394), T1 = c(420, 200, 2150, 70), T2 = c(630, 285, 
2365, 84), T3 = c(630, 335, 2580, 105), T4 = c(666, 410, 2795, 
128), T5 = c(2244, 2961.7931, 3010, 142), T6 = c(2244, 3652.472, 
3440, 151), T7 = c(2244, 3722.472, 3870, 168), T8 = c(2244, 3887.472, 
5160, 187), T9 = c(2244, 4112.472, 6450, 225), T10 = c(2244, 
4337.472, 6450, 225), T11 = c(798, 3567.472, 4300, 112), T12 = c(630, 
3582.472, 4300, 111), T13 = c(702, 3582.472, 4300, 112), T14 = c(3600, 
4637.472, 3440, 78), T15 = c(744, 3067.306, 2580, 274), T16 = c(744, 
2770.5666, 2580, 197), T17 = c(744, 3138.806, 2580, 80), T18 = c(2244, 
3920.0836, 3870, 401), T19 = c(2244, 2789.1117, 1290, 127)), .Names = c("Name", 
"Margin", "T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9", 
"T10", "T11", "T12", "T13", "T14", "T15", "T16", "T17", "T18", 
"T19"), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))

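Each row has a unique ID in Name. For each row, I want to rank the T columns to find the one whose value is equal to, or closest to, the value in the Margin column.

The desired output is: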

Name    Margin    Closest_Column
 A      744.000        T15

Ties can be broken at random.

My attempt:

nm1 <- paste("rank", names(test)[3:21], sep="_")
test[nm1] <-  mutate_all(test[3:21],funs(rank(., ties.method="first")))

【Question comments】:

    Tags: r data.table dplyr rank


    【Solution 1】:

    I would go with long format

    library(tidyr)
    library(dplyr)
    
    test %>%
      gather(Variable, Value, -(Name:Margin)) %>%
      group_by(Name, Margin) %>%
      summarise(Closest = Variable[which.min(abs(Value - Margin))])
    
    # A tibble: 4 x 3
    # Groups:   Name [?]
    #    Name   Margin Closest
    #   <chr>    <dbl>   <chr>
    # 1     A  744.000     T15
    # 2     B 3196.472     T17
    # 3     C    0.000     T19
    # 4     D  394.000     T18
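
In current tidyr releases, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same long-format idea using pivot_longer(), assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 are installed. The `toy` data frame is a hypothetical subset (rows A and B, columns T15 to T17) of the question's data, used only to keep the example short.

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the question's data: rows A and B, columns T15..T17
toy <- data.frame(
  Name   = c("A", "B"),
  Margin = c(744, 3196.4722),
  T15    = c(744, 3067.306),
  T16    = c(744, 2770.5666),
  T17    = c(744, 3138.806)
)

closest <- toy %>%
  # One row per (Name, Margin, column) combination
  pivot_longer(-c(Name, Margin), names_to = "Variable", values_to = "Value") %>%
  group_by(Name, Margin) %>%
  # which.min() returns the first minimum, so ties go to the leftmost column
  summarise(Closest = Variable[which.min(abs(Value - Margin))],
            .groups = "drop")

print(closest)
```

Because which.min() always returns the first minimum, this variant breaks ties deterministically rather than at random.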
    

    Or using data.table

    library(data.table)
    melt(setDT(test), 1:2
         )[, .(Closest = variable[which.min(abs(value - Margin))]),
             by = .(Name, Margin)]
    #    Name   Margin Closest
    # 1:    A  744.000     T15
    # 2:    B 3196.472     T17
    # 3:    C    0.000     T19
    # 4:    D  394.000     T18
    

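    【Comments】:

    • This is actually quite intuitive. I should have thought of it.
    【Solution 2】:

    Use cbind.data.frame to bind the first two columns to a new column that holds, for each row, the name of the column whose absolute difference from Margin is smallest: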

    cbind( test[1:2], Closest_Column =
        apply(test[-1], 1, function(x) names(x[-1])[which.min( abs(x[-1]-x[1]))] ) )
      Name   Margin Closest_Column
    1    A  744.000            T15
    2    B 3196.472            T17
    3    C    0.000            T19
    4    D  394.000            T18
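
To make the function passed to apply() easier to follow, here is a sketch of what it does for a single row. The values are taken from row B of the question's data, restricted to three columns for brevity; the variable names (`row_values`, `candidates`, `distances`) are illustrative and not part of the answer.

```r
# One row as apply() would see it: Margin first, then the candidate columns
row_values <- c(Margin = 3196.4722, T15 = 3067.306, T16 = 2770.5666, T17 = 3138.806)

margin     <- row_values[["Margin"]]   # x[1] in the answer's function
candidates <- row_values[-1]           # x[-1] in the answer's function

# Absolute distance of every candidate column from Margin
distances <- abs(candidates - margin)

# Name of the column with the smallest distance
closest <- names(candidates)[which.min(distances)]
print(closest)
```

Note that apply() converts its input to a matrix first, which is why the answer drops the character column Name with test[-1] before calling it.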
    

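    【Comments】:

      【Solution 3】:

      If we need to use tidyverse, one option is rowwise: find the index of the minimum difference between 'Margin' and the other columns, then get the corresponding column name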

      test %>% 
            rowwise() %>% 
            do(data.frame(.[1:2], Closest_column = names(.)[3:21][which.min(abs(.[[2]]-
                              unlist(.[3:21])))]))
      # A tibble: 4 x 3
      #    Name   Margin Closest_column
      #* <chr>    <dbl>          <chr>
      #1     A  744.000            T15
      #2     B 3196.472            T17
      #3     C    0.000            T19
      #4     D  394.000            T18
      

      Or another option is

      library(tidyverse)
      gather(test, Closest_column, val, T1:T19) %>%
              group_by(Name) %>% 
              slice(which.min(abs(Margin - val))) %>%
              select(-val)
      # A tibble: 4 x 3
      # Groups:   Name [4]
      #    Name   Margin Closest_column
      #  <chr>    <dbl>          <chr>
      #1     A  744.000            T15
      #2     B 3196.472            T17
      #3     C    0.000            T19
      #4     D  394.000            T18
      

      An efficient base R option is max.col

      cbind(test[1:2], 
          Closest_column = names(test)[3:21][max.col(-abs(test[3:21]-test[[2]]), 'first')])
      #    Name   Margin Closest_column
      #1    A  744.000            T15
      #2    B 3196.472            T17
      #3    C    0.000            T19
      #4    D  394.000            T18
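
The question allows ties to be broken at random. max.col() supports this through its ties.method argument, whose default is "random". Below is a minimal sketch on a hypothetical subset of the data (`toy`), in which row A matches Margin exactly in all three columns. Note that max.col() decides what counts as a tie using a small relative tolerance, so near-equal distances may also be treated as tied.

```r
# Hypothetical subset of the question's data: rows A and B, columns T15..T17
toy <- data.frame(
  Name   = c("A", "B"),
  Margin = c(744, 3196.4722),
  T15    = c(744, 3067.306),
  T16    = c(744, 2770.5666),
  T17    = c(744, 3138.806)
)

t_cols <- names(toy)[3:5]

# Negated absolute distance: the closest column has the largest score
score <- -abs(as.matrix(toy[t_cols]) - toy$Margin)

set.seed(1)  # only to make the random tie-break reproducible
pick    <- max.col(score, ties.method = "random")
closest <- t_cols[pick]

print(cbind(toy[1:2], Closest_column = closest))
```

For row A any of T15, T16, or T17 may be returned, while row B has a unique closest column.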
      

      【Comments】:

      • Just a quick follow-up. What if I wanted to change the condition to: Val
      • @Prometheus You could try gather(test, Closest_column, val, T1:T19) %>% group_by(Name) %>% slice(which(val <= Margin)[1])
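
The one-liner in the reply above picks, per Name, the first column whose value is less than or equal to Margin. A base R sketch of the same condition follows, assuming that is the intended rule. The `toy` data frame is a hypothetical subset (rows A and C, columns T1, T2, T15) of the question's data, and rows with no qualifying column are kept with NA here.

```r
# Hypothetical subset of the question's data: rows A and C
toy <- data.frame(
  Name   = c("A", "C"),
  Margin = c(744, 0),
  T1     = c(420, 2150),
  T2     = c(630, 2365),
  T15    = c(744, 2580)
)

t_cols <- names(toy)[3:5]

# Logical matrix: TRUE where a column's value is <= that row's Margin
below <- toy[t_cols] <= toy$Margin

# First qualifying column per row; NA when no column satisfies the condition
first_le <- unname(apply(below, 1, function(ok) t_cols[which(ok)[1]]))

print(cbind(toy[1:2], First_column = first_le))
```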