【Question Title】: Using sort and rank in R on multiple columns
【Posted】: 2018-06-09 15:39:13
【Question】:

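I am trying to rank my hospital names by lowest rate within each state. When several hospitals have the same rate, the tie should be broken by hospital name in alphabetical order. So far I have managed to rank them by rate within each state, with rows sorted by hospital name, but I can't figure out how to break the ties so that the ranking doesn't skip numbers.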

This is what I have so far, using the following code:

outcome_data <- read.csv("outcome-of-care-measures.csv", na.strings = "Not Available", stringsAsFactors = FALSE) # Read csv file
myData <- outcome_data[, c(2, 7, 11)]                 # Keep only hospital name, state and heart-attack rate
names(myData) <- c("Hospital.Name", "State", "rate")  # Short column names, as used in the output below
arr1 <- myData[complete.cases(myData$rate), ]         # Remove rows with an NA rate
arr2 <- arr1[order(arr1$State, arr1$rate, arr1$Hospital.Name), ]  # Sort by state, then rate, then hospital name
arr3 <- transform(arr2, rank = ave(rate, State, FUN = function(x) rank(x, ties.method = "min")))  # Rank by rate within each state
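
As a side note on the sort step: order() with several vectors sorts by the first one and uses each later vector only to break ties left by the earlier ones. A minimal sketch, assuming a hypothetical data frame `hosp` built from a few AK rows of the output below, shuffled into a made-up input order:

```r
# Hypothetical input: AK rows from the output below, in a shuffled order
hosp <- data.frame(
  Hospital.Name = c("CENTRAL PENINSULA GENERAL HOSPITAL",
                    "PEACEHEALTH KETCHIKAN MEDICAL CENTER",
                    "ALASKA NATIVE MEDICAL CENTER",
                    "MAT-SU REGIONAL MEDICAL CENTER",
                    "BARTLETT REGIONAL HOSPITAL"),
  State = "AK",
  rate  = c(11.6, 11.4, 11.6, 11.4, 11.6),
  stringsAsFactors = FALSE
)

# State is the primary key; rate breaks ties in State,
# and Hospital.Name breaks any ties that remain in rate
sorted <- hosp[order(hosp$State, hosp$rate, hosp$Hospital.Name), ]
sorted$Hospital.Name
# MAT-SU, PEACEHEALTH (rate 11.4), then ALASKA NATIVE, BARTLETT, CENTRAL (rate 11.6)
```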

The output I currently get is:

Hospital.Name                           State  rate  rank
SOUTH PENINSULA HOSPITAL                AK     10.8  1
YUKON KUSKOKWIM DELTA REG HOSPITAL      AK     11.2  2
MAT-SU REGIONAL MEDICAL CENTER          AK     11.4  3
PEACEHEALTH KETCHIKAN MEDICAL CENTER    AK     11.4  3
ALASKA NATIVE MEDICAL CENTER            AK     11.6  5
BARTLETT REGIONAL HOSPITAL              AK     11.6  5
CENTRAL PENINSULA GENERAL HOSPITAL      AK     11.6  5
PROVIDENCE ALASKA MEDICAL CENTER        AK     12.4  8
ALASKA REGIONAL HOSPITAL                AK     13.4  9
FAIRBANKS MEMORIAL HOSPITAL             AK     15.6  10
GEORGE H. LANIER MEMORIAL HOSPITAL      AL     8.8   1
EVERGREEN MEDICAL CENTER                AL     9.1   2
BAPTIST MEDICAL CENTER EAST             AL     9.6   3
LAWRENCE MEDICAL CENTER                 AL     9.9   4
ANDALUSIA REGIONAL HOSPITAL             AL     10.1  5
JACKSON HOSPITAL & CLINIC INC           AL     10.2  6
BIRMINGHAM VA MEDICAL CENTER            AL     10.4  7
FLORALA MEMORIAL HOSPITAL               AL     10.4  7
GROVE HILL MEMORIAL HOSPITAL            AL     10.4  7
SPRINGHILL MEDICAL CENTER               AL     10.4  7
WEDOWEE HOSPITAL                        AL     10.4  7
PARKWAY MEDICAL CENTER                  AL     10.5  12
ST VINCENT'S BIRMINGHAM                 AL     10.6  13
WIREGRASS MEDICAL CENTER                AL     10.6  13
GADSDEN REGIONAL MEDICAL CENTER         AL     10.7  15
HALE COUNTY HOSPITAL                    AL     10.7  15
MOBILE INFIRMARY                        AL     10.7  15
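
The repeated and skipped numbers come from ties.method = "min": rank() only sees the rate vector, so equal rates share the smallest rank and the following ranks jump. With ties.method = "first", tied values are numbered in the order in which they appear instead. A small sketch on the ten AK rates from the table above:

```r
# The ten AK rates, already sorted by rate and then hospital name
rate <- c(10.8, 11.2, 11.4, 11.4, 11.6, 11.6, 11.6, 12.4, 13.4, 15.6)

# "min": tied values share the smallest rank, so later ranks skip numbers
rank(rate, ties.method = "min")
#  [1]  1  2  3  3  5  5  5  8  9 10

# "first": tied values are numbered in order of appearance,
# so the ranks are consecutive
rank(rate, ties.method = "first")
#  [1]  1  2  3  4  5  6  7  8  9 10
```

Note that "first" only breaks ties alphabetically if the rows within each tie are already in hospital-name order, as they are in arr2 after the sort step; on unsorted rows it would simply follow row order.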

But what I want to get is:

Hospital.Name                           State  rate  rank
SOUTH PENINSULA HOSPITAL                AK     10.8  1
YUKON KUSKOKWIM DELTA REG HOSPITAL      AK     11.2  2
MAT-SU REGIONAL MEDICAL CENTER          AK     11.4  3
PEACEHEALTH KETCHIKAN MEDICAL CENTER    AK     11.4  4
ALASKA NATIVE MEDICAL CENTER            AK     11.6  5
BARTLETT REGIONAL HOSPITAL              AK     11.6  6
CENTRAL PENINSULA GENERAL HOSPITAL      AK     11.6  7
PROVIDENCE ALASKA MEDICAL CENTER        AK     12.4  8
ALASKA REGIONAL HOSPITAL                AK     13.4  9
FAIRBANKS MEMORIAL HOSPITAL             AK     15.6  10
GEORGE H. LANIER MEMORIAL HOSPITAL      AL     8.8   1
EVERGREEN MEDICAL CENTER                AL     9.1   2
BAPTIST MEDICAL CENTER EAST             AL     9.6   3
LAWRENCE MEDICAL CENTER                 AL     9.9   4
ANDALUSIA REGIONAL HOSPITAL             AL     10.1  5
JACKSON HOSPITAL & CLINIC INC           AL     10.2  6
BIRMINGHAM VA MEDICAL CENTER            AL     10.4  7
FLORALA MEMORIAL HOSPITAL               AL     10.4  8
GROVE HILL MEMORIAL HOSPITAL            AL     10.4  9
SPRINGHILL MEDICAL CENTER               AL     10.4  10
WEDOWEE HOSPITAL                        AL     10.4  11
PARKWAY MEDICAL CENTER                  AL     10.5  12
ST VINCENT'S BIRMINGHAM                 AL     10.6  13
WIREGRASS MEDICAL CENTER                AL     10.6  14
GADSDEN REGIONAL MEDICAL CENTER         AL     10.7  15
HALE COUNTY HOSPITAL                    AL     10.7  16
MOBILE INFIRMARY                        AL     10.7  17

Any ideas?

【Question Comments】:

    Tags: r sorting rank


    【Solution 1】:

    After the order step, we only need to assign a sequence number within each State group:

    library(dplyr)
    arr2 %>%
         group_by(State) %>%
         mutate(rank = row_number())
    

    Or, if we start from 'arr1':

    arr1 %>%
       arrange(State, rate,  Hospital.Name) %>%
       group_by(State) %>%
       mutate(rank = row_number())
    

    Or use ave from base R:

    with(arr2, ave(seq_along(State), State, FUN = seq_along))
    #[1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
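
If the rows are not sorted beforehand, the same ave() idea can still rank on several keys at once: within each group, order(order(...)) turns a multi-key sort order into ranks. A sketch on a small hypothetical data frame `toy` with the same columns as arr1 (the rows come from the question's output; the input order is made up):

```r
# Hypothetical unsorted input with the same columns as arr1
toy <- data.frame(
  Hospital.Name = c("WEDOWEE HOSPITAL", "PEACEHEALTH KETCHIKAN MEDICAL CENTER",
                    "BIRMINGHAM VA MEDICAL CENTER", "MAT-SU REGIONAL MEDICAL CENTER",
                    "GEORGE H. LANIER MEMORIAL HOSPITAL"),
  State = c("AL", "AK", "AL", "AK", "AL"),
  rate  = c(10.4, 11.4, 10.4, 11.4, 8.8),
  stringsAsFactors = FALSE
)

# ave() hands FUN the row indices belonging to each state;
# order() sorts those rows by rate, then name, and a second order()
# converts that sort order into ranks 1..n
toy$rank <- with(toy, ave(seq_along(State), State, FUN = function(i)
  order(order(rate[i], Hospital.Name[i]))))
toy
```

The rows keep their original order; only the rank column reflects the rate-then-name ordering within each state.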
    

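    【Discussion】:

    • When I use arr2 %>% group_by(State) %>% mutate(rank = row_number()), I get a consecutive rank, but the rows are no longer ordered by rate and then hospital name, which is what I need.

    【Solution 2】:

    This is fairly simple with data.table: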

    library(data.table)
    
    # Read only the relevant columns (hospital name, state, heart-attack rate) by position
    outcome_data <- fread("outcome-of-care-measures.csv",
                          na.strings = "Not Available",
                          select = c(2, 7, 11))
    # Give the columns the short names used below
    setnames(outcome_data, c("Hospital.Name", "State", "rate"))
    
    # Drop rows with NA values using data.table's na.omit method
    outcome_data <- na.omit(outcome_data)
    
    ## Use data.table::setkey to sort/index by State, then rate, then hospital name
    setkey(outcome_data, State, rate, Hospital.Name)
    
    ## Add a rank column by state; the order within each group follows the key order above
    ## (.N is the number of rows in each State group)
    outcome_data[, rank := seq_len(.N), by = .(State)]
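
For comparison, a base R sketch of the same per-group numbering without data.table: once the rows are sorted so that each state forms one contiguous run, sequence(rle(State)$lengths) yields 1..n within each run, much like seq_len(.N) by State. The data frame `toy` is hypothetical and keeps only State and rate:

```r
# Hypothetical rows (State and rate only), in no particular order
toy <- data.frame(
  State = c("AL", "AK", "AL", "AK", "AK"),
  rate  = c(10.4, 11.4, 8.8, 10.8, 11.2),
  stringsAsFactors = FALSE  # rle() needs an atomic vector; a factor would be rejected
)

# Sort first so that each state forms one contiguous run of rows
toy <- toy[order(toy$State, toy$rate), ]

# rle() returns the length of each run of equal states; sequence()
# expands those lengths into 1..n within each run
toy$rank <- sequence(rle(toy$State)$lengths)
toy
```

This shortcut depends entirely on the sort: if a state's rows were split into several runs, each run would restart at 1.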
    

    【Discussion】:
