【Question title】: Multiple functions in aggregate
【Posted】: 2015-12-10 05:36:03
【Question description】:

Starting from the following data frame, df1:

 Branch Loan_Amount TAT
      A         100 2.0
      A         120 4.0
      A         300 9.0
      B         150 1.5
      B         200 2.0

Can I use the aggregate function to get the following output as data frame df2?

 Branch Number_of_loans Loan_Amount Total_TAT
      A               3         520      15.0
      B               2         350       3.5

I know I could use nrow to compute Number_of_loans and then merge the results, but I'm looking for a better way.
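For reference, the count-then-merge approach described above might look like this minimal base R sketch. It assumes the data frame is named `df1` as in the question, counts rows with `length()` inside `aggregate()` (rather than `nrow()`), and takes the output column names from `df2`.

```r
# Sample data from the question
df1 <- data.frame(
  Branch      = c("A", "A", "A", "B", "B"),
  Loan_Amount = c(100, 120, 300, 150, 200),
  TAT         = c(2.0, 4.0, 9.0, 1.5, 2.0)
)

# First summary: number of rows per Branch
counts <- aggregate(Loan_Amount ~ Branch, data = df1, FUN = length)
names(counts)[2] <- "Number_of_loans"

# Second summary: column sums per Branch
sums <- aggregate(cbind(Loan_Amount, TAT) ~ Branch, data = df1, FUN = sum)
names(sums)[names(sums) == "TAT"] <- "Total_TAT"

# Join the two summaries on the grouping column
df2 <- merge(counts, sums, by = "Branch")
df2
```

This needs two `aggregate()` calls plus a join, which is the overhead the question is trying to avoid.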

【Question discussion】:

  • Why is aggregate not a good approach for you?

Tags: r aggregate


【Solution 1】:

Base R:

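# Sum every non-grouping column (Loan_Amount, TAT) within each Branch
df1 <- aggregate(.~ Branch, df, FUN = "sum")
# Count the rows per Branch, keep only the count column and rename it
df2 <- setNames(aggregate(Loan_Amount~Branch, df, length)[2], c("Number_of_loans"))
# Both results are sorted by Branch, so they can be bound side by side
cbind(df1, df2)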

Output

  Branch Loan_Amount  TAT Number_of_loans
1      A         520 15.0               3
2      B         350  3.5               2
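Because the title asks about multiple functions in aggregate, here is a sketch of a single-call variant (an alternative, not this answer's original code). `FUN` returns a named vector `c(n = ..., sum = ...)`, which `aggregate()` stores as matrix columns, and `do.call(data.frame, ...)` flattens them into ordinary columns. The data is rebuilt inline so the block runs on its own, and the final column names are taken from the question's `df2`.

```r
# Sample data (same values as the Data block below)
df <- data.frame(
  Branch      = factor(c("A", "A", "A", "B", "B")),
  Loan_Amount = c(100L, 120L, 300L, 150L, 200L),
  TAT         = c(2, 4, 9, 1.5, 2)
)

# One aggregate() call: FUN returns two values per column, so each result
# column is a matrix with "n" and "sum" sub-columns
agg <- aggregate(cbind(Loan_Amount, TAT) ~ Branch, data = df,
                 FUN = function(x) c(n = length(x), sum = sum(x)))

# Flatten the matrix columns; names become Loan_Amount.n, Loan_Amount.sum,
# TAT.n and TAT.sum
flat <- do.call(data.frame, agg)

# Keep one count plus the two sums, using the question's column names
df2 <- data.frame(Branch          = flat$Branch,
                  Number_of_loans = flat$Loan_Amount.n,
                  Loan_Amount     = flat$Loan_Amount.sum,
                  Total_TAT       = flat$TAT.sum)
df2
```

The count is computed once per summarised column (`TAT.n` is simply dropped), which is harmless here because both columns share the same groups.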

sqldf:

library(sqldf)
sqldf("SELECT Branch, COUNT(Loan_Amount) Number_of_loans, SUM(Loan_Amount) Loan_Amount, SUM(TAT) TAT 
      FROM df 
      GROUP BY Branch")

Output

  Branch Number_of_loans Loan_Amount  TAT
1      A               3         520 15.0
2      B               2         350  3.5

Data

df <- structure(list(Branch = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), Loan_Amount = c(100L, 120L, 300L, 150L, 
200L), TAT = c(2, 4, 9, 1.5, 2)), .Names = c("Branch", "Loan_Amount", 
"TAT"), class = "data.frame", row.names = c(NA, -5L))

【Discussion】:

    【Solution 2】:

    With dplyr, you can do this:

    library(dplyr)
    group_by(d,Branch) %>% 
      summarize(Number_of_loans = n(),
                Loan_Amount = sum(Loan_Amount),
                TAT = sum(TAT))
    

    Output

    Source: local data frame [2 x 4]
    
      Branch Number_of_loans Loan_Amount   TAT
      (fctr)           (int)       (int) (dbl)
    1      A               3         520  15.0
    2      B               2         350   3.5
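For comparison, here is a base R sketch of the same group-then-summarise pattern built from `split()`, `lapply()` and `do.call(rbind, ...)`, with no extra packages. It also shows where `nrow()`, mentioned in the question, fits: it counts the rows of each group's piece. The data frame `d` is rebuilt inline so the block is self-contained.

```r
# Sample data (same values as the Data block below)
d <- data.frame(
  Branch      = c("A", "A", "A", "B", "B"),
  Loan_Amount = c(100, 120, 300, 150, 200),
  TAT         = c(2.0, 4.0, 9.0, 1.5, 2.0)
)

# Split the rows by Branch and build a one-row summary for each piece
pieces <- lapply(split(d, d$Branch), function(g) {
  data.frame(Branch          = g$Branch[1],
             Number_of_loans = nrow(g),
             Loan_Amount     = sum(g$Loan_Amount),
             TAT             = sum(g$TAT))
})

# Stack the per-group rows and drop the row names taken from the split
res <- do.call(rbind, pieces)
rownames(res) <- NULL
res
```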
    

    Data

    d <- read.table(text="Branch Loan_Amount TAT
    A         100 2.0
    A         120 4.0
    A         300 9.0
    B         150 1.5
    B         200 2.0",header=TRUE)
    

    【Discussion】:

      【Solution 3】:

      Using data.table:

      library(data.table)
      setDT(df)[,list(Number_of_loans=.N, 
                      Loan_Amount    =sum(Loan_Amount), 
                      Total_TAT      =sum(TAT)), by=Branch]
      #    Branch Number_of_loans Loan_Amount Total_TAT
      # 1:      A               3         520      15.0
      # 2:      B               2         350       3.5
      

      【Discussion】:

        【Solution 4】:

        This is clunky and inefficient, but it works and it's fun (it uses aggregate()):

        d <- read.table(text="Branch Loan_Amount TAT
        A         100 2.0
        A         120 4.0
        A         300 9.0
        B         150 1.5
        B         200 2.0",header=TRUE)
        
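        library(stringr)
        # Pack "count|sum" into one string per group, since aggregate() applies a single FUN
        df = aggregate(.~Branch, data=d, FUN=function(x) paste0(length(x), '|', sum(x)))
        # Split on a literal "|" (fixed(), because "|" is a regex metacharacter)
        loan = str_split_fixed(df$Loan_Amount, fixed('|'), 2)
        tat = str_split_fixed(df$TAT, fixed('|'), 2)
        df_ = cbind(loan[,1], loan[,2], tat[,2])
        df_ = apply(df_, 2, as.numeric)
        colnames(df_) = c('Number_of_loans','Loan_Amount','Total_TAT')
        cbind(df[,'Branch',drop=FALSE], df_)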
        

        This produces the desired data.frame:

          Branch Number_of_loans Loan_Amount Total_TAT
        1      A               3         520      15.0
        2      B               2         350       3.5
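The string packing and unpacking above can be avoided altogether. The sketch below is an alternative, not this answer's method: base R's `rowsum()` sums both columns per group in one pass, `table()` counts the rows per group, and the pieces are assembled into `df2` (column names taken from the question).

```r
# Sample data (same values as the read.table() call above)
d <- data.frame(
  Branch      = c("A", "A", "A", "B", "B"),
  Loan_Amount = c(100, 120, 300, 150, 200),
  TAT         = c(2.0, 4.0, 9.0, 1.5, 2.0)
)

# rowsum() sums each column within each group; groups come back sorted
# and are stored as the row names of the result
sums <- rowsum(d[, c("Loan_Amount", "TAT")], group = d$Branch)

# table() counts rows per group; index by name to match the order of sums
counts <- table(d$Branch)

df2 <- data.frame(Branch          = rownames(sums),
                  Number_of_loans = as.vector(counts[rownames(sums)]),
                  Loan_Amount     = sums$Loan_Amount,
                  Total_TAT       = sums$TAT)
df2
```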
        

        【Discussion】:
